Machine Learning typically involves a learning phase in which complex algorithms run over huge amounts of data. This phase requires massive computing power, so the system must be able to scale its resources accordingly. This is where relational database management systems fall short, as we will discuss in the next post. Moreover, Machine Learning does not require strict consistency, whereas consistency is one of the core properties of database management systems.

From a database perspective, Machine Learning poses the following challenges (Chen, Han, and Yu 1996):

  • Handling of different types of data
    Database applications support a variety of data types. Because Machine Learning deals with such a diversity of data types, it is unrealistic to expect a single data mining system to operate on all of them. Instead, it is suggested to build dedicated data mining systems that target specific kinds of data and the task at hand.
  • Efficiency and scalability of data mining algorithms
    Learning algorithms must be efficient and scalable. An efficient system requires fewer resources for the task at hand, and a scalable one keeps its running time acceptable as the volume of data grows.
  • Usefulness, certainty, and expressiveness of data mining results
    Machine Learning requires new measures that express the accuracy, efficiency, performance, and/or uncertainty of data mining results, so that users can judge how useful and reliable those results are.
  • Expression of various kinds of data mining results
    Machine Learning models and results should be understandable by non-experts, which requires them to be visualized or expressed in high-level languages. This also calls for multiple levels of expression, each targeted at the person evaluating the results.
  • Interactive mining of knowledge at multiple abstraction levels
    Since it is not always clear from the beginning what should be discovered, the results of a query should suggest directions for further exploration. Queries should therefore allow for continuous refinement.
  • Mining information from different sources of data
    Machine Learning systems should have the ability to operate on data sources of various types. Additionally, the large data sets and wide distribution of data encourage the development of parallel computing in Machine Learning.
  • Protection of privacy and data security
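    In Machine Learning, data is viewed from different angles and at different abstraction levels. This raises new security and privacy challenges: for example, combining several views of the data may reveal sensitive information that no single view exposes.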
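The scalability and parallel-computing points above can be made concrete with a small sketch. It assumes that a simple aggregate (the mean) stands in for the statistics a learning algorithm computes: the data is split into partitions, each worker computes a partial result (sum, count) on its own partition, and a merge step combines the partial results. The function names `split`, `partial_stats`, and `parallel_mean` are hypothetical, and the threads only simulate separate workers; this is an illustration of the partition-and-merge idea, not the approach of Chen, Han, and Yu.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partial_stats(partition: Sequence[float]) -> Tuple[float, int]:
    """Compute a partial aggregate (sum, count) for one data partition."""
    return sum(partition), len(partition)


def split(data: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split data into at most n_parts contiguous partitions of similar size."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_mean(data: Sequence[float], n_parts: int = 4) -> float:
    """Mean of data, computed from per-partition partial aggregates."""
    if not data or n_parts < 1:
        raise ValueError("need non-empty data and at least one partition")
    partitions = split(data, n_parts)
    # Each worker only sees its own partition. Threads stand in for separate
    # machines here; CPython threads do not speed up CPU-bound work.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, partitions))
    # Merge step: combine the (sum, count) pairs into the global result.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    values = [float(x) for x in range(1, 101)]
    print(parallel_mean(values))  # 50.5
```

The key design choice is that the partial results are small and can be merged without revisiting the raw data, which is what allows the work to be spread over many workers or machines.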

Chen, Ming-Syan, Jiawei Han, and Philip S. Yu. 1996. "Data Mining: An Overview from a Database Perspective." IEEE Transactions on Knowledge and Data Engineering 8 (6): 866-883.