Introduction to Data Mining. What data mining is, what data mining isn't, and the four dimensions of data mining.

Introduction to Data Mining

Dylan | Jan 10, 2021


What Exactly Is Data Mining?

Many people struggle to distinguish data mining from related concepts such as machine learning, data science, and big data analytics. Simply put, data mining is the process of discovering patterns in data. These crucial patterns reveal intrinsic and important properties of the objects in our dataset.

Unfortunately, one of the leading reasons for the confusion around data mining is that it goes by so many alternative names. Some of these names include:

  • Knowledge Discovery in Databases
  • Knowledge Discovery from Data (KDD)
  • Knowledge Extraction
  • Data/Pattern Analysis
  • Data Archaeology
  • Data Dredging
  • Information Harvesting

Concepts Related to Data Mining

Why exactly are concepts such as machine learning, data science, and big data analytics so often confused with data mining? As it turns out, these concepts are closely connected to the data mining process. Here’s a brief overview of some of these ideas and the role each plays in that process.

Techniques Used in the Data Mining Process

  • Machine Learning
  • Pattern Recognition

Systems that Support the Data Mining Process

  • Database Management Systems
  • Data Warehouses

Broad Fields in Which Data Mining Is a Key Component

  • Big Data Analytics
  • Data Science

Specific Application of Data Mining

  • Business Intelligence

What Data Mining Is Not

To further clarify what Data Mining is, let’s briefly look at two examples of what it is not.

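Data mining is not looking up a phone number in a database. The number is already stored as an explicit record, so retrieving it discovers nothing new.

Data mining is not querying a search engine for pages containing the word “Flower”. The matching pages are already indexed, so the query retrieves existing records rather than uncovering a pattern.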
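To make the contrast concrete, here is a minimal Python sketch. The phone book, the shopping baskets, and both function names are made up for illustration. The first function simply retrieves a stored value, while the second surfaces a pattern (items that are frequently bought together) that is not written down anywhere in the data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data, invented for this illustration only.
phone_book = {"Alice": "555-0100", "Bob": "555-0101"}
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "eggs", "milk"},
]


def lookup_phone(name):
    """Plain retrieval: return a value that is already stored."""
    return phone_book.get(name)


def frequent_pairs(transactions, min_count):
    """Pattern discovery: find item pairs that co-occur in at least
    `min_count` transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(lookup_phone("Alice"))
print(frequent_pairs(baskets, min_count=3))
```

With this toy data, the only pair that appears in at least three baskets is bread and milk. Nobody recorded that fact explicitly; it emerges from the transactions, which is the kind of result data mining is after.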

The Four Dimensions of Data Mining

Dimension One: Data to be Mined (Input)

Real world data can be characterized by its type and application. These factors will influence your options and choices in further dimensions.

Type
There are many different ways to represent information objects with data. Some examples of popular data types are:

  • vector/matrix itemsets
  • sequences
  • time series
  • spatiotemporal data streams
  • graphs
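As a rough sketch, several of these data types could be represented in plain Python as follows. Every value and helper name here is hypothetical, and real projects would typically reach for dedicated libraries rather than built-in containers.

```python
# Hypothetical examples of how some common data types might be represented.
feature_vector = [5.1, 3.5, 1.4, 0.2]           # one object as a numeric vector
itemset = frozenset({"bread", "milk"})          # an unordered set of items
sequence = ["home", "search", "product", "checkout"]  # ordered events
time_series = [(0, 20.0), (1, 21.5), (2, 21.0)]  # (timestamp, value) pairs
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}  # adjacency sets


def degree(adjacency, node):
    """Number of neighbors of `node` in an undirected adjacency-set graph."""
    return len(adjacency[node])


def deltas(series):
    """Change in value between consecutive points of a time series."""
    return [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]


print(degree(graph, "a"))
print(deltas(time_series))
```

The representation you pick shapes which questions are easy to ask: a graph makes neighbor counts trivial, while a time series makes changes over time easy to compute.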

Application
There are many different genres of data that can be used in data mining. Some examples include:

  • transactional data
  • text and web
  • multimedia
  • social and information networks
  • biological data
  • user behaviors

Dimension Two: Knowledge to be Discovered (Output)

This dimension is also commonly referred to as data mining functionalities. There are three primary functionalities.

1. Lower-level Output
This includes general patterns of data, similarity of data, or associations of data.

2. Decision-driven Output
This includes popular machine learning methods such as classification, clustering, trend and deviation analysis, prediction, and outlier analysis.

3. Descriptive Statistics
Fundamental analysis and understanding of your data, such as measuring central tendency, dispersion, and association.
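The three kinds of output can be illustrated with one small Python sketch each, using only the standard library. The function names, the toy data, and the z-score threshold are all assumptions made for this example, and each function is just one simple instance of its category rather than a standard or complete method.

```python
import statistics


def rule_confidence(transactions, antecedent, consequent):
    """Lower-level output (association): of the transactions containing
    `antecedent`, the fraction that also contain `consequent`."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


def zscore_outliers(values, threshold=2.0):
    """Decision-driven output (outlier analysis): values lying more than
    `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


def describe(values):
    """Descriptive statistics: central tendency and dispersion."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "pstdev": statistics.pstdev(values),
    }


# Hypothetical toy data.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "eggs", "milk"}, {"milk"}]
readings = [10, 11, 9, 10, 12, 10, 11, 50]

print(rule_confidence(baskets, {"bread"}, {"milk"}))
print(zscore_outliers(readings))
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

Note that a simple z-score test is only one of many ways to flag outliers, and it is sensitive to the very outliers it is trying to find, so treat the threshold as a tunable assumption.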

Dimension Three: Techniques Utilized (Connect Input to Output)

There are many techniques at our disposal, such as data cubing, machine learning, statistics, pattern recognition, user modeling, visualization, and data-intensive computing.

Dimension Four: Applications Adopted (Where to use?)

Some popular examples of applied data mining across industries include:

  • Retail - Advertising, Market Segmentation
  • Telecommunication - Spam Call Detection
  • Banking - Loan Approvals, Credit Score Estimation
  • Social Networks - Facebook, Twitter
  • Scientific Discoveries - Biology Data Mining
  • Web Search - Smart Question Answering
  • Stock Market Analysis - Choosing Stocks
  • Text Mining - Natural Language Processing
  • Clinics - Health Informatics

Conclusion

Hopefully you now have a better understanding of data mining, both what it is and what it is not. For a deeper dive into the domain, I recommend the textbook “Data Mining: Concepts and Techniques” by Jiawei Han, Micheline Kamber, and Jian Pei, which is available on Amazon (no affiliation). As always, I am more than happy to answer any questions in the comments. Thanks for reading and happy coding from Nimble Coding!