Introduction to Data Mining
Dylan | Jan 10, 2021
What Exactly Is Data Mining?
Many people struggle to distinguish data mining from related concepts such as machine learning, data science, and big data analytics. Simply put, data mining is the process of discovering patterns in data. These patterns can reveal intrinsic and important properties of the objects in a dataset.
Unfortunately, one of the leading reasons for the confusion around data mining is that it goes by so many alternative names. Some of these names include:
- Knowledge Discovery in Databases
- Knowledge Discovery from Data (KDD)
- Knowledge Extraction
- Data/Pattern Analysis
- Data Archaeology
- Data Dredging
- Information Harvesting
Concepts Related to Data Mining
Why exactly are concepts such as machine learning, data science, big data, and data analytics so often confused with data mining? Well, as it turns out, these concepts are closely connected to the data mining process. Here's a brief overview of some of these ideas and the role each plays in that process.
Techniques Used in the Data Mining Process
- Machine Learning
- Pattern Recognition
Systems that Support the Data Mining Process
- Database Management Systems
- Data Warehouses
Broad Fields in Which Data Mining Is a Key Component
- Big Data Analytics
- Data Science
A Specific Application of Data Mining
- Business Intelligence
What Data Mining Is Not
To further clarify what data mining is, let's briefly look at two examples of what it is not.
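Data mining is not looking up a phone number in a database. The number is already recorded, so retrieving it discovers nothing new.
Data mining is not querying a search engine for pages containing the word “Flower”. Those pages already exist and are already indexed, so the query simply retrieves stored information rather than uncovering a pattern.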
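To make the contrast concrete, here is a minimal sketch. The phone book, the shopping transactions, and the helpers look_up and frequent_pairs are all hypothetical, chosen only for illustration. A lookup returns a value that was stored explicitly. Mining, by contrast, counts across many records to surface a pattern ("bread and butter are often bought together") that no single record contains.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: retrieval just fetches an answer that is already stored.
phone_book = {"Alice": "555-0100", "Bob": "555-0199"}

def look_up(name):
    """Return the stored phone number (plain retrieval, not mining)."""
    return phone_book[name]

# Hypothetical shopping baskets: the "bought together" pattern is not stored
# anywhere. It only emerges from counting across all the records.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def frequent_pairs(baskets, min_count):
    """Count every item pair and keep pairs found in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "butter")
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

print(look_up("Alice"))                           # 555-0100
print(frequent_pairs(transactions, min_count=3))  # {('bread', 'butter'): 3}
```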
The Four Dimensions of Data Mining
Dimension One: Data to be Mined (Input)
Real-world data can be characterized by its type and its application. These factors will influence your options and choices in the remaining dimensions.
There are many different ways to represent information objects with data. Some examples of popular data types are:
- vector/matrix
- itemsets
- time series
- spatiotemporal data streams
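The same objects can often be represented in more than one of these forms. The following is a minimal sketch that assumes hypothetical basket data and a hypothetical helper, itemsets_to_matrix. It converts market baskets stored as itemsets into a binary vector/matrix form. Each row is one basket, each column is one item, and a 1 marks that the item is present. Many algorithms expect input in this form.

```python
# Hypothetical example data: each basket is an itemset.
baskets = [
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]

def itemsets_to_matrix(itemsets):
    """Hypothetical helper: encode itemsets as a 0/1 matrix (rows = itemsets, columns = items)."""
    items = sorted(set().union(*itemsets))  # fixed, alphabetical column order
    matrix = [[1 if item in s else 0 for item in items] for s in itemsets]
    return items, matrix

columns, rows = itemsets_to_matrix(baskets)
print(columns)  # ['bread', 'butter', 'eggs', 'milk']
for row in rows:
    print(row)
# [1, 1, 0, 0]
# [0, 0, 0, 1]
# [1, 0, 1, 1]
```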
There are many different genres of data that can be used in data mining. Some examples include:
- transactional data
- text and web
- social and information networks
- biological data
- user behaviors
Dimension Two: Knowledge to be Discovered (Output)
The knowledge to be discovered is also commonly referred to as data mining functionalities. There are three primary functionalities.
1. Lower-level Output
This includes general patterns in the data, similarities between data objects, and associations among them.
2. Decision-driven Output
These include popular machine learning methods such as classification, clustering, trend/deviation analysis, prediction, and outlier analysis.
3. Descriptive Statistics
This covers fundamental analysis and understanding of your data, such as measuring central tendency, dispersion, and association.
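As a rough illustration of the first and third functionalities, the sketch below computes three things: an association (the support and confidence of a rule such as "bread → butter"), a similarity (Jaccard similarity between two baskets), and basic descriptive statistics. Support, confidence, and Jaccard are standard measures, but their use here and the transaction data are assumptions for illustration. The post does not prescribe these particular measures.

```python
import statistics

# Hypothetical transactions, each an itemset.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, data):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in data if itemset <= t) / len(data)

def confidence(antecedent, consequent, data):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    return support(antecedent | consequent, data) / support(antecedent, data)

def jaccard(a, b):
    """Set similarity: size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)

# Lower-level output: an association and a similarity
print(support({"bread", "butter"}, transactions))        # 0.75
print(confidence({"bread"}, {"butter"}, transactions))   # 1.0
print(jaccard(transactions[0], transactions[1]))         # 0.6666666666666666

# Descriptive statistics on the basket sizes
sizes = [len(t) for t in transactions]                   # [3, 2, 2, 3]
print(statistics.mean(sizes), statistics.median(sizes))  # 2.5 2.5
print(statistics.pstdev(sizes))                          # 0.5
```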
Dimension Three: Techniques Utilized (Connect Input to Output)
There are many techniques at our disposal, such as data cubing, machine learning, statistics, pattern recognition, user modeling, visualization, and data-intensive computing.
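To show how a technique connects input to output, here is a minimal sketch of one machine learning technique, k-nearest neighbors. We picked it purely for illustration; the post does not single it out. The labeled "loan applicant" vectors and the knn_classify helper are hypothetical. The technique takes feature vectors as input and produces a classification (one of the decision-driven outputs) as output.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (feature vector, class label).
# The two features might be, e.g., scaled income and scaled debt.
training = [
    ((0.9, 0.1), "approve"),
    ((0.8, 0.3), "approve"),
    ((0.2, 0.8), "deny"),
    ((0.3, 0.9), "deny"),
    ((0.7, 0.2), "approve"),
]

def knn_classify(point, data, k=3):
    """Hypothetical helper: label `point` by majority vote of its k nearest examples."""
    # Sort examples by Euclidean distance to the query point and keep the closest k
    nearest = sorted(data, key=lambda example: math.dist(point, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((0.85, 0.2), training))   # approve
print(knn_classify((0.25, 0.85), training))  # deny
```

k-nearest neighbors works well as an example because it has no separate training step: the labeled data itself is the model. Because Euclidean distance is sensitive to scale, the features should be on comparable scales before classification.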
Dimension Four: Applications Adopted (Where to Use?)
Some popular examples of applied data mining across industries include:
- Retail - Advertising, Market Segmentation
- Telecommunication - Spam Call Detection
- Banking - Loan Approvals, Credit Score Estimation
- Social Networks - Facebook, Twitter
- Scientific Discoveries - Biology Data Mining
- Web Search - Smart Question Answering
- Stock Market Analysis - Choosing Stocks
- Text Mining - Natural Language Processing
- Clinics - Health Informatics