data mining - OpenGenus IQ: Learn Algorithms, DL, System Design

Linguistic Data Mining and Corpus Linguistics

Linguistic Data Mining and Corpus Linguistics are two interrelated fields of computational linguistics that have gained significant attention in recent years. The article provides an overview of the key concepts and methods used in both, their pros and cons, and their future prospects.

data mining

Forecasting flight delays [Data Mining Project]

The goal of this project at OpenGenus is to use historical data to create a forecasting model for flight delays.

Machine Learning (ML)

Predicting employee attrition [Data Mining Project]

Employee attrition is the process of employees leaving an organization for various reasons. In this article at OpenGenus, we have explained a Data Mining approach (with source code) to predict employee attrition.

Machine Learning (ML)

30 Data Mining Projects [with source code]

In this article at OpenGenus, we will explore some of the most interesting and innovative data mining project ideas that have been undertaken in recent years.

Machine Learning (ML)

Using ID3 Algorithm to build a Decision Tree to predict the weather

The ID3 algorithm (Iterative Dichotomiser 3) is a classification algorithm that greedily builds a decision tree by selecting, at each step, the attribute that yields the maximum Information Gain (IG), that is, the minimum Entropy (H) after the split. We will use it to predict the weather and make a decision.

Machine Learning (ML)
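As a rough illustration of the attribute selection step described in the ID3 summary above, the following minimal sketch computes entropy and Information Gain and picks the best attribute. The function names and the tiny weather-style dataset are made up for illustration and are not taken from the linked article.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """Greedy ID3 step: choose the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["play", "play", "stay", "stay"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

A full ID3 implementation would apply this step recursively to each subset until the labels in a branch are pure or no attributes remain.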

Porter Stemmer algorithm

Stemming is the process of reducing a word to its stem, the base form to which prefixes and suffixes attach, or to the root form of the word (its lemma). We cover the algorithmic steps of the Porter Stemmer algorithm, a native implementation in Python, an implementation using the Porter Stemmer from the NLTK library, and a conclusion.

Algorithms

Expectation Maximization Clustering Algorithm

The Expectation Maximization (EM) clustering algorithm is much more robust than K-Means, as it uses two parameters, the Mean and the Standard Deviation, to define a particular cluster. This simple addition of calculating the Standard Deviation helps the EM algorithm do well in many of the cases where K-Means fails.

Algorithms

Mean Shift Clustering Algorithm

Mean Shift clustering is an unsupervised clustering algorithm that groups data directly without being trained on labelled data. It is hierarchical in nature. It starts off with a kernel, which is basically a circular sliding window. The bandwidth, that is, the radius of this sliding window, is decided in advance.

clustering algorithm
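The sliding-window idea in the Mean Shift summary above can be sketched as follows. This is a minimal version that assumes a flat (uniform) circular kernel and merges converged windows that end up within one bandwidth of each other. The function name and the sample points are illustrative only.

```python
def mean_shift(points, bandwidth, max_iterations=100):
    """Shift a circular window from every point to its local mean until it stops moving."""
    def within(p, center):
        return sum((a - b) ** 2 for a, b in zip(p, center)) <= bandwidth ** 2

    modes = []
    for start in points:
        center = start
        for _ in range(max_iterations):
            window = [p for p in points if within(p, center)]
            if not window:
                break
            # Move the window to the mean of the points it currently covers.
            shifted = tuple(sum(c) / len(window) for c in zip(*window))
            if shifted == center:
                break
            center = shifted
        # Keep only one representative per converged location.
        if not any(within(center, m) for m in modes):
            modes.append(center)
    return modes

# Hypothetical sample: two well-separated groups of three points each.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
modes = mean_shift(sample, bandwidth=3.0)
```

The number of clusters is not passed in. It follows from how many distinct window positions survive, which depends on the chosen bandwidth.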

K+ Means Clustering algorithm

The K+ Means algorithm is a clustering algorithm that improves on the K-means clustering algorithm and solves the problem of choosing K (the number of clusters). It is great at detecting outliers and forming new clusters. Its complexity is O(t*(k^2)*n), which is slightly higher than that of the K-means algorithm.

clustering algorithm

Introduction to Clustering Algorithms

Clustering is an unsupervised learning problem, since it seeks to classify or divide a dataset based on attributes of the points themselves rather than any given labels.

clustering algorithm

K-means Clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. The algorithm categorizes the items into k groups by similarity: initialize the k means with random values, then, for a given number of iterations, iterate through ...

clustering algorithm
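The k-means summary above outlines an initialize-then-iterate loop. The sketch below is one common way to write it, assuming that the initial means are drawn at random from the data points and that squared Euclidean distance measures similarity. Both choices, and all names used here, are assumptions rather than details from the linked article.

```python
import random

def kmeans(points, k, iterations=10, seed=0):
    """Plain k-means: assign each point to its nearest mean, then recompute the means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initial means are random data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's mean where it was
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical sample: two well-separated groups of three points each.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(sample, k=2)
```

On data that is less cleanly separated, the result depends on the random initialization, so running the loop from several seeds and keeping the best outcome is a common precaution.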

DBSCAN Clustering Algorithm

Density-based spatial clustering of applications with noise (DBSCAN) is an unsupervised data clustering algorithm. The key idea is to divide the dataset into n points and cluster them depending on the similarity or closeness of some parameter.
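To make the DBSCAN summary above more concrete, here is a minimal sketch using the algorithm's two standard parameters: a neighbourhood radius and a minimum neighbour count, named eps and min_pts here. It uses a brute-force neighbour search, and the sample points are made up for illustration.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep growing the cluster
                queue.extend(reachable)
    return labels

# Hypothetical sample: two dense groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

Unlike k-means, the number of clusters is not specified up front, and isolated points such as the last one in the sample are labelled as noise instead of being forced into a cluster.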