Google Colaboratory (Colab) is a free, web-based Jupyter notebook environment. It lets you write and execute Python code, document it with Markdown and visualize datasets, which makes it a popular tool for data scientists.
Data analysis tools
Data analysis is the process of collecting, organizing, transforming and modeling data to draw conclusions, make predictions and make informed decisions. Data scientists mostly use Python for data analysis, along with other tools such as Tableau for visualization.
Week 2: Machine Learning Basics
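Feature engineering is the process of transforming raw data into features that better represent the problem you are trying to solve with machine learning. LASSO (Least Absolute Shrinkage and Selection Operator) is a popular technique for selecting features: its L1 penalty shrinks the coefficients of less useful features to exactly zero.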
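To make LASSO's feature-selection behaviour concrete, here is a minimal pure-Python sketch that fits LASSO by cyclic coordinate descent. It assumes centered features and no intercept term; the function names and toy data are illustrative, not taken from any library. The soft-thresholding step sets a weight to exactly zero whenever that feature's correlation with the remaining residual is smaller than `alpha`.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Fit LASSO weights (no intercept) by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.
    X is a list of rows; features are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                # Prediction for row i that ignores feature j's contribution.
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Toy data: feature 0 drives the target, feature 1 is irrelevant.
X = [[-2, 1], [-1, -1], [0, 1], [1, -1], [2, 0]]
y = [3 * row[0] for row in X]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
print(weights)
```

With `alpha=0.1` the irrelevant second feature gets a weight of exactly zero, while the first is shrunk slightly below its true value of 3. In practice a library implementation such as scikit-learn's `Lasso` would normally be used instead.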
Hyperparameters express "higher-level" properties of the model, such as its complexity or how fast it should learn, and are usually fixed before training begins. The learning rate is one example. Hyperparameter tuning is the process of choosing a set of optimal hyperparameters for a learning algorithm.
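A minimal sketch of hyperparameter tuning by grid search: each candidate learning rate runs a fixed number of gradient-descent steps, and the rate with the lowest final loss is kept. The toy loss `(w - 3)^2` is an assumption standing in for a real model's validation loss, and the function names are hypothetical.

```python
def train(learning_rate, n_steps=20):
    """Gradient descent on a toy loss L(w) = (w - 3)^2; returns the final loss."""
    w = 0.0
    for _ in range(n_steps):
        grad = 2 * (w - 3)  # dL/dw
        w -= learning_rate * grad
    return (w - 3) ** 2


def grid_search(candidates):
    """Train once per candidate learning rate and keep the one with the lowest loss."""
    results = {lr: train(lr) for lr in candidates}
    best_lr = min(results, key=results.get)
    return best_lr, results


best_lr, results = grid_search([0.01, 0.1, 0.4, 1.2])
for lr, loss in results.items():
    print(f"learning_rate={lr}: final loss={loss:.3g}")
print("best learning rate:", best_lr)
```

Here too small a rate (0.01) converges slowly and too large a rate (1.2) diverges, so the search settles on 0.4. In a real project the score for each candidate should come from held-out validation data rather than the training loss.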
Classification is the task of assigning data points to predefined classes. It relies on past examples: we feed the model examples whose correct labels are known, and the model learns from them so that it can predict the labels of new data accurately.
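K-Nearest Neighbours (KNN) is an algorithm used for both classification and regression. To make a prediction for a new data point, it looks at the K training points nearest to it: for classification it takes a majority vote of their labels, and for regression it averages their values. This example uses KNN for text classification.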
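KNN text classification can be sketched in pure Python, assuming a bag-of-words representation and cosine similarity as the closeness measure (other representations, such as TF-IDF vectors, are also common). The tiny labelled corpus below is made up for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words: count each lowercase word in the text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (1.0 = same direction)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training_data, k=3):
    """Label `text` by majority vote among its k most similar training documents."""
    query = vectorize(text)
    scored = sorted(
        ((cosine_similarity(query, vectorize(doc)), label) for doc, label in training_data),
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


training_data = [
    ("the team won the football match", "sports"),
    ("a great goal in the final match", "sports"),
    ("the player scored in the match", "sports"),
    ("the new phone has a faster chip", "tech"),
    ("this laptop chip is fast and new", "tech"),
    ("software update for the phone", "tech"),
]
print(knn_classify("who scored the winning goal in the match", training_data))
print(knn_classify("my phone needs a software update", training_data))
```

Changing `k` changes how many neighbours vote; an odd `k` avoids ties between two classes.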
A decision tree is a popular machine learning algorithm used mainly for classification. ID3 is a classic algorithm for building one: at each node it splits the data on the attribute with the highest information gain.
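A compact sketch of ID3's core idea, assuming categorical attributes stored in dictionaries. Entropy measures how mixed the class labels are, the tree splits on whichever attribute reduces entropy the most (highest information gain), and it recurses until nodes are pure. The weather-style toy data is invented, and the sketch omits details such as handling attribute values unseen during training.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [label for row, label in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3's choice for the next split: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


def id3(rows, labels, attributes):
    """Recursively build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: every example has the same class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # no attributes left: majority class
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in set(row[attr] for row in rows):
        pairs = [(row, lab) for row, lab in zip(rows, labels) if row[attr] == value]
        tree[attr][value] = id3([r for r, _ in pairs], [lab for _, lab in pairs], remaining)
    return tree


rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"), information_gain(rows, labels, "windy"))
print(id3(rows, labels, ["outlook", "windy"]))
```

Splitting on `outlook` separates the classes perfectly (gain of 1 bit), while `windy` tells us nothing (gain of 0), so ID3 places `outlook` at the root.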
Support Vector Machine
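Support vector machines (SVMs) are a particularly powerful and flexible class of supervised algorithms for both classification and regression. For classification, an SVM looks for the boundary (hyperplane) that separates the classes with the largest possible margin. SVMs have many advantages and applications, and they are easy to use through existing libraries.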
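As a rough illustration of how a linear SVM finds a separating hyperplane, the sketch below minimizes the regularized hinge loss with full-batch subgradient descent. This training method is chosen only for brevity (library implementations typically use dedicated solvers), and the hyperparameter values and toy points are assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, learning_rate=0.1, epochs=200):
    """Linear SVM trained by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))), labels in {-1, +1}.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print("weights:", w, "bias:", b)
print([predict(w, b, p) for p in points])
print(predict(w, b, (1, 0.5)))
```

Only points with a margin below 1 contribute to the update, which is why the final boundary is determined by the points closest to it (the support vectors).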
Week 4: Regression
Regression is a statistical method, used in many fields, for estimating the relationship between a dependent variable and one or more independent variables, including how strong that relationship is.
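For a single independent variable, ordinary least squares gives closed-form estimates of the slope and intercept, and Pearson's correlation coefficient `r` measures how strong the linear relationship is (values near +1 or -1 indicate a strong one). A minimal sketch; the study-hours data below is made up.

```python
import math


def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # strength and direction of the linear relationship
    return slope, intercept, r


hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 68]
slope, intercept, r = simple_linear_regression(hours_studied, exam_scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}, r = {r:.3f}")
```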
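Information retrieval (IR) can be defined as finding material of an unstructured nature (usually documents) that satisfies an information need from within large collections. It relies on indexing, for example an inverted index that maps each term to the documents containing it. The PageRank algorithm, originally developed for the Google search engine, ranks web pages according to the number and importance of the pages that link to them.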
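Both ideas can be sketched in a few lines: an inverted index that answers Boolean AND queries, and PageRank computed by power iteration. This is a simplified illustration that assumes a damping factor of 0.85 (the value commonly quoted for PageRank) and a link graph with no dangling pages; the documents and links are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND query: ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. `links` maps each page to the pages it links to.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, targets in links.items():
            share = rank[page] / len(targets)  # a page splits its rank among its links
            for target in targets:
                new_rank[target] += damping * share
        rank = new_rank
    return rank


docs = {1: "data science with Python", 2: "Python for web development", 3: "data science project ideas"}
index = build_inverted_index(docs)
print(search(index, "data science"))

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({page: round(score, 3) for page, score in ranks.items()})
```

Page C receives links from both A and B, so it ends up with the highest score even though A links out more.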
Time series data is a collection of observations of a single entity recorded at successive points in time. Weather records, economic indicators and patient health metrics tracked over time are all time series data.
Basics of Time Series Prediction
Time series prediction builds on concepts such as stationarity, moving averages and seasonality, which you should be familiar with in order to understand time series forecasting.
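Two of these building blocks are easy to compute directly: a moving average smooths out short-term noise, and differencing is a common way to bring a trending or seasonal series closer to stationarity. A minimal sketch with invented series; the function names are illustrative.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def difference(series, lag=1):
    """Differencing: subtract the value observed `lag` steps earlier.

    lag=1 turns a linear trend into a constant; a lag equal to the seasonal
    period removes a stable repeating seasonal pattern.
    """
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


print(moving_average([1, 2, 3, 4, 5], window=3))
print(difference([10, 12, 14, 16, 18]))            # trending series
print(difference([1, 5, 1, 5, 1, 5], lag=2))       # season of length 2
```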
Time series forecasting models and techniques
Future trends are predicted by discovering and analyzing the underlying patterns in time series data. Various methods and models are used for this.
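As one example of such a method (the text does not name specific ones), simple exponential smoothing forecasts the next value as a weighted average in which older observations receive exponentially decaying weight. A minimal sketch, assuming a series without strong trend or seasonality and a hand-picked smoothing factor `alpha`; the sales figures are made up.

```python
def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    Each new level = alpha * latest observation + (1 - alpha) * previous level,
    so recent observations get exponentially more weight than older ones.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


sales = [12, 15, 14, 16, 18]
print(exponential_smoothing_forecast(sales, alpha=0.5))
```

A larger `alpha` reacts faster to recent changes; a smaller one produces a smoother, slower-moving forecast.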
Time series prediction techniques
Various artificial neural network models can be used for time series prediction. This article describes a few of them.
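Whatever network architecture is used, a common first step is to reframe the series as a supervised learning problem with sliding windows: the previous `n_lags` values become the input and the next value becomes the target. The sketch below covers only that preparation step (no network is trained), and the name `make_windows` is hypothetical.

```python
def make_windows(series, n_lags):
    """Turn a series into (inputs, targets): the previous `n_lags` values predict the next one."""
    inputs, targets = [], []
    for i in range(n_lags, len(series)):
        inputs.append(series[i - n_lags:i])
        targets.append(series[i])
    return inputs, targets


X, y = make_windows([1, 2, 3, 4, 5], n_lags=2)
for window, target in zip(X, y):
    print(window, "->", target)
```

The resulting pairs can be fed to a feed-forward network directly, or reshaped into sequences for a recurrent model.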
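A confidence interval expresses a range of values, computed from sample data, that is likely to contain the population parameter. For example, a 95% confidence interval is produced by a procedure that captures the true parameter in about 95% of repeated samples.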
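A minimal sketch of a confidence interval for a population mean using the normal approximation and Python's standard `statistics` module. It assumes independent observations; for a sample as small as this toy one, a t-distribution critical value (giving a wider interval) would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return m - margin, m + margin


sample = [10, 12, 14]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```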
A Bayesian model is a statistical model in which probability is used to represent all uncertainty within the model: both the uncertainty regarding the output and the uncertainty regarding the input to the model.
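A small example of representing uncertainty with probability: treat an unknown success rate as a random variable with a Beta prior and update it with observed successes and failures (the Beta-Binomial conjugate model). This model is an illustrative choice, and the counts are invented.

```python
def beta_binomial_update(prior_alpha, prior_beta, successes, failures):
    """Conjugate Bayesian update: Beta prior + binomial data -> Beta posterior."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Uniform prior Beta(1, 1), then 7 successes and 3 failures.
posterior = beta_binomial_update(1, 1, 7, 3)
print(posterior, beta_mean(*posterior), beta_variance(*posterior))

# Ten times more data with the same success ratio: same centre, less uncertainty.
larger = beta_binomial_update(1, 1, 70, 30)
print(larger, beta_mean(*larger), beta_variance(*larger))
```

The posterior is a full distribution rather than a single number, so its variance directly expresses how uncertain we still are; it shrinks as more data arrives.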
A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. A well-known variant is the hidden Markov model (HMM), in which the states are not observed directly; only outputs that depend on the states are visible.
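A Markov chain can be described by a transition matrix whose row `i` gives the probabilities of moving from state `i` to each state, so propagating a probability distribution one step forward is a vector-matrix product. A minimal sketch with a hypothetical two-state weather chain:

```python
def step(distribution, transition):
    """One Markov step: new[j] = sum over i of distribution[i] * transition[i][j]."""
    states = range(len(distribution))
    return [sum(distribution[i] * transition[i][j] for i in states) for j in states]


def distribution_after(initial, transition, n_steps):
    """Apply `step` n_steps times starting from the initial distribution."""
    dist = list(initial)
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist


# Hypothetical weather chain: state 0 = sunny, state 1 = rainy.
# Row i holds the probabilities of tomorrow's state given today's state i.
transition = [
    [0.9, 0.1],
    [0.5, 0.5],
]
print(distribution_after([1.0, 0.0], transition, 2))
print(distribution_after([1.0, 0.0], transition, 100))
```

Starting from a sunny day, the probability of sun two days later is 0.9 * 0.9 + 0.1 * 0.5 = 0.86. After many steps the distribution settles at the chain's stationary distribution, which for these numbers is 5/6 sunny and 1/6 rainy.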
A/B testing is a widely used technique for comparing two variants (A and B) to determine which of the two performs better, based on user behaviour or experience. It is a randomized experiment: each user is randomly assigned to one of the variants.
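A common way to analyse an A/B test with a yes/no outcome (for example, whether a visitor converted) is a two-proportion z-test. The sketch below shows random assignment of users to variants and the test itself; the conversion counts are invented, and other analyses (chi-squared tests, Bayesian approaches) are also used in practice.

```python
import math
import random
from statistics import NormalDist


def assign_variant(user_ids, seed=0):
    """Randomly assign each user to variant "A" or "B"."""
    rng = random.Random(seed)
    return {user: rng.choice("AB") for user in user_ids}


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


assignments = assign_variant(range(10))
print(assignments)

z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference between the variants is unlikely to be due to chance alone.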
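Simulated annealing is a probabilistic optimization algorithm inspired by the physical annealing process used in metallurgy. It sometimes accepts worse solutions, with a probability that decreases as a "temperature" parameter is lowered, which helps it escape local optima.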
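A minimal sketch of simulated annealing for minimizing a function of one variable. The Gaussian neighbour step, geometric cooling schedule and all parameter values are assumptions chosen for illustration. The test function `x^2 + 10 sin(x)` has a local minimum near x = 3.8 and a deeper global minimum near x = -1.3.

```python
import math
import random


def simulated_annealing(f, x0, step_size=1.0, start_temp=10.0, cooling=0.995, n_steps=5000, seed=0):
    """Minimize f(x) over real x; returns the best (x, f(x)) found."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    temp = start_temp
    for _ in range(n_steps):
        candidate = x + rng.gauss(0, step_size)  # random neighbour of the current solution
        fc = f(candidate)
        delta = fc - fx
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, fx = candidate, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        temp *= cooling  # lower the temperature: worse moves become less likely
    return best_x, best_fx


def bumpy(x):
    return x ** 2 + 10 * math.sin(x)


best_x, best_fx = simulated_annealing(bumpy, x0=8.0)
print(f"best x = {best_x:.3f}, f(x) = {best_fx:.3f}")
```

Starting from x = 8, a purely downhill search could get stuck in the local minimum; accepting some uphill moves while the temperature is still high lets the search cross into the deeper basin.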
Monte Carlo sampling techniques
Monte Carlo techniques are a group of computational algorithms that rely on repeated random sampling to draw samples from probability distributions and to estimate quantities that are hard to compute exactly.
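Two classic Monte Carlo sketches using only the standard library: estimating π from the fraction of random points that fall inside a quarter circle, and estimating an expectation by averaging a function over random draws. The sample sizes and seeds are arbitrary; the error typically shrinks in proportion to 1/√n.

```python
import math
import random


def estimate_pi(n_samples, seed=0):
    """Fraction of uniform points in the unit square inside x^2 + y^2 <= 1, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n_samples


def monte_carlo_expectation(f, sampler, n_samples, seed=0):
    """Estimate E[f(X)] by averaging f over random draws X = sampler(rng)."""
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n_samples)) / n_samples


pi_estimate = estimate_pi(100_000)
# E[X^2] for a standard normal X is exactly 1.
second_moment = monte_carlo_expectation(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0), 100_000)
print(pi_estimate, math.pi)
print(second_moment)
```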
Week 10: Projects
This article contains a list of unique data science project ideas that you can explore.