Predicting Air Pollution Levels in New Delhi (Part One)

For those living in the National Capital Region, especially those in New Delhi, air pollution is a part of life. The grey smog is a routine and deadly visitor that arrives every year during the winter months, unwanted but not unexpected. This choking smog literally takes the sunshine out of the lives of many millions living in the region. It is a big problem that we need to wrestle with before things get out of hand.

This is a three-part series about how I applied Machine Learning to predict pollution levels in New Delhi. Part One focusses on the problem, some solutions, and how prediction can help. Part Two focusses on the data we require, and involves initial analysis. In the final part, we go over the implementation of the machine learning model.

The Air Quality Crisis

The source of the pollution? It is hard to pinpoint any one thing as the source of all our air quality problems. Air quality in New Delhi only improves to acceptable levels after rainstorms wash the dust and smog away, and even then the effect is temporary. While the winter months are the worst, the busiest places in the city, such as the transportation hub of Anand Vihar, or highways such as the Yamuna Expressway (which had a 24-vehicle pileup one smoggy day in 2017), face dangerous levels of air pollution throughout the year.

It is the combination of the high number of vehicles, coal-based power plants, factories and the burning of waste and leaves, along with crop burning in neighbouring farmlands, that brings the city to its knees in the winter. Geographical factors make it worse: the air above the NCR stagnates, trapping pollutants in a blanket of death. A 2018 report by the Ministry of Earth Sciences (see references below) studied the factors behind the crisis and found that vehicular emissions were the biggest culprit, followed by industrial emissions.

Health Impact

Air pollution is a problem that major cities across the globe, from Beijing to Los Angeles, are trying to tackle. It is embarrassing to see my nation's capital in the list of the most polluted cities every year, usually at the top. National pride aside, I worry about the impact this has on the health of its citizens. There is a well-established link between air quality and chronic respiratory diseases.

For reference, here is a chart that shows the air pollution levels in 2018; the green dots are the days when the air quality was satisfactorily healthy. This has been a problem for the past decade or so, even before the "Great Smog of Delhi" in 2017 prompted the first major political and legal attempts at fixing the issue. The hard-to-swallow truth is that air pollution is the fifth largest cause of death in our country, with around 2 million people dying each year. In November 2019 the Supreme Court of India itself called the situation in the capital "worse than narak (hell)", and Hon. Justice Arun Mishra is quoted as saying "[It] is better to get explosives and kill everyone."

This is all extremely unfortunate, but it is not an exaggeration. According to the World Health Organisation, India has the world's highest death rate from chronic respiratory diseases like asthma. In Delhi, poor-quality air irreversibly damages the lungs of 2.2 million children, or 50 percent of all children. As a resident of the NCR myself, I find all of this far more than disconcerting. My grandfather passed away suddenly last year from ILD (interstitial lung disease, a chronic lung condition), and environmental factors such as exposure to air pollution are known risk factors. All of this prompted me to turn my attention to this huge human problem for my internship project.

Before I talk about air pollution prediction, I would be remiss not to mention the initiatives to solve the air pollution problem being taken by governments at various levels and by non-profit groups.

Attempts by the Supreme Court

The Supreme Court outlawed stubble-burning, a common farming practice in the neighbouring states of Haryana and Punjab. However, farmers have few alternatives, so many still burn their fields in the traditional way. It is thought that stubble-burning in these states, carried by the winds, sends smoke towards the capital in the winter.

In 2018 the court also tried banning the sale of firecrackers in the city, though this is again a ban that is tough to enforce and easy to flout.

The Supreme Court has made various attempts (see references) at curbing air pollution over the last two decades, including mandating CNG for the Delhi bus network.

Attempts by the Delhi Government

The most widely known measure introduced after the 2017 health emergency was the Arvind Kejriwal-led Delhi Government's traffic rationing system, popularly called the Odd-Even Scheme.

The plan to reduce the city's traffic was as follows:
Vehicles with registration numbers ending with an odd digit will be allowed on roads on odd dates and those with an even digit on even dates.
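The rule itself is simple enough to express in a few lines of code. The sketch below is purely illustrative: the function name `allowed_on_road`, the `two_wheeler` flag and the sample registration number are hypothetical, and it models only the odd/even rule and the two-wheeler exemption mentioned in this article, not any other exemptions the real scheme had.

```python
from datetime import date


def allowed_on_road(registration: str, day: date, two_wheeler: bool = False) -> bool:
    """Hypothetical check of the Odd-Even rule for one vehicle on one date.

    Assumes the parity of the registration number's last digit must match
    the parity of the day of the month. Only the two-wheeler exemption is
    modelled; other real-world exemptions are ignored.
    """
    if two_wheeler:
        return True  # two-wheelers were exempt from the scheme
    digits = [ch for ch in registration if ch.isdigit()]
    if not digits:
        raise ValueError(f"no digits in registration number: {registration!r}")
    last_digit = int(digits[-1])
    # Odd last digit -> odd dates, even last digit -> even dates
    return last_digit % 2 == day.day % 2


# Example: a car whose plate ends in 4 may drive on the 6th but not on the 5th
print(allowed_on_road("DL 3C AB 1234", date(2019, 11, 6)))  # True
print(allowed_on_road("DL 3C AB 1234", date(2019, 11, 5)))  # False
```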

Similar schemes have been tried before in other countries, such as in Beijing before the 2008 Olympics and in parts of the USA during the fuel shortages of the 1970s. The scheme was mildly successful at reducing traffic congestion, but not so much at improving air quality; there are claims (see references) that it isn't strict enough to bring about real change. The scheme exempts two-wheelers, for example.

Other Attempts

  • A large-scale initiative is being considered: the 1,600 km long and 5 km wide "Great Green Wall of Aravalli", a green ecological corridor along the Aravalli range from Gujarat to Delhi. The aim is to plant 1.35 billion new native trees over 10 years to combat the pollution crisis.

  • In October 2018, the Badarpur thermal power plant was shut down. It was known to be the most polluting plant in India.

  • In December 2019, IIT Bombay, in partnership with the McKelvey School of Engineering of Washington University in St. Louis, launched the Aerosol and Air Quality Research Facility to study air pollution in India.

What good is predicting?

Of course, the benefits of a project should be considered before we start on it. Here are some things predicting air pollution levels can help us with:

  • Knowing face mask/air filtration needs for certain times of day. Just as we decide whether to bring an umbrella based on the day's rain forecast, it is important to be prepared and not be caught unawares.
  • Often people only look at the pollution levels once a day (if at all). Simply looking out the window isn’t a reliable indicator of pollution.
  • Pollution affects people differently depending on factors such as age and pre-existing health conditions, so it would be good to receive advice tailored to that.
  • Knowing whether it is a good idea to go outside, for example to the park, and at what time. Knowing which part of the city will have cleaner air at a certain time of day can also be valuable.

An accurate air quality predictor can prevent unnecessary exposure to outdoor air while it is unsafe. While it cannot solve the air pollution crisis directly, it can allow us to be proactive rather than reactive. Instead of waiting to see if there is smog outside, we can shut down schools and offices ahead of poor conditions and avoid people being exposed to toxic air.

What are we predicting?

This is an important question before we move ahead to the next part, which discusses the data we shall use and the initial analysis. What exactly are we trying to predict?

The simple answer is Particulate Matter 2.5, or PM 2.5 for short.

PM 2.5 refers to particles in the air that are 2.5 microns (micrometres) or less in diameter. These particles are so small that they can enter your bloodstream and cause all manner of ailments, and they are the standard go-to measure for the air quality of a region like a city.

Other metrics, such as PM 10 (particles 10 microns or less in diameter), specific levels of greenhouse gases (carbon dioxide, methane) or even toxic gases (sulphur and nitrogen oxides), can also be used to gauge air pollution levels.

I chose PM 2.5 for four reasons:

  1. It is very dangerous, because of its small particle size
  2. The data for PM 2.5 in Delhi goes all the way back to 2014 and is consistent
  3. Other individual pollutants fluctuate drastically, making them much harder to predict reliably
  4. It has a clear relationship with the Air Quality Index (AQI)
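To make that last point concrete, here is a minimal sketch of how a PM 2.5 concentration maps onto an AQI sub-index by linear interpolation between breakpoints. It assumes the PM 2.5 (24-hour average, µg/m³) breakpoints of India's National AQI as published by the CPCB; the upper concentration of 380 µg/m³ for the open-ended "Severe" band is an assumption (a common convention in AQI calculators), and the function name `pm25_sub_index` is hypothetical.

```python
from typing import Tuple

# (conc_low, conc_high, index_low, index_high, category)
# Assumed CPCB National AQI breakpoints for 24-hour PM 2.5 in µg/m³.
PM25_BANDS = [
    (0, 30, 0, 50, "Good"),
    (30, 60, 50, 100, "Satisfactory"),
    (60, 90, 100, 200, "Moderate"),
    (90, 120, 200, 300, "Poor"),
    (120, 250, 300, 400, "Very Poor"),
    (250, 380, 400, 500, "Severe"),  # upper bound of 380 is an assumption
]


def pm25_sub_index(conc: float) -> Tuple[float, str]:
    """Return (AQI sub-index, category) for a PM 2.5 concentration.

    Linearly interpolates within the band containing `conc`; values above
    the last band stay "Severe" and the index is capped at 500.
    """
    if conc < 0:
        raise ValueError("concentration cannot be negative")
    for c_lo, c_hi, i_lo, i_hi, category in PM25_BANDS:
        if conc <= c_hi or category == "Severe":
            index = i_lo + (conc - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)
            return min(index, 500.0), category
    raise AssertionError("unreachable: the Severe band catches everything")


print(pm25_sub_index(45))  # (75.0, 'Satisfactory')
```

Because the bands are contiguous and the interpolation is linear within each one, the sub-index rises monotonically with concentration, which is what makes PM 2.5 a convenient proxy for the overall AQI.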

Coming Up

This was Part One of a three-part series on predicting air pollution in the capital of India, New Delhi. In Part Two we will explore the data we need for our project and make some visualizations to better understand it. Part Three will focus on the machine learning model I have built, aka the implementation. Stay tuned!

Further Reading and References

Link to Part 2 of this series - The Data and Analysis
New York Times: Holding Your Breath in India
Delhi Air Pollution, MoES Report 2018
The Indian Supreme Court and Attempts at Tackling the Air Pollution Crisis
Odd-Even Scheme and Reasons for Failure