DATA, ANALYTICS & AUTOMATION for better healthcare

An Approach towards COVID-19 Case Estimation in Indian Regions

By Niti Jain


The emerging COVID-19 pandemic, whose first Indian case was reported on 30 January 2020 in Kerala, has become a serious problem for the Indian healthcare fraternity. As of 2nd May 2020, the Ministry of Health and Family Welfare has confirmed a total of 37,336 cases, 9,951 recoveries and 1,218 deaths in the country [Ref: 1].


In the current scenario, where data is insufficient, daily case additions are highly volatile and there is no uniformity in the data recorded by different countries, traditional forecasting techniques and predictive models may fail to provide stable results.

Under such conditions, estimation of the disease can be based on three key driving factors: Ro, the reproduction number; the duration for which the population of the region has been exposed to the outbreak; and the rate of change of Ro over time due to external factors such as lockdown and violations (such as the Tablighi Jamaat religious congregation held in Delhi).

Epidemiologists explain the term "Ro" as the number of people that one person with the coronavirus can infect [Fig1]. If Ro < 1, each existing infection causes less than one new infection and the outbreak will eventually fade out. If Ro = 1, each existing infection causes one new infection; the outbreak will not fade out but will remain stable. Lastly, if Ro > 1 (which is where most of the world is), each existing infection causes more than one new infection and the outbreak grows.

According to the CDC, Ro for COVID-19 is between 2.2 and 2.7 [Ref: 4].

However, it is important to note that Ro can vary from country to country [Fig 2], and even within a country it can differ widely. The reasons include population density, pre-existing diseases among the population (such as asthma and diabetes), population age distribution, the number of travellers from hotspot locations (China, Italy, Germany etc.), socio-economic conditions (e.g. slums, number of individuals below the poverty line (BPL)), the stage of spread the region has reached, and external factors such as violations and lockdowns. Hence, it is very important to normalise the data as much as possible and to estimate Ro for each “similar” region.

At times, external and non-quantifiable factors such as an extension of lockdown or violations can be difficult to foresee. Under such conditions, techniques such as scenario testing and sensitivity analysis can come to the rescue.
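As a rough illustration of scenario testing, the sketch below runs the same simple growth model under three assumed rates of Ro decline and compares the outcomes. It is not the model used in this article: the decay values, the starting Ro of 2.5 (chosen from within the CDC range quoted above), the initial count and the function names `ro_path`, `final_cases` and `run_scenarios` are all hypothetical.

```python
from typing import Dict, List


def ro_path(start_ro: float, decay_per_interval: float, intervals: int) -> List[float]:
    """Ro for each interval, shrinking by a fixed fraction per interval (assumed form)."""
    return [start_ro * (1.0 - decay_per_interval) ** k for k in range(intervals)]


def final_cases(initial_new: float, ro_values: List[float]) -> float:
    """Total cases added when new cases are multiplied by Ro in each interval."""
    total = 0.0
    new_cases = initial_new
    for ro in ro_values:
        new_cases *= ro
        total += new_cases
    return total


# Hypothetical scenarios: fraction by which Ro falls in each interval.
SCENARIOS: Dict[str, float] = {
    "optimistic": 0.30,   # steep reduction in Ro
    "realistic": 0.15,    # gradual reduction in Ro
    "pessimistic": 0.00,  # no reduction in Ro
}


def run_scenarios(initial_new: float = 100.0, start_ro: float = 2.5,
                  intervals: int = 6) -> Dict[str, float]:
    """Project total added cases under every scenario (all inputs illustrative)."""
    return {
        name: final_cases(initial_new, ro_path(start_ro, decay, intervals))
        for name, decay in SCENARIOS.items()
    }


results = run_scenarios()
for name, total in results.items():
    print(f"{name:>11}: {total:,.0f} cases added")
```

Sweeping the decay value over a finer grid instead of three fixed points turns the same sketch into a simple sensitivity analysis.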

Across the world, lockdown has played a key role in reducing Ro and thereby controlling the outbreak of the disease. As observed in different countries, the effect of lockdown starts to show within about 20 days on average, as Ro slowly and gradually starts to fall [Fig 3].

In order to estimate COVID-19 cases in India, the country can be sub-divided into Northern, Southern and Western regions. The Northern region comprises Delhi, Haryana, J&K and Uttar Pradesh; Kerala, Tamil Nadu and Karnataka are grouped in the South; and Maharashtra forms the West.

The starting Ro for each defined region can be adjusted in line with the methodology described above. Using the adjusted Ro for each time interval under a given scenario, the growth of COVID-19 infections in that region can be assumed to follow a geometric progression.
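The geometric progression can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact calculation behind the tables in this article: it assumes each time interval is one generation of infection, so that new cases in an interval equal the previous interval's new cases multiplied by that interval's Ro. The function name `project_cumulative_cases` and the input numbers are hypothetical.

```python
from typing import List, Sequence


def project_cumulative_cases(initial_cumulative: float,
                             initial_new: float,
                             ro_per_interval: Sequence[float]) -> List[float]:
    """Project cumulative cases interval by interval.

    Assumption (not from the article): each interval is one generation,
    so new cases in interval k = new cases in interval k-1 * Ro_k.
    """
    cumulative = initial_cumulative
    new_cases = initial_new
    path = []
    for ro in ro_per_interval:
        new_cases *= ro           # geometric step with this interval's Ro
        cumulative += new_cases   # add this interval's new cases to the total
        path.append(cumulative)
    return path


# Illustrative inputs only: 1,000 cases so far, 100 new in the last interval,
# and an adjusted Ro that falls over three intervals.
path = project_cumulative_cases(1000.0, 100.0, [2.0, 1.5, 1.0])
print(path)  # [1200.0, 1500.0, 1800.0]
```

With a constant Ro the new-case series is an exact geometric progression; letting Ro change per interval is what allows a lockdown effect or a violation to be fed into the projection.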

Validation of our results on COVID-19 case estimation

Previously (Ref: 6; LinkedIn post dated 13th April 2020), we published COVID-19 case estimates as of 30th April for the Northern (Delhi, UP, HR and J&K), Southern (Kerala, KT and TN) and Western (MH) Indian regions.

As Table 1 shows, actual case counts for ‘West’ and ‘North’ were very close to our ‘realistic scenario’. For the ‘South’ we saw a significant reduction in Ro, as the actual numbers were well below our ‘optimistic scenario’.

Table 1: Validation of results
Region | Actual Case Counts* (as of 30th April) | Predicted Case Counts
North (DLH, UP, HR, J&K) | 6,645 cases | 6,698 cases
South (Kerala, KT, TN) | 3,385 cases | 4,500 cases
West (Maharashtra) | 10,498 cases | 10,090 cases
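To put a number on how close the predictions were, the short sketch below computes the percentage error of each predicted count against the actual count, using the figures from Table 1. The signed percentage error is our choice of measure for illustration, and the helper name `percent_error` is hypothetical.

```python
# (actual, predicted) case counts as of 30th April, copied from Table 1.
TABLE_1 = {
    "North": (6645, 6698),
    "South": (3385, 4500),
    "West": (10498, 10090),
}


def percent_error(actual: float, predicted: float) -> float:
    """Signed error of the prediction as a percentage of the actual count."""
    return 100.0 * (predicted - actual) / actual


errors = {region: percent_error(actual, predicted)
          for region, (actual, predicted) in TABLE_1.items()}

for region, err in errors.items():
    print(f"{region}: {err:+.1f}%")
```

On this measure the North and West predictions land within about 4% of the actual counts, while the South prediction overshoots by roughly a third.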

Going forward, we can assume that the outbreak in these defined regions will stabilize and Ro will fall steeply, resulting in some relaxations. Even so, the government is less likely to provide lockdown relief in ‘red zones’ (such as Mumbai or Delhi-NCR).

Secondly, in our methodology we had highlighted that, on average, Ro tends to be highly volatile during the first 20 days from the start of lockdown, after which it slowly and gradually starts to fall. As Fig 4 shows, Ro for the country has begun to fall slowly after the 22nd-23rd day.

For this reduction to continue and for the outbreak to stabilize across the whole country, it is imperative to control the emergence of new ‘red zones’, especially now that the government is offering relaxations in ‘green zones’ and arrangements are being made to send stranded migrants home.

COVID-19 Case Estimations as of 15th and 31st May 2020

In the tables below (Ref: Table 2, 3) we have extended our projections to 15th and 31st May 2020 under different scenarios for our defined regions. Given the current conditions, the anticipated case counts are likely to fall between the “Realistic” and “Optimistic” scenarios.

Table 2: COVID-19 case estimations (Active + Closed) by 15th May 2020
Region | Realistic Scenario* | Optimistic Scenario* | Pessimistic Scenario*
North (DLH, UP, HR, J&K) | 10,730 cases | 8,074 cases | 13,876 cases
South (Kerala, KT, TN) | 4,324 cases | 3,737 cases | 6,156 cases
West (Maharashtra) | 20,972 cases | 11,831 cases | 28,389 cases

Table 3: COVID-19 case estimations (Active + Closed) by 31st May 2020
Region | Realistic Scenario* | Optimistic Scenario* | Pessimistic Scenario*
North (DLH, UP, HR, J&K) | 15,481 cases | 10,480 cases | 30,016 cases
South (Kerala, KT, TN) | 5,777 cases | 4,540 cases | 11,806 cases
West (Maharashtra) | 29,612 cases | 19,647 cases | 83,470 cases
  • Realistic Scenario: no major changes in the existing situation or Ro trend
  • Optimistic Scenario: steep reduction in Ro as observed in China/Spain
  • Pessimistic Scenario: significant changes in external factors worsening the current situation
  • Actual case counts may vary

Not all COVID-19 cases lead to hospitalization. Taking into consideration the demographic distribution by age [Ref 2], pre-existing diseases (PED) and external factors within each of the defined regions, the likely hospitalization, ICU admission and fatality percentages can be estimated [Table 4]. These estimates should be applied to the case counts recorded 3-4 days earlier rather than to the current COVID-19 case counts.

Table 4: Hospitalization, ICU admission & Fatality Percentage
Region | Hospitalization* | ICU Admissions* | Fatality*
North (DLH, UP, HR, J&K) | 25.9%-38.3% | 5.2%-7.7% | 2.9%-6.1%
South (Kerala, KT, TN) | 31.7%-46.9% | 6.3%-9.4% | 3.9%-8.2%
West (Maharashtra) | 30.2%-44.8% | 6.0%-9.0% | 3.7%-7.6%
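The percentages in Table 4 can be turned into head-count ranges as sketched below, applying them to the case count from a few days earlier as recommended above. This is an illustrative sketch only: the daily case series is made up, the 3-day default lag is one point in the 3-4 day range mentioned in the text, and the function name `estimate_outcomes` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (low %, high %) ranges copied from Table 4.
TABLE_4: Dict[str, Dict[str, Tuple[float, float]]] = {
    "North": {"hospitalization": (25.9, 38.3), "icu": (5.2, 7.7), "fatality": (2.9, 6.1)},
    "South": {"hospitalization": (31.7, 46.9), "icu": (6.3, 9.4), "fatality": (3.9, 8.2)},
    "West": {"hospitalization": (30.2, 44.8), "icu": (6.0, 9.0), "fatality": (3.7, 7.6)},
}


def estimate_outcomes(cumulative_cases: Sequence[float], region: str,
                      lag_days: int = 3) -> Dict[str, Tuple[float, float]]:
    """Apply the Table 4 ranges to the case count recorded `lag_days` ago."""
    if len(cumulative_cases) <= lag_days:
        raise ValueError("need more daily records than lag_days")
    lagged_count = cumulative_cases[-1 - lag_days]  # count from lag_days before the latest
    return {
        outcome: (lagged_count * low / 100.0, lagged_count * high / 100.0)
        for outcome, (low, high) in TABLE_4[region].items()
    }


# Made-up daily cumulative counts, oldest first; the latest value is 1,400.
daily_counts = [1000, 1100, 1200, 1300, 1400]
estimates = estimate_outcomes(daily_counts, "North")
for outcome, (low, high) in estimates.items():
    print(f"{outcome}: {low:.0f} to {high:.0f}")
```

With a 3-day lag the ranges here are computed on 1,100 cases rather than the latest 1,400, which reflects the delay between a case being recorded and its hospitalization, ICU admission or death.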

-By Niti Jain, Health Actuary and Data Science Consultant

©2020 MedML