Why we still don’t know how many, and who exactly, died of Covid in India
Bhramar Mukherjee
On March 16, 2020, a bunch of us modelers, many from the Indian diaspora, started to track and predict the trajectory of the pandemic in India. India had 536 cases and 11 deaths at that time. We have tracked this data on our app, covind19.org, every single day for the last 400 days. India stands at a staggering 26.5 million cases and nearly 300,000 deaths as of May 22. Seroprevalence studies and other investigative reports suggest the total toll of infections and deaths may be larger by orders of magnitude. But good models need good data, and we have operated under an extreme paucity of data from India, leading to tremendous uncertainty around our projections, which are key to informing policy recommendations and to gauging healthcare resource needs, for example, the quantum of oxygen needed across India.
It is not just highly detailed or granular data that is missing from India. Finding simple things, such as nationwide counts of Covid infections, hospitalisations and fatalities disaggregated by age and sex, has been nearly impossible. In the early days, the Ministry of Health and Family Welfare in India released this data, but that dissemination quickly stopped. All we find are regional reports, snapshot figures in official briefs, screenshots from webinars, and media reports. One cannot scale up or impute from this data to identify who is dying in India. Is there a difference in infection and fatality rates between men and women, young and old? Has this profile changed between Wave 1 and Wave 2?
All-cause mortality data from India has always been imperfect, even before the pandemic, with a large proportion of deaths not medically reported, particularly deaths that happen outside healthcare facilities and in rural areas. During the pandemic, the underreporting of Covid cases and deaths became a more acute problem. Part of the reason is, of course, the pressure on the system, the natural problems of misclassifying causes of death, and limited testing, but there also appears to be an effect of the desire to maintain public image. It is hard to tease apart the confluence of factors that leads to the underreporting. We could attempt to quantify this gap in officially reported Covid deaths through excess death calculations, as many countries have done. A recent paper in the US shows 22% excess deaths in 2020, with 72% attributed to Covid-19. One would like to do a similar calculation for India if the mortality time series were available to us. But we do not have this data even for 2019. This data scarcity has many ramifications. For example, if men and women in working age groups are dying in large numbers in Wave 2, we need economic policies to support the families that are left behind with massive income losses. We need to estimate the number of Covid survivors who may end up being long-haulers and need additional healthcare for the next few years. How do we know the cost of this pandemic to society without accurate data? How do we make strategic investments for the future?
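To make the idea of an excess death calculation concrete, here is a minimal sketch in Python. It is not our model or any official method. It assumes the expected number of deaths can be approximated by a simple average of the same period in earlier years, which is exactly the kind of mortality time series we lack for India. The function names and all the numbers are hypothetical and for illustration only.

```python
from statistics import mean


def excess_deaths(observed, baseline_years):
    """Return (excess, percent_excess) for one period.

    observed: all-cause deaths recorded during the pandemic period.
    baseline_years: all-cause deaths in the same period of earlier years.
    Assumption: the historical average is a fair estimate of expected deaths.
    """
    expected = mean(baseline_years)
    excess = observed - expected
    return excess, 100.0 * excess / expected


def reporting_gap(excess, reported_covid_deaths):
    """Ratio of excess deaths to officially reported Covid deaths.

    A ratio well above 1 suggests underreporting, although not every
    excess death is necessarily caused by Covid.
    """
    return excess / reported_covid_deaths


# Hypothetical numbers for illustration only; not real data from any region.
baseline = [9_800, 10_000, 10_200]   # deaths in the same period, earlier years
observed = 12_500                    # deaths in the pandemic period
reported = 1_000                     # officially reported Covid deaths

excess, pct = excess_deaths(observed, baseline)
gap = reporting_gap(excess, reported)
print(f"Excess deaths: {excess} ({pct:.1f}% above expected)")
print(f"Excess deaths per reported Covid death: {gap:.1f}")
```

In practice, published analyses adjust the baseline for trends, seasonality and population change rather than taking a plain average, so this sketch only shows the arithmetic at the core of the approach.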
Another massive limitation is the sparsity of genomic sequencing data from India, which we need in order to learn about the spread of emerging variants and to identify unknown variants in real time. Without integrating the sequencing data with epidemiologic surveillance, and without targeted sequencing of re-infections, cluster infections and breakthrough infections in the vaccinated, we will not know the properties of the variants or the effectiveness of the vaccines against them.
I have been asked many times why we did not stop modeling the data from India, since it is so poor and misleading. As a practising statistician in the US for more than two decades, I am acutely aware of the limitations of this data and the dangers associated with emphatic conclusions and predictions. It is still my opinion that we could, and can, capture some meaningful relative trends and signals after accounting for the data limitations. More importantly, our real-time analysis of imperfect data helped the thousands who flocked to our app get a sense of what was coming. Hospital systems reached out to us to estimate the peak period of oxygen needs, and state officials asked for predictions to inform their policies. All the absolute numbers we predicted are essentially wrong or useless, but there were signals that were discernible from the noise: that there was an uptick in February, or that the curve would peak in mid-May, were reasonably correct assertions from our model and from models created by many other scientists. Though the height of the peak was hard to predict, we knew it would be enough to overwhelm the healthcare system. There was no dearth of predictions or recommendations from us and others in the scientific community. It was a paralysis of action that made this national catastrophe worse.
To conclude, I would like to share my personal narrative as an overseas citizen of India. On April 26, I received a call from my 81-year-old father in Kolkata that he was running a fever. For the next two weeks I fought one of the toughest battles of my life: to get an RT-PCR test done, to get a hospital bed, to get the right treatment, with my only lifeline being my contacts in India and WhatsApp. The blue ticks in WhatsApp told me he was breathing. An octogenarian erudite thespian, my father has always wanted to die on stage. I could not let him leave this world in a solitary room with only his phone next to him. There I was, co-fighting this transatlantic battle with my dad, who had become a data point in the data I had been analysing for the last 400 days. My deepest gratitude goes to the physician who answered each and every one of my frantic texts and helped my father return home from the hospital. This is not just my experience.
The second wave has left very few resident Indians and persons of Indian origin living abroad untouched. We do not need fancy models. We know in our hearts that the officially reported numbers simply do not add up. An investment in a robust data ecosystem and a commitment to data sharing and transparency now will help India face the n-th wave of this pandemic.
