Over the course of the COVID-19 pandemic, policy makers and health care stakeholders have been hamstrung in their pursuit of health equity by data gaps and inconsistencies. Measuring inequities in the uptake of the COVID-19 vaccine, the best defense against severe disease, was hindered by critical limitations, including the lack of standardized procedures and category definitions for race and ethnicity data as well as inadequate capacity to share data across systems. Regional data turned out to be disparate and inconsistent. Federal data is consistent but is coded using antiquated standards that prevent granular analysis and leave a significant proportion of records with a missing race and ethnicity component. To make fair and equitable policy decisions, federal and regional data needs to be comparable without the danger of data artifacts leading to misdirected resources.
People of color have borne a disproportionate burden of COVID-19 infections, hospitalizations, and deaths, a burden perpetuated by longstanding inequities in health. For this reason, national recommendations regarding vaccine distribution focused on special efforts to ensure equitable access, particularly for disproportionately affected groups, including people of color. National and state-level vaccination data stratified by race and ethnicity reveal that the vaccination gap between White and minority populations has been narrowing since the start of the rollout. Yet these broad trends mask enormous local variations and do not facilitate an effective local-level response.
In the spring of 2021, we set out to create a daily updated, nationwide map of county-level vaccination coverage in the United States disaggregated by race and ethnicity by scraping publicly available data from state dashboards. It quickly became clear that differences in reporting styles and gaps in data coverage across states would prevent a nationally representative view into county-level disparities. Even among the states where county-level data is available on public-facing dashboards, large gaps, limitations, and inconsistencies inhibit comparisons among racial and ethnic groups both within and across states. Here we describe retrospective findings as of March 2022 from a small subset of 13 states with robust, consistent reporting standards, share the barriers we encountered, and propose recommendations for revamping standardization and data-sharing practices nationally. Comprehensive, granular data on race and ethnicity is fundamental for efforts to advance health equity, not only related to COVID-19 but in public health and health care more broadly.
County-level racial vaccination inequity shows substantial variability within and across states
National and state vaccination aggregates can mask local disparities with unique underlying causes. Estimates disaggregated by race and ethnicity at the county level, the most granular geographic unit reported at a national scale, reveal large differences in coverage. As of March 29, 2022, we can robustly report county-level first-dose vaccination data among the White and Black populations for 827 counties across 13 states, accounting for 40% of the population over the age of 5 years, 31% of the White population over the age of 5 years, and 40% of the Black population over the age of 5 years in the United States. We omit other racial and ethnic groups for comparability reasons further explored below. The fact that this omission is necessary, however, is precisely the problem that needs to be addressed.
In April 2021, vaccination coverage was 13 percentage points higher among the White population than among the Black population. Across our dataset as of March 2022, the gap had narrowed, with the Black population trailing by 9 percentage points, consistent with national reporting at that time. However, the county-level data reveals significant local variations. In most states, average vaccination coverage was higher among the White population than among the Black population. But there are exceptions, with average Black vaccination rates higher in states such as Alabama, Georgia, and Oregon. In our dataset, 32% of all counties had Black vaccination rates greater than White (Exhibit 1). White vaccination rates were greater than Black vaccination rates in 43% of counties, and the remaining 25% of counties had an equitable proportion of both Black and White populations vaccinated. Even in states like California with large inequities between the Black and White populations, the differences between counties were far greater than the differences in the state averages. Such heterogeneous inequities across counties clearly indicate the need for tailored, local response strategies. Moreover, they highlight that attempts to create a single national-level racial analysis are doomed to be over-simplistic.
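The county breakdown described above can be illustrated with a minimal sketch. It is only an illustration: the coverage figures are hypothetical, and the `tolerance` cutoff for calling a county "equitable" is an assumption, since the threshold used for Exhibit 1 is not stated here.

```python
from collections import Counter

def classify_county(white_pct, black_pct, tolerance=1.0):
    """Label a county by its White-minus-Black coverage gap.

    tolerance is in percentage points and is an ASSUMED cutoff for
    "equitable"; the article does not state the threshold it used.
    """
    gap = white_pct - black_pct
    if abs(gap) <= tolerance:
        return "equitable"
    return "white_higher" if gap > 0 else "black_higher"

# Hypothetical first-dose coverage (%) as (White, Black); not real county data.
counties = {
    "County A": (62.0, 48.5),
    "County B": (55.0, 61.0),
    "County C": (70.2, 69.8),
    "County D": (40.0, 52.0),
}

# Classify each county, then compute the share of counties in each category.
labels = {name: classify_county(w, b) for name, (w, b) in counties.items()}
shares = {k: 100 * v / len(labels) for k, v in Counter(labels.values()).items()}

for category, share in sorted(shares.items()):
    print(f"{category}: {share:.0f}% of counties")
```

Because the classification depends on the chosen tolerance, any published breakdown of this kind should state the cutoff alongside the percentages.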
Data gaps, limitations, and inconsistencies inhibit comparisons across and within states
As we began this work, we found that county-level data by race and ethnicity for people who had received at least 1 dose of the vaccine was publicly available on 22 individual state dashboards, but only 16 states reported data on the fully vaccinated (those who had completed their primary vaccine series [ie, 2 doses of either Pfizer or Moderna or 1 dose of Johnson & Johnson]). Between states, we encountered 3 main issues in state-reported vaccination data: their racial/ethnic classifications, their frequency of reporting, and their separate reporting of vaccines administered through federal programs such as the Indian Health Service and the Long-Term Care Partnership Program. We defined 2 categories of states based on how they report race and ethnicity. There were 5 states in our sample (AL, DE, GA, OH, WI) that reported race and ethnicity separately, using a 2-question format (eg, “Black” and “Hispanic or Not Hispanic”), while the remaining 8 states (CA, MA, MI, OR, SC, TX, VA, WA) reported them as mutually exclusive categories, using a 1-question format (eg, “Black, non-Hispanic” or “Hispanic”, but not both) (Exhibit 2). The population denominator used to calculate vaccination rates therefore differs depending on the reporting method. This, in turn, prevents a proper comparison of vaccination rates between states using different approaches, since the underlying total population of, for example, “White” (“Hispanic” and “non-Hispanic” under the 2-question format) is different from solely “White, non-Hispanic” (under the 1-question format).
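A small worked example shows how the two formats produce different rates for the same county. This is a sketch with hypothetical population and vaccination counts; the function names are illustrative and do not correspond to any state's reporting system.

```python
# Hypothetical population and first-dose counts for one county,
# keyed by (race, ethnicity). These are not real data.
population = {
    ("White", "Hispanic"): 20_000,
    ("White", "Not Hispanic"): 60_000,
    ("Black", "Hispanic"): 1_000,
    ("Black", "Not Hispanic"): 19_000,
}
vaccinated = {
    ("White", "Hispanic"): 9_000,
    ("White", "Not Hispanic"): 42_000,
    ("Black", "Hispanic"): 400,
    ("Black", "Not Hispanic"): 9_500,
}

def rate_two_question(race):
    """Coverage (%) for a race regardless of Hispanic origin (2-question format)."""
    num = sum(v for (r, _), v in vaccinated.items() if r == race)
    den = sum(p for (r, _), p in population.items() if r == race)
    return 100 * num / den

def rate_one_question(race):
    """Coverage (%) for the mutually exclusive '<race>, non-Hispanic' group (1-question format)."""
    key = (race, "Not Hispanic")
    return 100 * vaccinated[key] / population[key]

white_2q = rate_two_question("White")  # 51,000 / 80,000
white_1q = rate_one_question("White")  # 42,000 / 60,000
print(f"White, 2-question format: {white_2q:.2f}%")
print(f"White, non-Hispanic, 1-question format: {white_1q:.2f}%")
```

In this hypothetical county, the "White" rate is 63.75% under the 2-question format and 70.00% under the 1-question format, even though the underlying vaccinations are identical. The gap is purely an artifact of which people are counted in the numerator and denominator.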
The federal government has minimum standards for reporting race and ethnicity data. However, these standards were last revised in 1997 and do not fully reflect the diversity of today’s population. The standards allow race and ethnicity to be asked about separately (2-question format) or together (1-question format) (Exhibit 2). When data on race and ethnicity are collected separately, the reporting agency should include the number of respondents in each racial category who are of Hispanic origin. However, in the absence of proper federal guidance, certain states following the 2-question format have violated the spirit of the Office of Management and Budget (OMB) standards by failing to report the number of Hispanic people within each racial category (eg, on Arizona’s publicly available vaccination dashboard). When reported only in aggregate, the Hispanic category in the 2-question format provides no information on the race of those selecting it. Though useful in improving nuanced self-identification, the 2-question format has the potential to mask important subpopulations if not properly reported. Implementers of the 2-question approach should report both questions as a cross-tabulation, since they are inherently related.
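The difference between reporting the two questions separately and reporting them as a cross-tabulation can be sketched as follows. The records are hypothetical, and the helper `hispanic_within` is an illustrative name rather than part of any reporting standard.

```python
from collections import Counter

# Hypothetical individual responses to a 2-question form: (race, ethnicity).
records = [
    ("White", "Hispanic"),
    ("White", "Not Hispanic"),
    ("White", "Not Hispanic"),
    ("Black", "Hispanic"),
    ("Black", "Not Hispanic"),
    ("Asian", "Not Hispanic"),
]

# Reporting each question on its own: the race of Hispanic respondents is lost.
by_race = Counter(race for race, _ in records)
by_ethnicity = Counter(eth for _, eth in records)

# Cross-tabulation: one count per (race, ethnicity) pair.
crosstab = Counter(records)

def hispanic_within(race):
    """Number of respondents of the given race who are of Hispanic origin."""
    return crosstab[(race, "Hispanic")]

# The separate totals can always be rebuilt from the cross-tabulation,
# but the cross-tabulation cannot be rebuilt from the separate totals.
rebuilt_by_race = Counter()
for (race, _), n in crosstab.items():
    rebuilt_by_race[race] += n

print("Hispanic respondents within White:", hispanic_within("White"))
print("Totals rebuilt from cross-tab match:", rebuilt_by_race == by_race)
```

Publishing the cross-tabulated counts loses nothing relative to the separate totals, while also allowing a 2-question state to derive the mutually exclusive categories used by 1-question states.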
The federal government should revise and enforce data standardization protocols both on collection and reporting
Many of the inconsistencies in county-level vaccination data stemmed from a dynamically changing pandemic environment in which responders needed to adapt rapidly. However, data reporting could have been more consistent with better federal guidance early on. Most stakeholders agree that the federal OMB standard is outdated, which has forced regional decision-makers to “go their own way” in deciding how best to gather this data. Moving forward, solutions are needed both for crises and for routine monitoring.
In the context of public health emergencies like COVID-19, state and local health departments need access to timely data to monitor disparities and adjust response strategies. Immunization Information Systems (IISs), currently controlled individually by all 64 jurisdictions, including US states, are each intended to serve as a centralized repository of immunization records for their jurisdiction and as the source for sending records to the CDC’s COVID-19 Data Clearinghouse. Individual IISs have varying capacities to automate processes and handle large quantities of data. Data quality can also vary significantly across IISs, particularly because of differences in data reporting policies. The current nationwide network of IISs cannot consistently provide real-time data, but the CDC’s Vaccine Administration Management System (VAMS) program was intended to fill that gap by directly integrating into any public health facility’s IT platform. VAMS was built to enable IIS jurisdictions to access aggregate data via reports and dashboards, including data analysis tools, upon uploading vaccine administration data. However, VAMS has been plagued with problems and bugs that forced states to abandon the software and seek private solutions instead. While the CDC’s vaccine administration software held the promise of a national solution to real-time data management, the execution left a lot to be desired. Continued investment in information technology (IT) systems like VAMS is necessary for state and local decision-makers to have greater access to the data needed to respond quickly and efficiently.
In the longer term, an overhaul of current data standards and sharing processes is needed to ensure consistent health equity analyses can be conducted in the broader health care ecosystem. First, the federal government should review and update the OMB 1997 Statistical Directive on collecting and presenting federal data on race and ethnicity to more accurately reflect the demographics of the US population and provide flexibility to state and local governments to capture information representing their communities. Disparities often exist within granular racial and ethnic groups like Middle Eastern or North African, and local jurisdictions should be encouraged to track these data. There should be standard language that details why racial data is gathered in a given clinical setting, so that people can choose whether to provide or withhold their racial data. This informed consent will create implicit limits on what the data can be used for and why. These limits need to be codified and rigorously adhered to, with regular oversight.
Second, the Interagency Working Group on Equitable Data (Data Working Group) should work through OMB to standardize collection and reporting of racial and ethnic data across the federal government (eg, USDA, CMS, HRSA), while providing states and local governments the flexibility to collect data on other populations residing in their area. The federal government should review all data systems to ensure that race and ethnicity data are collected, where appropriate.
Lastly, public health and health care systems should assess the feasibility of incorporating the HL7 Fast Healthcare Interoperability Resources (FHIR) Code System for race and ethnicity, where appropriate, into quality measure specifications. Individuals’ data are often stored across multiple records covering health insurance eligibility, health care utilization, and clinical care, with varying levels of completeness due to self-reporting. FHIR is a standard for data formats and elements that allows health information to be exchanged between systems. FHIR could facilitate the sharing of reliable information on race and ethnicity between stakeholders and eliminate data inconsistencies that prevent comparison across states and systems. Currently, there is no mandate to include FHIR in health IT systems, but it is only a matter of time before public health institutions take advantage of what has already been widely adopted by the health care system.
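To make this concrete, the sketch below shows one way race and ethnicity can be carried on a FHIR Patient resource. It assumes the US Core race and ethnicity extensions and the CDC Race and Ethnicity code system, which are not named in this article; the patient identifier and the helper functions `omb_extension` and `omb_category` are hypothetical and shown only for illustration.

```python
import json

# Assumption: CDC Race and Ethnicity code system, as used by the US Core extensions.
CDCREC = "urn:oid:2.16.840.1.113883.6.238"
US_CORE = "http://hl7.org/fhir/us/core/StructureDefinition/"

def omb_extension(kind, code, display):
    """Build a US Core race or ethnicity extension (kind is 'race' or 'ethnicity')."""
    return {
        "url": US_CORE + "us-core-" + kind,
        "extension": [
            {"url": "ombCategory",
             "valueCoding": {"system": CDCREC, "code": code, "display": display}},
            {"url": "text", "valueString": display},
        ],
    }

# Race and ethnicity are recorded as two separate coded elements on one record.
patient = {
    "resourceType": "Patient",
    "id": "example-1",  # hypothetical identifier
    "extension": [
        omb_extension("race", "2054-5", "Black or African American"),
        omb_extension("ethnicity", "2186-5", "Not Hispanic or Latino"),
    ],
}

def omb_category(resource, kind):
    """Return the OMB category display for 'race' or 'ethnicity', or None if absent."""
    for ext in resource.get("extension", []):
        if ext["url"] == US_CORE + "us-core-" + kind:
            for sub in ext["extension"]:
                if sub["url"] == "ombCategory":
                    return sub["valueCoding"]["display"]
    return None

race = omb_category(patient, "race")
ethnicity = omb_category(patient, "ethnicity")
payload = json.dumps(patient, indent=2)  # JSON form that would be exchanged between systems
print(race, "/", ethnicity)
```

Because both elements travel together on the same record with shared codes, a receiving system could in principle reconstruct either the 2-question or the 1-question view without relying on each sender's local category labels.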
Bad data blocks credible progress towards health equity
Health data, and indeed public data of any sort, need to be both locally relevant and nationally comparable. Decision-makers and practitioners within each US county should be armed with health equity data that accurately represents their local situation. They should also be able to freely collaborate and share experiences, which will be possible only with the adoption of national standards on race and ethnicity data and reporting that are applied at the most granular level. We observe that improving trends in vaccination equity at the national level belie significant and persistent state- and county-level inequities. These results underscore the need for responses tailored to the local context as researchers investigate barriers faced by specific groups. These calls to action will only become more pressing as we transition to yearly COVID-19 boosters and face other near-term and long-term health crises.
For good reason, minorities in the United States are less likely to provide data of any kind for public health reporting. When they take the risk of trusting public health officials with details that might make them vulnerable, it is critical that they and their communities are rewarded with the best possible data analysis, resulting in the most equitable and reasonable policies possible. The only way to restore trust in public health institutions is for these institutions to consistently act in a trustworthy manner. Getting the data gathering and analysis right is certainly not the only thing that public health agencies need to do to restore trust. But it is an important issue that must be “gotten right” before other health equity issues can be discussed clearly.