Opening a window on the history of FDA
ICF data scientists and engineers help make a century’s worth of agency documents searchable for researchers, journalists, and the public.
Since 2014, the Food and Drug Administration (FDA) has committed to new levels of transparency and accountability through the openFDA website, aiming to “educate the public and save lives.” Since its launch, openFDA has consistently made new datasets and resources available to researchers, journalists, and the public.
One example is a collection of news releases and public health alerts dating back to 1913. The information in these historical documents sheds light on the diverse responsibilities and activities of FDA, which has had an outsized impact on the lives of American citizens for more than a century. When the agency’s historian wanted to make this collection easier for users to navigate, FDA approached ICF, which had partnered with the agency on other aspects of the openFDA project, to develop a solution.
Challenge
These historical documents, which detail the history of medications, adverse reactions, agency responses to disease outbreaks, and more, had already been digitized, but they weren’t available in a machine-readable format. The documents also spanned a period of technological change, from handwriting to typewriting to word processing. The tool ICF used to convert the images to text therefore needed to be powerful and flexible enough to interpret letters and words despite considerable background “noise,” such as handwritten notes in margins and worn areas created by paper folds.
- AI
- Open source
- Human-centered design
Solution
ICF’s data scientists and engineers have extensive experience working with different AI tools, and they leveraged that knowledge to choose the right one for the FDA historical documents project. Our team evaluated a variety of optical character recognition (OCR) tools for converting the scanned document images to text before settling on Tesseract. This open-source engine aligned with openFDA’s commitment to sharing code, examples, and ideas. It also delivered higher accuracy than many expensive OCR tools currently available.
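For readers curious what running Tesseract on a scanned page can look like, here is a minimal sketch that calls the Tesseract command-line tool from Python. It is an illustration only, not the pipeline ICF built: the file name, the language setting, and the page segmentation mode are assumptions, and the `--psm` flag is spelled this way in Tesseract 4 and later.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=3):
    """Build the CLI call: tesseract <image> stdout -l <lang> --psm <mode>.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing a file. Mode 3 is Tesseract's
    default, fully automatic page segmentation.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image(image_path, lang="eng", psm=3):
    """Return the text Tesseract recognizes in one scanned page image."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout


# Hypothetical usage: "page_001.png" stands in for one digitized page.
# text = ocr_image("page_001.png")
```

Noisy scans, such as pages with margin notes or fold marks, often benefit from trying a different page segmentation mode or cleaning up the image before recognition.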
We also created charts and other visualizations based on recommendations by FDA stakeholders. These highlight details about the documents, such as the most frequently reported side effects by decade. The team used known best practices for user experience when designing the database’s interface and visualizations.
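A chart such as "most frequently reported side effects by decade" rests on a simple aggregation step. The sketch below shows one way that step could work. It assumes each record has already been reduced to a (year, term) pair, and the sample records are invented purely for illustration; they are not drawn from the FDA collection.

```python
from collections import Counter, defaultdict


def top_terms_by_decade(records, n=3):
    """Group (year, term) pairs by decade and return the n most common terms."""
    counts = defaultdict(Counter)
    for year, term in records:
        decade = (year // 10) * 10          # e.g. 1957 -> 1950
        counts[decade][term.lower()] += 1   # treat "Nausea" and "nausea" alike
    return {decade: c.most_common(n) for decade, c in sorted(counts.items())}


# Made-up sample records, for illustration only.
sample = [
    (1952, "nausea"), (1957, "Nausea"), (1958, "rash"),
    (1961, "rash"), (1963, "rash"), (1969, "dizziness"),
]

for decade, terms in top_terms_by_decade(sample).items():
    print(f"{decade}s: {terms}")
```

The resulting per-decade counts can be handed to any charting library to draw the bars.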
Finally, the team created APIs on the database’s back end so that users could grab the data and pull it into their own tools and systems for research, reporting, and other purposes.
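As a rough idea of how a user might pull this data into their own tools, the sketch below builds a query in the style of openFDA's `search` and `limit` parameters and fetches the matching results. The endpoint path and the `text` and `year` field names are assumptions made for this example and should be checked against the openFDA API documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the historical documents dataset; verify before use.
BASE_URL = "https://api.fda.gov/other/historicaldocument.json"


def build_query_url(term, year=None, limit=5):
    """Build a search URL for documents mentioning `term`, optionally in one year."""
    clauses = [f'text:"{term}"']            # "text" is an assumed field name
    if year is not None:
        clauses.append(f"year:{year}")      # "year" is an assumed field name
    params = {"search": " AND ".join(clauses), "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_documents(term, year=None, limit=5):
    """Call the API and return the list under "results" (empty if absent)."""
    with urlopen(build_query_url(term, year, limit), timeout=30) as resp:
        return json.load(resp).get("results", [])


print(build_query_url("penicillin", year=1950, limit=3))
# Hypothetical usage (requires network access):
# docs = fetch_documents("penicillin", year=1950, limit=3)
```

Keeping URL construction separate from the network call makes the query easy to inspect and test without contacting the service.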
Where we are now
The historical documents database, which comprises more than 8,500 documents, went live in late March 2024. The FDA historian and other stakeholders were thrilled to have such a powerful tool to share this valuable information with the public. With the openFDA site averaging 11 million viewers per month, these resources are positioned to reach a wide audience and support openFDA’s goal of educating the public and saving lives.