The HHEAR Data Repository: Facilitating Environmental Health and Exposome Research Through Data Science, Harmonization, and Accessibility

Team Information

Team Members

  • Jeanette Stingone, Assistant Professor, Epidemiology, Columbia University Mailman School of Public Health

Abstract

The U.S.-based, NIEHS-funded Human Health Exposure Analysis Resource (HHEAR) gives scientific investigators access to laboratory and statistical analyses aimed at incorporating and expanding environmental exposures within their research. To benefit the broader research community, the HHEAR Data Center has created a public data repository that houses deidentified data from all studies accepted into the HHEAR program that have passed an embargo period. The goal of this repository is to promote secondary analysis of pooled environmental health data by providing data in a manner that is findable, accessible, interoperable and reusable. The public repository was built by coupling the open-source Human-aware Data Acquisition Framework with carefully developed semantic annotation templates. These tools facilitate the ingestion, semantic mapping, harmonization and accessibility of data (epidemiologic, clinical and biomarker) and metadata across the multiple environmental health studies within the HHEAR Program. We demonstrate how users of the public repository can simultaneously view, search and download data from multiple HHEAR studies. Study descriptions and data elements are viewable in both text-based and graphical forms. The repository can be searched by several factors, including health outcomes, biological markers of exposure and common covariates. Because data have been harmonized to a common vocabulary (the HHEAR ontology), downloaded datasets automatically contain common codes and labels for variables that are present in multiple studies. This facilitates data pooling across studies whose original variable names and coding schemes differ. By selecting common data elements, users can create customized datasets with accompanying codebooks in a format that is easily imported into statistical analysis software. This increased availability of data will encourage secondary analysis of pooled HHEAR studies, allowing for investigations that leverage larger sample sizes and greater exposure variability.
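
To illustrate why harmonization to a common vocabulary makes pooling straightforward, the following is a minimal Python sketch under stated assumptions. It is not the repository's implementation: the study identifiers, the original variable names, the shared codes, and the `harmonize` and `pool` helpers are all hypothetical and are not drawn from the HHEAR ontology.

```python
# Hypothetical per-study mappings from original variable names to shared codes.
# These names are illustrative only and are not taken from the HHEAR ontology.
STUDY_MAPPINGS = {
    "study_a": {"mat_age": "maternal_age", "bw_g": "birth_weight_g"},
    "study_b": {"AgeMother": "maternal_age", "BirthWt": "birth_weight_g"},
}


def harmonize(study_id, rows, mappings=STUDY_MAPPINGS):
    """Rename each row's variables to the shared codes and tag the source study."""
    mapping = mappings[study_id]
    harmonized = []
    for row in rows:
        new_row = {"study_id": study_id}
        for name, value in row.items():
            # Keep only variables that have a shared code in the mapping.
            if name in mapping:
                new_row[mapping[name]] = value
        harmonized.append(new_row)
    return harmonized


def pool(studies, mappings=STUDY_MAPPINGS):
    """Concatenate harmonized rows from every study into one pooled list."""
    pooled = []
    for study_id, rows in studies.items():
        pooled.extend(harmonize(study_id, rows, mappings))
    return pooled


# Two toy studies that record the same concepts under different names.
studies = {
    "study_a": [{"mat_age": 31, "bw_g": 3400, "site": "X"}],
    "study_b": [{"AgeMother": 27, "BirthWt": 3150}],
}
pooled = pool(studies)
```

In this sketch, variables without a shared code (such as `site`) are dropped, and every pooled row carries a `study_id` so that analyses can still adjust for the source study.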

