A comprehensive literature review (2014–2025) was conducted to evaluate the reproducibility of microbiome metadata in original research articles. Studies were identified via PubMed using microbiome and sequencing-related keywords. Reviews and meta-analyses were excluded. Each study's sequencing method (16S, shotgun, or both), availability of supplementary metadata (e.g., sample IDs, biological and demographic data), and data accessibility (e.g., GitHub, NCBI, SRA) were recorded.
Researchers systematically extracted and validated metadata and sequencing information using a structured checklist. Key metadata elements, such as sample IDs, age, gender, health status, and environmental context, were analyzed. Metadata reproducibility was assessed based on (1) whether metadata were provided, (2) whether sample IDs matched those in SRA Explorer, and (3) whether data were differentiable for downstream analysis, including machine learning use.
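The three criteria above can be sketched as a small per-study check. This is only an illustration under stated assumptions, not the checklist actually used: the `StudyRecord` fields and the `assess` function are hypothetical names, and treating "sample IDs matched" as exact set equality is an assumption that may be stricter than the manual review.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical per-study record; field names are illustrative only."""
    metadata_ids: set[str]    # sample IDs listed in the paper's metadata
    repository_ids: set[str]  # sample IDs found in the public repository
    differentiable: bool      # reviewer's judgment: usable for downstream analysis?


def assess(record: StudyRecord) -> dict[str, bool]:
    """Evaluate the three criteria for one study."""
    # (1) Was any metadata provided at all?
    has_metadata = len(record.metadata_ids) > 0
    # (2) Do the sample IDs match? Assumption: exact set equality.
    ids_match = has_metadata and record.metadata_ids == record.repository_ids
    # (3) Differentiability is taken from the reviewer's judgment as-is.
    return {
        "metadata_provided": has_metadata,
        "ids_match": ids_match,
        "differentiable": record.differentiable,
        "all_criteria_met": has_metadata and ids_match and record.differentiable,
    }
```

Returning each criterion separately, rather than a single pass/fail flag, keeps it visible which step a study failed.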
Validation included cross-checking accession codes and confirming consistency between metadata files and public repositories. Studies were reviewed by multiple individuals to ensure reliability and transparency. The final dataset was used to assess trends in metadata completeness, publication year, and journal impact.
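A cross-check of this kind could be partly automated along the following lines. The sketch assumes that sample IDs in the metadata file are expected to be INSDC-style accessions (run prefixes SRR/ERR/DRR, study prefixes SRP/ERP/DRP, BioProject prefixes PRJNA/PRJEB/PRJDB); the function name `find_discrepancies` and the report keys are hypothetical, and the lists of IDs are assumed to have been collected beforehand.

```python
import re

# Accepts SRA/ENA/DDBJ run and study accessions plus BioProject accessions.
ACCESSION_RE = re.compile(r"^(?:[SED]R[RP]\d+|PRJ(?:NA|EB|DB)\d+)$")


def find_discrepancies(metadata_ids, repository_ids):
    """Compare IDs from a metadata file against IDs listed in a repository."""
    metadata_ids = set(metadata_ids)
    repository_ids = set(repository_ids)
    return {
        # IDs in the metadata that do not look like accession codes
        "malformed": sorted(i for i in metadata_ids if not ACCESSION_RE.match(i)),
        # IDs the paper reports but the repository does not list
        "missing_from_repository": sorted(metadata_ids - repository_ids),
        # IDs the repository lists but the paper's metadata omits
        "missing_from_metadata": sorted(repository_ids - metadata_ids),
    }


report = find_discrepancies(
    ["SRR123", "sample_4", "ERR9"],  # made-up IDs for illustration
    ["SRR123", "DRR77"],
)
```

An empty list under every key would indicate that the metadata file and the repository agree; any non-empty list points a reviewer to the specific IDs to inspect by hand.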
This bar graph shows the number of papers per year included in our meta-analysis, out of #### total papers included.
This figure shows the nodes from which the studies included in our project stem. Notably, the United States and China are the two most represented countries.
This bullet graph shows the trend in the type of sequencing used in the papers included in our study. The use of shotgun sequencing increased up until 2023, when it decreased, and then increased again in 2024.
This bullet graph shows trends in reproducibility over time. For a study to be reproducible, independent researchers must be able to obtain the same results using the same methods and data. Fully reproducible datasets remain very rare, which is possibly a crucial concern.
Our analysis involved locating each paper's metadata, confirming that the SampleID in the SRA or genomic database matches the metadata, and verifying that the entry in the genomic database is present and differentiable.
This flow chart depicts the steps taken during our project to decide whether studies were replicable.