AURORA Shiny App - streamlining biodiversity data sharing

An integrated Shiny application designed to streamline the preparation, validation, and export of both marine and terrestrial biodiversity datasets into Darwin Core tables for sharing on public repositories such as GBIF, the Global Biodiversity Information Facility.

Keywords: Biodiversity data · Darwin Core standards · Taxonomy matching · Quality control · Metadata · Darwin Core Archive (DwC-A)
Tool workflow
1. Ingest: Import the source dataset and inspect its original structure before processing.
2. Field Mapping: Map original columns of the ingested dataset to Darwin Core terms with guidance and validation.
3. Identification Data Cleaning: Review and edit identification-related fields, especially scientificName and related Darwin Core terms, before taxonomic matching.
4. Taxonomy: Automatically match scientific names against authoritative taxonomic backbones such as WoRMS or GBIF.
5. Darwin Core Tables: Build structured Darwin Core tables for Event, Occurrence, and Extended Measurement or Fact (eMoF).
6. eMoF Editor: Review and edit eMoF-related fields and controlled vocabulary entries.
7. DwC Tables Overview & Quality Control (QC): Inspect issues, diagnostics, summary outputs, and validation results for export.
8. Metadata: Streamline metadata creation and revision using an interface inspired by the GBIF IPT.
User manual

Step-by-step guidance covering ingestion, mapping, taxonomy, quality control, metadata, and export.

Open manual
Examples of supported tasks
  • Map to Darwin Core: Align the original dataset fields with standardized Darwin Core (DwC) terms to ensure global interoperability.
  • Standardize Temporal Data: Convert all event dates into ISO 8601 format for temporal consistency.
  • Normalize Coordinates: Transform various coordinate formats into decimal degrees (WGS 84).
  • Taxonomic Matching: Cross-reference scientific names with authoritative taxonomic databases such as WoRMS and GBIF to validate names and retrieve accepted classifications.
  • Build Darwin Core Tables: Structure the data into relational Event, Occurrence, and eMoF tables to comply with the DwC-A star schema.
  • Data Quality Inspection: Review processed tables to identify inconsistencies, missing values, and other issues in the dataset.
  • Metadata Preparation: Document dataset-level metadata to support revision, publication, and FAIR data sharing.
Data ingestion


Note
If your dataset is already in a tidy format, you can proceed directly by clicking Complete Ingestion.

What is a tidy format?
  • Each occurrence must correspond to a single row.
  • Do not store a single variable across multiple columns (e.g., species names split into separate columns).
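For intuition, reshaping a "wide" table (one column per species) into tidy rows can be sketched as below. This is a purely illustrative Python fragment, not part of the app (which is written in R), and all column names and sample values are made up:

```python
# Illustrative: reshape a wide table (one column per species) into tidy
# rows, one occurrence per row. Column names here are hypothetical.
wide_rows = [
    {"site": "ST-01", "eventDate": "2024-02-01",
     "Gadus morhua": 3, "Pandalus borealis": 1},
]
id_cols = {"site", "eventDate"}

tidy_rows = []
for row in wide_rows:
    for col, value in row.items():
        if col in id_cols or value in (None, 0):
            continue
        tidy_rows.append({
            "site": row["site"],
            "eventDate": row["eventDate"],
            "scientificName": col,      # one variable, one column
            "individualCount": value,   # one occurrence, one row
        })
```

Each species column becomes a value of scientificName, so every occurrence ends up on its own row.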



                
Preview

Field mapping to Darwin Core

Map columns to Darwin Core terms, create fields, configure formatting options, and review the cleaned dataset.

Darwin Core Quick Reference Guide

Validation

Map each column to a Darwin Core term

Note: Create IDs and Create remarks can be used more than once.

Create IDs
Create remarks
Create / fill basisOfRecord
Use this only when all records in the dataset share the same basisOfRecord. Otherwise, the field should be filled correctly before import.
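Conceptually, the mapping step is a column rename plus an optional constant fill. The Python sketch below is illustrative only (the app itself is an R Shiny tool), and the source column names are hypothetical:

```python
# Hypothetical source columns mapped to Darwin Core terms.
dwc_map = {
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "species": "scientificName",
}

def map_to_dwc(record, mapping, basis_of_record=None):
    """Rename columns to DwC terms; optionally fill a constant basisOfRecord."""
    out = {mapping.get(k, k): v for k, v in record.items()}
    # Only apply a blanket basisOfRecord when every record shares it.
    if basis_of_record is not None:
        out.setdefault("basisOfRecord", basis_of_record)
    return out

map_to_dwc({"lat": 82.9, "lon": -6.2, "species": "Gadus morhua"},
           dwc_map, basis_of_record="HumanObservation")
```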
organismQuantity and sampleSizeValue Dependency Fields
Original coordinate system

Identify the original CRS (EPSG) of the dataset. Coordinates will be transformed into EPSG:4326 (WGS 84).


Target CRS: EPSG:4326

Coordinates will be stored in decimal degrees.
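As an illustration of what "decimal degrees" means in practice, converting a degrees-minutes-seconds string can be sketched as below. This Python fragment is not the app's implementation (the actual CRS transformation is handled internally), and the accepted input format is a simplified, hypothetical example:

```python
import re

def dms_to_decimal(dms: str) -> float:
    """Convert a DMS string like 40°26'46"N into decimal degrees,
    the representation used for EPSG:4326 (WGS 84). The accepted
    format here is a simplified, hypothetical example."""
    m = re.match(r"""(\d+)°\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""", dms.strip())
    if not m:
        raise ValueError(f"Unrecognized coordinate: {dms!r}")
    deg, minutes, seconds, hemi = m.groups()
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres carry a negative sign.
    return -value if hemi in "SW" else value

dms_to_decimal('40°26\'46"N')  # about 40.4461 decimal degrees
```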

Date formatting

Note: The date-formatting engine cannot handle every possible date and time combination flawlessly. It will do its best, but please review the results carefully. If your eventDate values are too messy, preprocess them before ingestion.

This option is only used for ambiguous numeric dates such as 01/02/2024. If you choose dd/mm, that value is interpreted as 1 February 2024. If you choose mm/dd, it is interpreted as 2 January 2024. Dates already in ISO format, such as 2024-02-01 or 2019-10-01T22:39:12, are not affected.
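The dd/mm vs mm/dd rule can be summarized in a few lines. This Python sketch mirrors the behaviour described above but is illustrative only, not the app's actual implementation:

```python
from datetime import datetime

def to_iso_date(value: str, day_first: bool = True) -> str:
    """Normalize an eventDate to ISO 8601. Ambiguous numeric dates
    such as 01/02/2024 are resolved by `day_first`; input already in
    ISO format passes through unchanged. Simplified sketch only."""
    try:
        # Already ISO (date or datetime): leave untouched.
        datetime.fromisoformat(value)
        return value
    except ValueError:
        pass
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, fmt).date().isoformat()

to_iso_date("01/02/2024", day_first=True)   # interpreted as 1 February 2024
```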

Preview of the cleaned dataset after mapping, the create-field options, and formatting.

Identification Data Cleaning

According to the Darwin Core standard, when scientific names include open nomenclature qualifiers such as 'cf.', 'aff.', or informal placeholders (e.g., 'sp.'), these qualifiers should not be included in the scientificName field, but rather in the identificationQualifier term. The scientificName should contain the lowest possible taxon rank that can be confidently determined, while the identificationQualifier field captures the determiner's uncertainty or relationship to a known taxon.

For example, for Gadus cf. morhua:

  • scientificName: 'Gadus'
  • taxonRank: 'genus'
  • identificationQualifier: 'cf. morhua'

To ensure data traceability, the original identification is retained in the verbatimIdentification field as a historical record.
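The splitting rule described above can be sketched as follows. This Python fragment is illustrative (the app is R-based), the qualifier list is non-exhaustive and hypothetical, and taxonRank assignment is omitted for brevity:

```python
# Open-nomenclature qualifiers that should move from scientificName to
# identificationQualifier (a non-exhaustive, illustrative list).
QUALIFIERS = ("cf.", "aff.", "sp.", "spp.")

def split_identification(verbatim: str) -> dict:
    """Keep the lowest confidently determined taxon in scientificName,
    move the qualifier, and retain the verbatim name for traceability."""
    tokens = verbatim.split()
    for i, tok in enumerate(tokens):
        if tok in QUALIFIERS:
            return {
                "scientificName": " ".join(tokens[:i]),
                "identificationQualifier": " ".join(tokens[i:]),
                "verbatimIdentification": verbatim,
            }
    return {"scientificName": verbatim,
            "identificationQualifier": "",
            "verbatimIdentification": verbatim}

split_identification("Gadus cf. morhua")
```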

Warning: The Reviewed scientificName column (red) must be completed manually or by accepting the suggestion for every row before applying changes.
Optional fields shown below are Darwin Core terms that were not selected during Field Mapping. They can be reviewed and edited here if needed. Please review all entries for accuracy.
Settings


Download decisions (CSV)
Editable lookup table
Taxonomy match
Notes
  • Taxonomic Matching: Matches are generated from the scientificName field.
  • Database Authorities: Marine taxa are validated against the World Register of Marine Species (worrms package, wm_records_taxamatch()). Terrestrial taxa are matched against the GBIF Backbone Taxonomy via the rgbif package.
  • Ambiguity Resolution: The results table contains one row per unique scientificName. When a match is uncertain, a manual selection is required to confirm the correct taxon.
  • Inclusion Rules: Only records with a confirmed match are included in the final dataset and passed to subsequent processing steps. Unresolved names are excluded from the output but retained in the validation log for review.
  • Workflow Restart: To process unresolved names, corrections must be made in the Identification Data Cleaning tab and the taxonomic matching step must be rerun.
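The inclusion rules can be pictured as a filter over a per-name lookup table. In the Python sketch below, the match results are hand-made stand-ins, not real worrms/rgbif output, and the record values are made up:

```python
# Sketch of the inclusion rules: one lookup row per unique name,
# confirmed matches flow on, unresolved names go to a validation log.
records = [
    {"scientificName": "Gadus morhua"},
    {"scientificName": "Gadus morhua"},
    {"scientificName": "Unknownus dubius"},
]
match_results = {          # one row per unique scientificName
    "Gadus morhua": {"status": "exact", "aphiaID": 126436},
    "Unknownus dubius": {"status": "unresolved", "aphiaID": None},
}

confirmed = [
    r for r in records
    if match_results[r["scientificName"]]["status"] != "unresolved"
]
validation_log = sorted({
    r["scientificName"] for r in records
    if match_results[r["scientificName"]]["status"] == "unresolved"
})
```

Only the two confirmed records continue through the workflow; the unresolved name stays in the log until it is corrected in the Identification Data Cleaning tab and matching is rerun.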

Settings

Columns to keep in final dataset
Only Darwin Core output terms are available for selection. The inputName and taxonMatchStatus fields are strictly technical and are excluded from the final dataset. When using WoRMS, the scientificNameID is populated with the identifier returned by the matching source (i.e., the LSID). In contrast, when the GBIF backbone is used, the scientificNameID field is currently ignored (see GBIF issue #217).
Summary
Name lookup table
Issues (taxonomy)

Build Darwin Core Tables

Choose the archive resource type and review the allowed fields step by step before building the Darwin Core tables.
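The star schema behind these tables can be sketched as an Event core plus extensions keyed by eventID. The Python fragment below is illustrative only, with made-up sample data and a reduced set of DwC columns:

```python
# Sketch of the DwC-A star schema: a flat dataset split into an Event
# core plus an Occurrence extension linked by eventID.
flat = [
    {"eventID": "EV-1", "eventDate": "2024-02-01",
     "scientificName": "Gadus morhua", "occurrenceID": "OC-1"},
    {"eventID": "EV-1", "eventDate": "2024-02-01",
     "scientificName": "Pandalus borealis", "occurrenceID": "OC-2"},
]

event_cols = ("eventID", "eventDate")
occurrence_cols = ("occurrenceID", "eventID", "scientificName")

# Event core: one row per unique eventID.
events = {r["eventID"]: {c: r[c] for c in event_cols} for r in flat}
event_table = list(events.values())

# Occurrence extension: one row per record, pointing back at its event.
occurrence_table = [{c: r[c] for c in occurrence_cols} for r in flat]
```

The same pattern extends to the eMoF table, whose rows also point back at the Event core (or at individual occurrences) through their identifiers.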

Build settings
Selection warnings / status
Quality report
Table previews
eMoF Data Editor

This editor helps standardize the core measurement vocabulary fields in the Extended Measurement or Fact (eMoF) table after the Darwin Core tables have already been built.

  • Repeated non-numeric measurement rows are grouped into a single editable row.
  • Rows containing numeric measurementValues are grouped, and their values are locked to prevent accidental bulk edits.
  • Edits remain as a draft until you click Apply changes to eMoF.
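The grouping described above can be sketched like this. The Python fragment is illustrative only; field names follow the eMoF table, and the sample rows are made up:

```python
from collections import defaultdict

# Repeated non-numeric measurement rows collapse into one editable
# group; rows with numeric values are grouped too but flagged as locked.
emof = [
    {"measurementType": "Sampling device", "measurementValue": "boxcorer"},
    {"measurementType": "Sampling device", "measurementValue": "boxcorer"},
    {"measurementType": "Sampling depth", "measurementValue": "3500"},
]

def is_numeric(v):
    try:
        float(v)
        return True
    except ValueError:
        return False

groups = defaultdict(list)
for row in emof:
    groups[(row["measurementType"], row["measurementValue"])].append(row)

editable = [{"measurementType": t, "measurementValue": v,
             "n_rows": len(rows), "locked": is_numeric(v)}
            for (t, v), rows in groups.items()]
```

Editing one grouped row then propagates to every underlying record, while locked numeric rows are protected from accidental bulk edits.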
Note: Consult the controlled vocabularies on the NERC Vocabulary Server, maintained by BODC.
Status

Editable lookup table
Note: This section provides a brief summary of the DwC tables' content and of outstanding issues that still need to be resolved. For a more rigorous assessment, you can run a comprehensive quality-control check with the LifeWatch & EMODnet Biology QC tool.
Data overview

Data Overview & Summary

Export status
Errors
Warnings
Records (occurrence)


Overview of measurement or fact records


Geographic coverage of the dataset



Taxonomic coverage of the dataset

Overview of all issues


Details


eMoF issues

The AURORA project

The AURORA R Shiny application was developed as a core output of the project AURORA - bringing deep-sea biodiversity data to light, an initiative dedicated to streamlining the mobilization of deep-sea biodiversity data.

Although the deep sea covers 65% of the planet, it remains Earth's least understood biome. Within European marine waters, deep-sea data (depths >200 m) account for only 11% of records in the Ocean Biodiversity Information System (OBIS), dropping to a mere 1% for depths below 3500 m.

To address this knowledge gap, the Digital Marine Biodiversity Lab at the University of Aveiro is mobilizing its collections of deep-sea data, starting with the Aurora seamount on the Gakkel Ridge in the Central Arctic Ocean.

This data mobilisation to open repositories, such as EMODnet Biology, will directly contribute to the development of the European Digital Twin of the Ocean.

Disclaimer

This application is provided as a tool to assist in the standardization of biodiversity datasets into Darwin Core (DwC) archives. While every effort is made to ensure the accuracy of the underlying mapping logic and transformation scripts, the outputs are provided "as is" without any guarantees of completeness, accuracy, or fitness for a specific purpose.

How to cite

Magnenti, Á. S., Araújo, S. M., … Matos, F. L. (2026). Aurora Shiny App: Streamlining biodiversity data sharing (Version 1.0).

http://bio-shiny.ua.pt:3838/aurora

License: CC-BY 4.0. This work may be shared and adapted, including for commercial purposes, provided appropriate credit is given.

Support and bug reports

Technical support: For technical support or to suggest new features for the AURORA Shiny App, please contact us by email at fmatos@ua.pt.

Bug reports: If you encounter any bugs or functional glitches, please report them by opening a new issue on the AURORA GitHub repository.

Institutional support
Participating institutions logos
Funding

The AURORA project is funded by the Flanders Marine Institute (VLIZ) through the DTO-BioFlow project, which is funded by the European Union.