AURORA Shiny App - streamlining biodiversity data sharing

An integrated Shiny application designed to streamline the preparation, validation, and export of both marine and terrestrial biodiversity datasets into Darwin Core tables for sharing on public repositories such as GBIF, the Global Biodiversity Information Facility.

Keywords: Biodiversity data · Darwin Core standards · Taxonomy matching · Quality control · Metadata · Darwin Core Archive (DwC-A)
Tool workflow
1. Ingest: Import the source dataset and inspect its original structure before processing.
2. Field Mapping: Map original columns of the ingested dataset to Darwin Core terms with guidance and validation.
3. Identification Data Cleaning: Review and edit identification-related fields, especially scientificName and related Darwin Core terms, before taxonomic matching.
4. Taxonomy: Automatically match scientific names against authoritative taxonomic backbones such as WoRMS or GBIF.
5. Darwin Core Tables: Build structured Darwin Core tables for Event, Occurrence, and Extended Measurement or Fact (eMoF).
6. eMoF Editor: Review and edit eMoF-related fields and controlled vocabulary entries.
7. DwC Tables Overview & Quality Control (QC): Inspect issues, diagnostics, summary outputs, and validation results for export.
8. Metadata: Streamline metadata creation and revision using an interface inspired by the GBIF IPT.
User manual

Step-by-step guidance covering ingestion, mapping, taxonomy, quality control, metadata, and export.

Open manual
Examples of supported tasks
  • Map to Darwin Core: Align the original dataset fields with standardized Darwin Core (DwC) terms to ensure global interoperability.
  • Standardize Temporal Data: Convert all event dates into ISO 8601 format for temporal consistency.
  • Normalize Coordinates: Transform various coordinate formats into decimal degrees (WGS 84).
  • Taxonomic Matching: Cross-reference scientific names with authoritative taxonomic databases such as WoRMS and GBIF to validate names and retrieve accepted classifications.
  • Build Darwin Core Tables: Structure the data into relational Event, Occurrence, and eMoF tables to comply with the DwC-A star schema.
  • Data Quality Inspection: Review processed tables to identify inconsistencies, missing values, and other issues in the dataset.
  • Metadata Preparation: Document dataset-level metadata to support revision, publication, and FAIR data sharing.
Data ingestion


Note
If your dataset is already in a tidy format, you can proceed directly by clicking Complete Ingestion.

What is a tidy format?
  • Each occurrence must correspond to a single row.
  • Do not store a single variable across multiple columns (e.g., species names split into separate columns).
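For intuition, reshaping a "wide" table (one column per species) into tidy rows can be sketched as below. This is a purely illustrative Python fragment, not part of the app (which is written in R), and all column names and sample values are made up:

```python
# Illustrative: reshape a wide table (one column per species) into tidy
# rows, one occurrence per row. Column names here are hypothetical.
wide_rows = [
    {"site": "ST-01", "eventDate": "2024-02-01",
     "Gadus morhua": 3, "Pandalus borealis": 1},
]
id_cols = {"site", "eventDate"}

tidy_rows = []
for row in wide_rows:
    for col, value in row.items():
        if col in id_cols or value in (None, 0):
            continue
        tidy_rows.append({
            "site": row["site"],
            "eventDate": row["eventDate"],
            "scientificName": col,      # one variable, one column
            "individualCount": value,   # one occurrence, one row
        })
```

Each species column becomes a value of scientificName, so every occurrence ends up on its own row.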



                
Preview

Field mapping to Darwin Core

Map columns to Darwin Core terms, create fields, configure formatting options, and review the cleaned dataset.

Darwin Core Quick Reference Guide

Validation

Map each column to a Darwin Core term

Note: Create IDs and Create remarks can be used more than once.

Create IDs
Create remarks
Create / fill basisOfRecord
Use this only when all records in the dataset share the same basisOfRecord. Otherwise, the field should be filled correctly before import.
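Conceptually, the mapping step is a column rename plus an optional constant fill. The Python sketch below is illustrative only (the app itself is an R Shiny tool), and the source column names are hypothetical:

```python
# Hypothetical source columns mapped to Darwin Core terms.
dwc_map = {
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "species": "scientificName",
}

def map_to_dwc(record, mapping, basis_of_record=None):
    """Rename columns to DwC terms; optionally fill a constant basisOfRecord."""
    out = {mapping.get(k, k): v for k, v in record.items()}
    # Only apply a blanket basisOfRecord when every record shares it.
    if basis_of_record is not None:
        out.setdefault("basisOfRecord", basis_of_record)
    return out

map_to_dwc({"lat": 82.9, "lon": -6.2, "species": "Gadus morhua"},
           dwc_map, basis_of_record="HumanObservation")
```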
organismQuantity and sampleSizeValue Dependency Fields
Original coordinate system

Identify the original CRS (EPSG) of the dataset. Coordinates will be transformed into EPSG:4326 (WGS 84).


Target CRS: EPSG:4326

Coordinates will be stored in decimal degrees.
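As an illustration of what "decimal degrees" means in practice, converting a degrees-minutes-seconds string can be sketched as below. This Python fragment is not the app's implementation (the actual CRS transformation is handled internally), and the accepted input format is a simplified, hypothetical example:

```python
import re

def dms_to_decimal(dms: str) -> float:
    """Convert a DMS string like 40°26'46"N into decimal degrees,
    the representation used for EPSG:4326 (WGS 84). The accepted
    format here is a simplified, hypothetical example."""
    m = re.match(r"""(\d+)°\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""", dms.strip())
    if not m:
        raise ValueError(f"Unrecognized coordinate: {dms!r}")
    deg, minutes, seconds, hemi = m.groups()
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres carry a negative sign.
    return -value if hemi in "SW" else value

dms_to_decimal('40°26\'46"N')  # about 40.4461 decimal degrees
```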

Date formatting

Note: The date-formatting engine cannot handle every possible date and time combination flawlessly. It will do its best, but please review the results carefully. If your eventDate values are too messy, preprocess them before ingestion.

This option is only used for ambiguous numeric dates such as 01/02/2024. If you choose dd/mm, that value is interpreted as 1 February 2024. If you choose mm/dd, it is interpreted as 2 January 2024. Dates already in ISO format, such as 2024-02-01 or 2019-10-01T22:39:12, are not affected.
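The dd/mm vs mm/dd rule can be summarized in a few lines. This Python sketch mirrors the behaviour described above but is illustrative only, not the app's actual implementation:

```python
from datetime import datetime

def to_iso_date(value: str, day_first: bool = True) -> str:
    """Normalize an eventDate to ISO 8601. Ambiguous numeric dates
    such as 01/02/2024 are resolved by `day_first`; input already in
    ISO format passes through unchanged. Simplified sketch only."""
    try:
        # Already ISO (date or datetime): leave untouched.
        datetime.fromisoformat(value)
        return value
    except ValueError:
        pass
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, fmt).date().isoformat()

to_iso_date("01/02/2024", day_first=True)   # interpreted as 1 February 2024
```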

Preview of the cleaned dataset after mapping, the create-field options, and formatting.

Identification Data Cleaning

According to the Darwin Core standard, when scientific names include open nomenclature qualifiers such as 'cf.', 'aff.', or informal placeholders (e.g., 'sp.'), these qualifiers should not be included in the scientificName field, but rather in the identificationQualifier term. The scientificName should contain the lowest possible taxon rank that can be confidently determined, while the identificationQualifier field captures the determiner's uncertainty or relationship to a known taxon.

For example, for Gadus cf. morhua:

  • scientificName: 'Gadus'
  • taxonRank: 'genus'
  • identificationQualifier: 'cf. morhua'

To ensure data traceability, the original identification is retained in the verbatimIdentification field as a historical record.
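The splitting rule described above can be sketched as follows. This Python fragment is illustrative (the app is R-based), the qualifier list is non-exhaustive and hypothetical, and taxonRank assignment is omitted for brevity:

```python
# Open-nomenclature qualifiers that should move from scientificName to
# identificationQualifier (a non-exhaustive, illustrative list).
QUALIFIERS = ("cf.", "aff.", "sp.", "spp.")

def split_identification(verbatim: str) -> dict:
    """Keep the lowest confidently determined taxon in scientificName,
    move the qualifier, and retain the verbatim name for traceability."""
    tokens = verbatim.split()
    for i, tok in enumerate(tokens):
        if tok in QUALIFIERS:
            return {
                "scientificName": " ".join(tokens[:i]),
                "identificationQualifier": " ".join(tokens[i:]),
                "verbatimIdentification": verbatim,
            }
    return {"scientificName": verbatim,
            "identificationQualifier": "",
            "verbatimIdentification": verbatim}

split_identification("Gadus cf. morhua")
```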

Warning: The Reviewed scientificName column (red) must be completed manually or by accepting the suggestion for every row before applying changes.
Optional fields shown below are Darwin Core terms that were not selected during Field Mapping. They can be reviewed and edited here if needed. Please review all entries for accuracy.
Settings


Download decisions (CSV)
Editable lookup table
Taxonomy match
Notes
  • Taxonomic Matching: Matches are generated from the scientificName field.
  • Database Authorities: Marine taxa are validated against the World Register of Marine Species (worrms package, wm_records_taxamatch()). Terrestrial taxa are matched against the GBIF Backbone Taxonomy via the rgbif package.
  • Ambiguity Resolution: The results table contains one row per unique scientificName. When a match is uncertain, a manual selection is required to confirm the correct taxon.
  • Inclusion Rules: Only records with a confirmed match are included in the final dataset and passed to subsequent processing steps. Unresolved names are excluded from the output but retained in the validation log for review.
  • Workflow Restart: To process unresolved names, corrections must be made in the Identification Data Cleaning tab and the taxonomic matching step must be rerun.
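The inclusion rules can be pictured as a filter over a per-name lookup table. In the Python sketch below, the match results are hand-made stand-ins, not real worrms/rgbif output, and the record values are made up:

```python
# Sketch of the inclusion rules: one lookup row per unique name,
# confirmed matches flow on, unresolved names go to a validation log.
records = [
    {"scientificName": "Gadus morhua"},
    {"scientificName": "Gadus morhua"},
    {"scientificName": "Unknownus dubius"},
]
match_results = {          # one row per unique scientificName
    "Gadus morhua": {"status": "exact", "aphiaID": 126436},
    "Unknownus dubius": {"status": "unresolved", "aphiaID": None},
}

confirmed = [
    r for r in records
    if match_results[r["scientificName"]]["status"] != "unresolved"
]
validation_log = sorted({
    r["scientificName"] for r in records
    if match_results[r["scientificName"]]["status"] == "unresolved"
})
```

Only the two confirmed records continue through the workflow; the unresolved name stays in the log until it is corrected in the Identification Data Cleaning tab and matching is rerun.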

Settings

Columns to keep in final dataset
Only Darwin Core output terms are available for selection. The inputName and taxonMatchStatus fields are strictly technical and are excluded from the final dataset. When using WoRMS, the scientificNameID is populated with the identifier returned by the matching source (i.e., the LSID). In contrast, when the GBIF backbone is used, the scientificNameID field is currently ignored (see GBIF issue #217).
Summary
Name lookup table
Issues (taxonomy)

Build Darwin Core Tables

Choose the archive resource type and review the allowed fields step by step before building the Darwin Core tables.
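The star schema behind these tables can be sketched as an Event core plus extensions keyed by eventID. The Python fragment below is illustrative only, with made-up sample data and a reduced set of DwC columns:

```python
# Sketch of the DwC-A star schema: a flat dataset split into an Event
# core plus an Occurrence extension linked by eventID.
flat = [
    {"eventID": "EV-1", "eventDate": "2024-02-01",
     "scientificName": "Gadus morhua", "occurrenceID": "OC-1"},
    {"eventID": "EV-1", "eventDate": "2024-02-01",
     "scientificName": "Pandalus borealis", "occurrenceID": "OC-2"},
]

event_cols = ("eventID", "eventDate")
occurrence_cols = ("occurrenceID", "eventID", "scientificName")

# Event core: one row per unique eventID.
events = {r["eventID"]: {c: r[c] for c in event_cols} for r in flat}
event_table = list(events.values())

# Occurrence extension: one row per record, pointing back at its event.
occurrence_table = [{c: r[c] for c in occurrence_cols} for r in flat]
```

The same pattern extends to the eMoF table, whose rows also point back at the Event core (or at individual occurrences) through their identifiers.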

Build settings
Selection warnings / status
Quality report
Table previews
eMoF Data Editor

This editor helps standardize the core measurement vocabulary fields in the Extended Measurement or Fact (eMoF) table after the Darwin Core tables have already been built.

  • Repeated non-numeric measurement rows are grouped into a single editable row.
  • Rows containing numeric measurementValues are grouped, and their values are locked to prevent accidental bulk edits.
  • Edits remain as a draft until you click Apply changes to eMoF.
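The grouping described above can be sketched like this. The Python fragment is illustrative only; field names follow the eMoF table, and the sample rows are made up:

```python
from collections import defaultdict

# Repeated non-numeric measurement rows collapse into one editable
# group; rows with numeric values are grouped too but flagged as locked.
emof = [
    {"measurementType": "Sampling device", "measurementValue": "boxcorer"},
    {"measurementType": "Sampling device", "measurementValue": "boxcorer"},
    {"measurementType": "Sampling depth", "measurementValue": "3500"},
]

def is_numeric(v):
    try:
        float(v)
        return True
    except ValueError:
        return False

groups = defaultdict(list)
for row in emof:
    groups[(row["measurementType"], row["measurementValue"])].append(row)

editable = [{"measurementType": t, "measurementValue": v,
             "n_rows": len(rows), "locked": is_numeric(v)}
            for (t, v), rows in groups.items()]
```

Editing one grouped row then propagates to every underlying record, while locked numeric rows are protected from accidental bulk edits.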
Note: Consult the controlled vocabularies on the NERC Vocabulary Server, maintained by BODC.
Status

Editable lookup table
Note: This section provides a brief summary of the DwC tables' content and of outstanding issues that still need to be resolved. For a more rigorous assessment, you can run a comprehensive quality-control check with the LifeWatch & EMODnet Biology QC tool.
Data overview

Data Overview & Summary

Export status
Errors
Warnings
Records (occurrence)


Overview of measurement or fact records


Geographic coverage of the dataset



Taxonomic coverage of the dataset

Overview of all issues


Details


eMoF issues

The AURORA project

The AURORA R Shiny application was developed as a core output of the project AURORA - bringing deep-sea biodiversity data to light, an initiative dedicated to streamlining the mobilization of deep-sea biodiversity data.

Although the deep sea covers 65% of the planet, it remains Earth's least understood biome. Within European marine waters, deep-sea data (depths >200 m) account for only 11% of records in the Ocean Biodiversity Information System (OBIS), dropping to a mere 1% for depths below 3500 m.

To address this knowledge gap, the Digital Marine Biodiversity Lab at the University of Aveiro is mobilizing its collections of deep-sea data, starting with the Aurora seamount on the Gakkel Ridge in the Central Arctic Ocean.

This data mobilisation to open repositories, such as EMODnet Biology, will directly contribute to the development of the European Digital Twin of the Ocean.

Disclaimer

This application is provided as a tool to assist in the standardization of biodiversity datasets into Darwin Core (DwC) archives. While every effort is made to ensure the accuracy of the underlying mapping logic and transformation scripts, the outputs are provided "as is" without any guarantees of completeness, accuracy, or fitness for a specific purpose.

How to cite

Magnenti, Á. S., Araújo, S. M., … Matos, F. L. (2026). Aurora Shiny App: Streamlining biodiversity data sharing (Version 1.0).

http://bio-shiny.ua.pt:3838/aurora

License: CC-BY 4.0. This work may be shared and adapted, including for commercial purposes, provided appropriate credit is given.

Support and bug reports

Technical support: For technical support or to suggest new features for the AURORA Shiny App, please contact us by email at fmatos@ua.pt.

Bug reports: If you encounter any bugs or functional glitches, please report them by opening a new issue on the AURORA GitHub repository.

Institutional support
Participating institutions logos
Funding

The AURORA project is funded by the Flanders Marine Institute (VLIZ) through the DTO-BioFlow project, which is funded by the European Union.