Sampling Event Data

Introduction

Sampling-event data provides detailed information about species occurrences in specific locations and times, along with the sampling methods used. These datasets follow standardized protocols, allowing researchers to assess species abundance and community composition, and to compare data across different times and places. Examples include vegetation transects, bird censuses, monitoring data, and environmental DNA samples. Unlike opportunistic observations, sampling-event data is quantitative and calibrated, making it useful for detecting changes and trends in populations. Because the details of the sampling effort and protocol are included, researchers can even infer species absence and better understand biodiversity patterns. These datasets include the same basic descriptive information included under Resource metadata and the same standard elements as in Occurrence Data.

How to transform your data into sampling event data

Ultimately, your data needs to be transformed into two tables that use Darwin Core (DwC) term names as column names: one table of sampling events and another table of the species occurrences derived from (associated with) each sampling event.

Try putting your data into the Excel template, which includes two sheets: one for sampling events and another for associated species occurrences.

Alternatively, if your data is stored in a supported database, you can write two SQL tables (views) using DwC column names: one for sampling events and another for associated species occurrences.
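The idea of exposing existing tables as views with DwC column names can be sketched as follows. This is a minimal illustration, not the IPT's own procedure: SQLite is used only so the example runs standalone (it is not implied to be a supported database), and the source tables `visits` and `sightings`, their columns, and the view names are hypothetical.

```python
import sqlite3

# In-memory database standing in for an existing source database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source tables with local column names.
CREATE TABLE visits (visit_code TEXT, visit_day TEXT, method TEXT);
CREATE TABLE sightings (sighting_code TEXT, visit_code TEXT, species TEXT);
INSERT INTO visits VALUES ('v1', '2021-05-04', 'point count');
INSERT INTO sightings VALUES ('s1', 'v1', 'Parus major');

-- View of sampling events: local columns aliased to DwC term names.
CREATE VIEW dwc_event AS
SELECT visit_code AS eventID,
       visit_day  AS eventDate,
       method     AS samplingProtocol
FROM visits;

-- View of occurrences, linked to their event through eventID.
CREATE VIEW dwc_occurrence AS
SELECT sighting_code      AS occurrenceID,
       visit_code         AS eventID,
       'HumanObservation' AS basisOfRecord,
       species            AS scientificName
FROM sightings;
""")


def column_names(view):
    """Return the column names a view exposes, in order."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [description[0] for description in cursor.description]


event_columns = column_names("dwc_event")
occurrence_columns = column_names("dwc_occurrence")
```

Both views share the `eventID` column, which is what associates each occurrence with its sampling event.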

Each sampling event record should include all required DwC fields and as many recommended DwC fields as possible. You can augment your table with extra DwC columns, but only DwC terms from this list.

Similarly, each species occurrence record should include all required DwC fields and as many recommended DwC fields as possible. You can augment your table with extra DwC columns, but only DwC terms from this list. Some DwC terms are redundant, meaning they can be added to both sampling event and species occurrence records. As a general rule, avoid adding redundant terms with the same values. Different values are fine, however: for example, you might define the location of an event and then define more specific locations for individual occurrences. When the location of an individual occurrence isn’t supplied, the occurrence inherits its location from the event.
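As a rough sketch of the two-table layout and of location inheritance, the example below builds an event table and an occurrence table keyed by DwC term names and resolves each occurrence's location. The record values, the helper names `effective_location` and `to_csv`, and the choice of columns are illustrative assumptions; the authoritative list of required fields is the one referenced in this guide.

```python
import csv
import io

# Hypothetical sampling event table; keys are Darwin Core term names.
events = [
    {"eventID": "plot-1:2020-06-01", "eventDate": "2020-06-01",
     "samplingProtocol": "vegetation transect",
     "decimalLatitude": "59.91", "decimalLongitude": "10.75"},
]

# Hypothetical occurrence table; each row points at its event via eventID.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "plot-1:2020-06-01",
     "basisOfRecord": "HumanObservation", "scientificName": "Betula pubescens",
     "decimalLatitude": "", "decimalLongitude": ""},
    {"occurrenceID": "occ-2", "eventID": "plot-1:2020-06-01",
     "basisOfRecord": "HumanObservation", "scientificName": "Picea abies",
     "decimalLatitude": "59.9102", "decimalLongitude": "10.7503"},
]

LOCATION_TERMS = ("decimalLatitude", "decimalLongitude")


def effective_location(occurrence, events_by_id):
    """Use the occurrence's own location if supplied, else the event's."""
    has_own_location = any(occurrence.get(term) for term in LOCATION_TERMS)
    source = occurrence if has_own_location else events_by_id[occurrence["eventID"]]
    return {term: source.get(term, "") for term in LOCATION_TERMS}


def to_csv(rows):
    """Serialize rows to CSV text with the DwC term names as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


events_by_id = {event["eventID"]: event for event in events}
inherited = effective_location(occurrences[0], events_by_id)
specific = effective_location(occurrences[1], events_by_id)
```

Here `occ-1` leaves its coordinates blank and so takes the event's location, while `occ-2` keeps its own more specific coordinates.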

For extra guidance, you can refer to the guide Best Practices in Publishing Sampling-event data and look at the template populated with example data or the list of exemplar datasets.

Templates

Populate the template and upload it to the IPT. Try to augment it with as many GBIF-supported Event Core terms and extensions as you can. You can augment your core table with extra DwC columns, but only DwC terms from this list.

Required and recommended DwC fields

You can review the required and recommended fields in the Data quality requirements for sampling events.

Exemplar datasets

FAQ

Q. How do I indicate that a sampling event was part of a time series?

A. All sampling events at the same location must share the same locationID.
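To illustrate why a shared locationID matters, the sketch below groups event records by locationID and orders each group by date, which is one way a consumer could reconstruct a time series. The sample records and the function name `time_series_by_location` are hypothetical, and the sort assumes eventDate values are ISO 8601 strings in a single consistent format.

```python
from collections import defaultdict

# Hypothetical event records; only the fields needed here are shown.
sample_events = [
    {"eventID": "e2", "locationID": "site-A", "eventDate": "2021-06-01"},
    {"eventID": "e1", "locationID": "site-A", "eventDate": "2020-06-01"},
    {"eventID": "e3", "locationID": "site-B", "eventDate": "2020-06-02"},
]


def time_series_by_location(events):
    """Group events by locationID and sort each group by eventDate."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["locationID"]].append(event)
    # ISO 8601 dates in the same format sort correctly as plain strings.
    return {location: sorted(group, key=lambda e: e["eventDate"])
            for location, group in grouped.items()}


series = time_series_by_location(sample_events)
site_a_order = [event["eventID"] for event in series["site-A"]]
```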

Q. How do I publish a hierarchy of events (recursive data type) using parentEventID?

A. The classic example is sub-sampling of a larger plot. To group all (child) sub-sampling events under the (parent) sampling event, the parentEventID of all sub-sampling events must be set to the eventID of the (parent) sampling event. To be valid, all parentEventIDs must reference eventIDs of records defined in the same dataset. Otherwise, the parentEventID must be a globally unique identifier (e.g. DOI, HTTP URI) that resolves to an event record described elsewhere. Ideally, all (child) sub-sampling events share the same date and location as the (parent) event they reference.
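Before publishing, it may help to check which parentEventIDs do not match any eventID in the same dataset, since those are the ones that would need to be globally unique identifiers resolving elsewhere. The following is a minimal sketch of such a check; the sample records and the function name `unresolved_parent_ids` are hypothetical and not part of the IPT.

```python
# Hypothetical event records: one parent plot and two sub-samples.
plot_events = [
    {"eventID": "plot-1", "parentEventID": ""},
    {"eventID": "plot-1:sub-a", "parentEventID": "plot-1"},
    {"eventID": "plot-1:sub-b", "parentEventID": "plot-9"},  # no such event here
]


def unresolved_parent_ids(events):
    """Return parentEventIDs that match no eventID in the same dataset."""
    known_ids = {event["eventID"] for event in events}
    missing = {
        event["parentEventID"]
        for event in events
        if event.get("parentEventID") and event["parentEventID"] not in known_ids
    }
    return sorted(missing)


unresolved = unresolved_parent_ids(plot_events)
```

In this example only `plot-9` is reported, because `plot-1` is defined in the dataset and the empty parentEventID of the top-level event is ignored.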

Q. How do I publish absence data?

A. Step 1: Make species absences explicit by adding a species occurrence record for each species that could have been observed at the time and place of sampling, but was not observed, by setting the following fields:

Mandatory:

occurrenceStatus=absent

Optional (provide one or both):

individualCount="0"
organismQuantity and organismQuantityType pair: "0", "individuals"

Alternatively, include sampling event records even if the sampling yielded no derived species occurrences. This allows species absences to be inferred. This example sampling event dataset from Norway (also on GBIF.org) demonstrates how this looks.
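The explicit approach in Step 1 can be sketched as follows: for one sampling event, every species on a checklist that was not observed gets an occurrence record with occurrenceStatus set to absent. The checklist contents, the function name `absence_records`, and the occurrenceID scheme are hypothetical; the sketch sets only the fields named in Step 1 plus the identifiers, so any other required occurrence fields would still need to be added.

```python
def absence_records(event_id, checklist, observed_names):
    """Build one absence record per checklist species not observed."""
    records = []
    for name in checklist:
        if name in observed_names:
            continue  # observed species are published as normal occurrences
        records.append({
            # Hypothetical identifier scheme, for illustration only.
            "occurrenceID": f"{event_id}:absent:{name.replace(' ', '_')}",
            "eventID": event_id,
            "scientificName": name,
            "occurrenceStatus": "absent",  # mandatory for absences
            "individualCount": "0",        # optional
        })
    return records


# Hypothetical checklist of species detectable with the protocol used.
checklist = ["Parus major", "Cyanistes caeruleus", "Sitta europaea"]
observed = {"Parus major"}
absences = absence_records("v1", checklist, observed)
```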

Step 2: Define the taxonomic scope of all sampling events included in the dataset. To do this, it is recommended to publish a timestamped checklist together with the sampling event dataset. The checklist represents the species composition that could be observed at the time and place of sampling given the sampling protocol (and/or the taxonomic coverage of the study and the expertise of the personnel carrying out identification). This allows accurate presence/absence data to be recorded. In addition to the normal (expected) species composition, the checklist could include invasive (unexpected) species. For taxonomic and biogeographical/ecological reasons, however, this checklist would exist solely within the context of the sampling event dataset.

Instructions on how to create a checklist can be found here. Detailed metadata should be included with the checklist describing a) the people who performed the identifications and their taxonomic expertise and b) how it was decided that these species were detectable and identifiable at the time and place of sampling.

To link the checklist to the sampling event dataset, add the checklist to the dataset metadata in the External links section.