Occurrence Data

Introduction

Occurrence datasets are resources that present evidence of the occurrence of a species at a particular place, normally on a specified date. These datasets expand on most Checklist Data because they contribute to mapping the historical or current distribution of a species. At their most basic, such datasets may provide only general locality information (even limited to a country identifier). Ideally they also include coordinates and a coordinate precision to support fine-scale mapping. In many cases, these datasets may separately record multiple individuals of the same species. Examples include databases of specimens in natural history collections, citizen science observations and data from species atlas projects. If sufficient information exists in the source dataset (or applies consistently to all occurrences in the dataset), it is recommended that these datasets be presented as Sampling Event Data instead. These datasets include the same basic descriptive information as described under Resource metadata.

How to transform your data into occurrence data


Ultimately your data needs to be transformed into a table structure using Darwin Core (DwC) term names as column names.
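As a rough illustration of what "DwC term names as column names" means, the sketch below renames the columns of a small in-house table to Darwin Core terms and writes the result as a tab-delimited file. The source column names (record_no, species, date_seen, etc.), the mapping and the sample row are all hypothetical; only the right-hand side of the mapping uses real DwC term names.

```python
import csv
import io

# Hypothetical mapping from an in-house table's column names to
# Darwin Core term names; adapt the keys to your own source data.
COLUMN_MAP = {
    "record_no": "occurrenceID",
    "record_type": "basisOfRecord",
    "species": "scientificName",
    "date_seen": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(rows):
    """Rename the keys of each source row to Darwin Core term names.

    Columns without an entry in COLUMN_MAP are dropped rather than
    published under a non-DwC name.
    """
    return [
        {COLUMN_MAP[key]: value for key, value in row.items() if key in COLUMN_MAP}
        for row in rows
    ]


def write_tsv(records, handle):
    """Write records as a tab-delimited table whose header row is the DwC terms."""
    writer = csv.DictWriter(handle, fieldnames=list(COLUMN_MAP.values()), delimiter="\t")
    writer.writeheader()
    writer.writerows(records)


# Illustrative source row; "observer_notes" has no DwC mapping and is dropped.
source = [
    {"record_no": "urn:example:1", "record_type": "HumanObservation",
     "species": "Puma concolor", "date_seen": "2021-06-14",
     "lat": "-12.05", "lon": "-77.04", "observer_notes": "internal only"},
]
out = io.StringIO()
write_tsv(to_darwin_core(source), out)
print(out.getvalue())
```

Dropping unmapped columns is a deliberate choice here: it keeps internal fields out of the published table instead of exposing them under non-DwC names.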

Try putting your data into the Excel template, which includes all required and recommended DwC fields.

Alternatively, if your data is stored in a supported database, you can write an SQL table or view that uses DwC term names as column names. Be careful to include all required DwC fields and add as many recommended DwC fields as possible.
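A minimal sketch of the SQL view approach follows. SQLite is used only because it ships with Python; the table "sightings", its columns, the occurrenceID prefix, the constant basisOfRecord value and the assumption that coordinates are WGS84 are all hypothetical. The same CREATE VIEW pattern should carry over to the database you actually publish from, although string concatenation syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical in-house table; its column names are illustrative only.
conn.execute(
    """CREATE TABLE sightings (
           id INTEGER PRIMARY KEY,
           taxon TEXT,
           seen_on TEXT,
           lat REAL,
           lon REAL
       )"""
)
conn.execute(
    "INSERT INTO sightings (id, taxon, seen_on, lat, lon) VALUES (?, ?, ?, ?, ?)",
    (1, "Puma concolor", "2021-06-14", -12.05, -77.04),
)

# The view exposes the same rows under Darwin Core column names.
# The occurrenceID prefix, the constant basisOfRecord and the WGS84 datum
# are assumptions made for this example.
conn.execute(
    """CREATE VIEW occurrence AS
       SELECT 'urn:example:sighting:' || id AS occurrenceID,
              'HumanObservation'           AS basisOfRecord,
              taxon                        AS scientificName,
              seen_on                      AS eventDate,
              lat                          AS decimalLatitude,
              lon                          AS decimalLongitude,
              'WGS84'                      AS geodeticDatum
       FROM sightings"""
)

cursor = conn.execute("SELECT * FROM occurrence")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

Because a view is computed from the underlying table, it stays in sync with your working data without maintaining a second copy.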

For extra guidance, you can look at the exemplar datasets.

You can augment your table with extra DwC columns, but only with DwC terms from this list.
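Before uploading, it can help to check a table's header row against the required terms and the permitted term list. The sketch below assumes that the required set is occurrenceID, basisOfRecord, scientificName and eventDate, and uses only a short, hypothetical subset of permitted terms; replace both sets with the current lists for the Occurrence core before relying on the result.

```python
# Assumed required and permitted term sets: replace them with the
# current lists for the Occurrence core.
REQUIRED_TERMS = {"occurrenceID", "basisOfRecord", "scientificName", "eventDate"}
ALLOWED_TERMS = REQUIRED_TERMS | {
    "decimalLatitude", "decimalLongitude", "geodeticDatum",
    "coordinateUncertaintyInMeters", "countryCode", "occurrenceStatus",
    "individualCount", "organismQuantity", "organismQuantityType",
    "informationWithheld", "taxonRank", "kingdom",
}


def check_columns(header):
    """Return (missing required terms, columns that are not permitted DwC terms)."""
    columns = set(header)
    missing = sorted(REQUIRED_TERMS - columns)
    unknown = sorted(columns - ALLOWED_TERMS)
    return missing, unknown


missing, unknown = check_columns(
    ["occurrenceID", "scientificName", "eventDate", "lat", "countryCode"]
)
print("missing:", missing)  # missing: ['basisOfRecord']
print("unknown:", unknown)  # unknown: ['lat']
```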

Templates

  • Excel Template

  • Excel Template (with example data)

Populate a template and upload it to the IPT. Try to augment it with as many DwC terms as you can.

FAQ

Q. How do I indicate a species was absent?

A. Set occurrenceStatus="absent". In addition, individualCount and organismQuantity should be equal to 0.
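The sketch below builds an absence record as described above and checks that any count it carries is zero. The helper names, the sample values and the organismQuantityType value "individuals" are assumptions for illustration; treating missing counts as acceptable is also an interpretation, not a rule stated above.

```python
def make_absence_record(occurrence_id, scientific_name, event_date,
                        basis_of_record="HumanObservation"):
    """Build an occurrence row recording that a species was looked for but not found."""
    return {
        "occurrenceID": occurrence_id,
        "basisOfRecord": basis_of_record,
        "scientificName": scientific_name,
        "eventDate": event_date,
        "occurrenceStatus": "absent",
        "individualCount": 0,
        "organismQuantity": 0,
        # Assumed unit for organismQuantity in this example.
        "organismQuantityType": "individuals",
    }


def is_consistent_absence(record):
    """True if the record is marked absent and every count it carries is zero."""
    if record.get("occurrenceStatus") != "absent":
        return False
    for term in ("individualCount", "organismQuantity"):
        value = record.get(term)
        # Missing or empty counts are tolerated; any present count must be 0.
        if value not in (None, "", 0, "0"):
            return False
    return True


record = make_absence_record("urn:example:2", "Puma concolor", "2021-06-14")
print(is_consistent_absence(record))                         # True
print(is_consistent_absence(dict(record, individualCount=3)))  # False
```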

Q. How can I generalize sensitive species occurrence data?

A. How you generalize sensitive species data (e.g. restrict the resolution of the data) depends on the species' category of sensitivity. Where there is low risk of perverse outcomes, unrestricted publication of sensitive species data remains appropriate. Note that it is the responsibility of the publisher to protect sensitive species occurrence data. For guidance, please refer to this best-practice guide. You may also refer to this recent essay in Science, which presents a simplified scheme for assessing the risks of publishing sensitive species data.

When generalizing data, you should try not to reduce the value of the data for analysis, and you should make users aware of how and why the original record was modified, using the Darwin Core term informationWithheld.
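A minimal sketch of generalizing a single record and documenting the change in informationWithheld. Rounding decimal degrees is used here only as one simple generalization technique; it is not the grid formula given later on this page. The function name, the default of one decimal place and the wording of the informationWithheld text are assumptions; choose the resolution from the species' sensitivity category.

```python
def generalize_record(record, decimals=1):
    """Return a copy of an occurrence record with coarsened coordinates.

    Rounding decimal degrees is only one possible generalization; the
    resolution should follow the species' sensitivity category.
    """
    generalized = dict(record)  # leave the original record untouched
    for term in ("decimalLatitude", "decimalLongitude"):
        if generalized.get(term) is not None:
            generalized[term] = round(float(generalized[term]), decimals)
    # Tell users how and why the record was modified.
    generalized["informationWithheld"] = (
        f"Coordinates rounded to {decimals} decimal place(s) "
        "because the species is considered sensitive."
    )
    return generalized


original = {"scientificName": "Puma concolor",
            "decimalLatitude": -12.0764, "decimalLongitude": -77.0428}
print(generalize_record(original))
```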

As indicated in the best-practice guide, you should also publish a checklist of the sensitive species being generalized. For each species you should explain:

  • the rationale for inclusion in the list

  • the geographic coverage of sensitivity

  • its sensitivity category

  • the date to review its sensitivity
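One way to keep the four items above together for each species is a small record type written out as a table. The column names below (rationale, geographicCoverage, sensitivityCategory, reviewDate) are hypothetical, not Darwin Core terms, and the sample entry uses a placeholder name and placeholder values.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class SensitiveSpeciesEntry:
    # Hypothetical column names: one per item the checklist should explain.
    scientificName: str
    rationale: str
    geographicCoverage: str
    sensitivityCategory: str
    reviewDate: date


def write_checklist(entries, handle):
    """Write checklist entries as a tab-delimited table with ISO review dates."""
    names = [f.name for f in fields(SensitiveSpeciesEntry)]
    writer = csv.DictWriter(handle, fieldnames=names, delimiter="\t")
    writer.writeheader()
    for entry in entries:
        row = asdict(entry)
        row["reviewDate"] = entry.reviewDate.isoformat()
        writer.writerow(row)


out = io.StringIO()
write_checklist(
    [SensitiveSpeciesEntry("Aus bus", "Placeholder rationale", "Placeholder region",
                           "Placeholder category", date(2026, 1, 1))],
    out,
)
print(out.getvalue())
```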

This will help alert other data custodians that these species are regarded as potentially sensitive in a certain area and that they should take the sensitivity into account when publishing the results of their analyses, etc.

Helpful formulas for generalizing point locations

The following formula generalizes a point location by snapping it to a 5000 m grid. Note that pointX and pointY must be provided as lengths in meters (that is, in a projected coordinate system rather than as latitude/longitude in degrees), and that TRUNC truncates a number to an integer by removing its decimal part:

pointX = TRUNC(pointX / 5000) * 5000
pointY = TRUNC(pointY / 5000) * 5000
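The formula can be applied to latitude/longitude data by first projecting the point into meters. The sketch below uses spherical Web Mercator (EPSG:3857) only because its formulas fit in a few lines; that projection choice, the function names and the sample coordinates are assumptions. Web Mercator stretches distances by roughly 1/cos(latitude), so away from the equator a 5000-unit cell covers less than 5 km on the ground; a local or equal-area projected coordinate system would keep cells closer to their nominal size.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def to_web_mercator(lat, lon):
    """Project WGS84 degrees to spherical Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def from_web_mercator(x, y):
    """Inverse of to_web_mercator: meters back to WGS84 degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lat, lon


def generalize_point(point_x, point_y, cell_m=5000):
    """Apply pointX = TRUNC(pointX / cell) * cell to both coordinates."""
    return (math.trunc(point_x / cell_m) * cell_m,
            math.trunc(point_y / cell_m) * cell_m)


print(generalize_point(123456.7, 7654321.0))  # (120000, 7650000)

# Full round trip for a point given in degrees.
x, y = to_web_mercator(-12.05, -77.04)
print(from_web_mercator(*generalize_point(x, y)))
```

Because TRUNC rounds toward zero, negative coordinates are snapped toward zero as well, so the cells that touch an axis end up twice as wide as the others; replacing math.trunc with math.floor would give uniformly sized cells.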