metadata

INSDC Sequences

Latest version published by Test Organization #1 on 24 June 2021
Publication date:
24 June 2021
License:
CC-BY 4.0

Description

This dataset contains a subset of the INSDC sequences available from the public ENA API (https://www.ebi.ac.uk/ena/portal/api/). The dataset is prepared periodically. It does not contain any records associated with environmental sample identifiers or with a host organism; those records can be found in separate datasets published by EMBL-EBI. This translates into the following parameters when querying data with the ENA portal API search: `environmental_sample=False & host=""`
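As a rough illustration of how the filter above could be passed to the ENA portal API, the sketch below only builds a request URL and does not send it. The base URL and the filter expression come from the description; the `search` endpoint path, the parameter names (`result`, `query`, `fields`, `format`, `limit`), the `sequence` result type and the chosen field names are assumptions that should be checked against the ENA portal API documentation, as should the exact query syntax.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL as given in the dataset description.
ENA_PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api/"

# Filter expression quoted verbatim from the description.
FILTER_EXPRESSION = 'environmental_sample=False & host=""'


def build_search_url(filter_expression, result="sequence",
                     fields=("accession", "scientific_name"), limit=10):
    """Build (but do not send) a search URL.

    The endpoint name and all parameter names here are assumptions,
    not something stated in the dataset description.
    """
    params = {
        "result": result,            # assumed result type
        "query": filter_expression,  # filter from the description
        "fields": ",".join(fields),  # assumed field names
        "format": "tsv",
        "limit": limit,
    }
    return ENA_PORTAL_API + "search?" + urlencode(params)


url = build_search_url(FILTER_EXPRESSION)
print(url)

# Round trip: the filter expression survives URL encoding unchanged.
decoded_filter = parse_qs(urlparse(url).query)["query"][0]
print(decoded_filter)
```

Building the URL with `urlencode` keeps the `=`, `&` and `"` characters of the filter from being read as URL syntax.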

The data were then processed as follows:

1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) and the scientific name were used to identify sequence records corresponding to the same individual (or to a group of organisms of the same species in the same sample). Only one record was kept for each combination of scientific name and sample accession number.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. Records with missing information were excluded. Only records associated with a specimen voucher, or records containing both a location AND a date, were kept.
5. Records associated with the same voucher were aggregated together.
6. Many of the remaining records were individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession number was missing. To identify these potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). We then excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family and genus were obtained from the ENA Browser API: https://www.ebi.ac.uk/ena/browser/api/
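The completeness filter (step 4) and the duplicate-group threshold (step 6) can be sketched as below. This is a minimal illustration, not the actual embl-adapter code. It assumes records are plain dictionaries, and the `specimen_voucher` key is a hypothetical name. The grouping keys and the threshold of 50 are taken from the description.

```python
from collections import defaultdict

# Grouping keys listed in step 6 of the description.
GROUP_KEYS = (
    "scientific_name", "collection_date", "location", "country",
    "identified_by", "collected_by", "sample_accession",
)
MAX_GROUP_SIZE = 50  # groups with more records than this are excluded


def has_required_info(record):
    """Step 4: keep records with a voucher, or with both a location and a date.

    "specimen_voucher" is a hypothetical key name used for illustration.
    """
    if record.get("specimen_voucher"):
        return True
    return bool(record.get("location")) and bool(record.get("collection_date"))


def drop_large_groups(records, max_size=MAX_GROUP_SIZE):
    """Step 6: group records by GROUP_KEYS and drop groups above max_size."""
    groups = defaultdict(list)
    for record in records:
        # A missing value (e.g. no sample accession) becomes None in the key.
        key = tuple(record.get(k) for k in GROUP_KEYS)
        groups[key].append(record)
    kept = []
    for members in groups.values():
        if len(members) <= max_size:
            kept.extend(members)
    return kept


# Usage: 51 identical records form one oversized group and are dropped,
# while a pair of records for another species is kept.
suspect = [{"scientific_name": "Aus bus", "collection_date": "2020-01-01",
            "location": "10 N 20 E", "country": "X"} for _ in range(51)]
normal = [{"scientific_name": "Cus dus", "collection_date": "2020-01-02",
           "location": "11 N 21 E", "country": "Y"} for _ in range(2)]
complete = [r for r in suspect + normal if has_required_info(r)]
result = drop_large_groups(complete)
print(len(result))
```

Whether vouchered records also go through the grouping in step 6 is not stated in the description; the sketch applies it to every record it is given.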

You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md

Downloads

Download the latest version of the metadata as EML or RTF:

Metadata as an EML file: download in English (6 kB)
Metadata as an RTF file: download in English (6 kB)

Versions

The table below shows only the resource versions that are publicly accessible.

Rights

Researchers should respect the following rights statement:

The publisher and rights holder of this work is Test Organization #1. This work is licensed under a Creative Commons Attribution (CC-BY) 4.0 License.

GBIF Registration

This resource has been registered with GBIF and assigned the following GBIF UUID: fe39cd36-6580-4205-bfe2-ca19990521fd. Test Organization #1 publishes this resource and is registered with GBIF as a data publisher endorsed by the GBIF Secretariat.

Keywords

Metadata

Contacts

Who created the resource:

European Bioinformatics Institute (EMBL-EBI)

Who can answer questions about the resource:

European Bioinformatics Institute (EMBL-EBI)

Who filled in the metadata:

GBIF Helpdesk

Who else was associated with the resource:

User
Marie Grosjean

Geographic Coverage

Worldwide

Bounding coordinates: South West [-90, -180], North East [90, 180]

Additional Metadata

Alternative identifiers: fe39cd36-6580-4205-bfe2-ca19990521fd
https://ipt.gbif.org/resource?r=marie-test-embl-metadata