POST data

How to import data into OpenDataBio!


Importing data

The OpenDataBio-R package is a client for this API and allows importing, downloading and updating data from R language objects.
Examples of using the R package here
Data can be imported through the web interface using delimited data files, in which the column names are the POST parameters listed on this page.
An authentication token is required for all POST requests.

Custom structured data in the `notes` field

The notes field of any model accepts plain text or text formatted as a JSON object containing structured data. The JSON option allows you to store custom structured data in any model that has the notes field. You can, for example, store secondary fields from original sources as notes when importing data, or any additional data that is not covered by the OpenDataBio database framework. This data will be neither validated nor searchable by OpenDataBio, and the standardization of tags and values is up to you. JSON notes are imported and exported as JSON text and are presented in the interface as a formatted table; URLs in your JSON are presented as links in this table.
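As an illustration of structured notes, the sketch below serializes a few secondary fields into a JSON string suitable for a notes value. The field names and the URL are hypothetical; only the idea of storing JSON text in notes comes from this page.

```python
import json

# Hypothetical secondary fields taken from an original data source.
extra_fields = {
    "original_plot_code": "P-07",
    "field_book_page": 12,
    "source_url": "https://example.org/records/123",
}

# The notes field takes text, so the structured data is serialized to a JSON string.
notes = json.dumps(extra_fields, ensure_ascii=False)

# The string can then be used as the notes value of any record being imported.
record = {"notes": notes}
print(record["notes"])
```

Since the tags and values are not validated by OpenDataBio, keeping one dictionary template per data source is a simple way to keep them consistent.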

POST BibReferences

To import bibliographic references there are two options, with the DOI taking precedence:

doi - the DOI number or URL of the reference to be registered
bibtex - alternatively, a bibliographic record in BibTeX format

POST Biocollections

To import biocollection names:

name - the full name of the biological collection
acronym - the acronym of the biological collection

POST Individuals

Request fields allowed when importing individuals:

collector=mixed - required - persons ‘id’, ‘abbreviation’, ‘full_name’, ’email’; if multiple persons, separate values in your list with pipe | or ; because commas may be present within names. Main collector is the first on the list;
tag=string - required - the individual number or code (if the individual identifier is as MainCollector+Number, this is the field for Number);
dataset=mixed - required - name or id of the Dataset;
date=YYYY-MM-DD or array - the date the individual was recorded/tagged, for historical records you may inform an incomplete string in the form “1888-05-NA” or “1888-NA-NA” when day and/or month are unknown. You may also inform as an array in the form “date={ ‘year’ : 1888, ‘month’: 5}”. OpenDataBio deals with incomplete dates, see the IncompleteDate Model. At least year is required.
notes - any annotation for the Individual, plain text or data in JSON;
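The incomplete date strings described for the date field can be produced with a small helper. This is a minimal sketch; the function name is hypothetical and it only covers the string form ("1888-05-NA"), not the array form.

```python
def incomplete_date(year, month=None, day=None):
    """Format a date as YYYY-MM-DD, writing NA for an unknown month or day."""
    if year is None:
        raise ValueError("at least the year is required")
    if day is not None and month is None:
        raise ValueError("a day cannot be informed without a month")
    month_part = f"{month:02d}" if month is not None else "NA"
    day_part = f"{day:02d}" if day is not None else "NA"
    return f"{year:04d}-{month_part}-{day_part}"


print(incomplete_date(1888, 5))      # month known, day unknown
print(incomplete_date(1888))         # only the year known
print(incomplete_date(1888, 5, 17))  # complete date
```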

Location fields (one or multiple locations may be informed for the individual). Possible fields are:

location - the Individual’s location name or id; required if longitude and latitude are not informed
latitude and longitude - geographical coordinates in decimal degrees; required if location is not informed
altitude - the Individual location elevation (altitude) in meters above sea level. Must be an integer value;
location_notes - any note for the individual location, plain text or data in JSON;
location_date_time - if different than the individual’s date, a complete date or a date+time value for the individual first location. Mandatory for multiple locations;
x - if location is of Plot type, the x coordinate of the individual in the location;
y - if location is of Plot type, the y coordinate of the individual in the location;
distance - if location is of POINT type, the individual distance in meters from the location;
angle - if location is of POINT type, the individual azimuth (angle) from the location;

Identification fields. Identification is not mandatory, and may be informed in two different ways: (1) self identification - the individual may have its own identification; or (2), other identification - the identification is the same as that of another individual (for example, from an individual having a voucher in some biocollection).

For (self) identification at least taxon and identifier must be informed. The list of possible fields are:
- taxon=mixed - name or id of the identified taxon, e.g. ‘Ocotea delicata’ or its id
- identifier=mixed - persons responsible for the taxonomic identification. persons ‘id’, ‘abbreviation’, ‘full_name’, ’email’; if multiple persons, separate values in your list with pipe | or ; because commas may be present within names.
- identification_date or identification_date_year, identification_date_month, and/or identification_date_day - complete or incomplete. If empty, the individual’s date is used;
- modifier - name or number for the identification modifier. Possible values ’s.s.’=1, ’s.l.’=2, ‘cf.’=3, ‘aff.’=4, ‘vel aff.’=5, defaults to 0 (none).
- identification_notes - any identification notes, plain text or data in JSON;
- identification_based_on_biocollection - the biocollection name or id if the identification is based on a reference specimen deposited in a biocollection
- identification_based_on_biocollection_id - only fill if identification_based_on_biocollection is present;
If the identification is other:
- identification_individual - id or fullname (organismID) of the Individual having the identification.

If the Individual has Vouchers with the same Collectors, Date and CollectorNumber (Tag) as those of the Individual, the following fields and options allow you to store the vouchers while importing the Individual record (alternatively, you may import vouchers after importing individuals using the Voucher endpoint). Vouchers for the individual may be informed in two ways:

As separate string fields:

biocollection - A string with a single value or a comma separated list of values. Values may be the id or acronym values of the Biocollection Model. Ex: “{ ‘biocollection’ : ‘INPA;MO;NY’}” or “{ ‘biocollection’ : ‘1,10,20’}”;
biocollection_number - A string with a single value or a comma separated list of values with the BiocollectionNumber for the Individual Voucher. If a list, then must have the same number of values as biocollection;
biocollection_type - A string with a single numeric code value or a comma separated list of values for Nomenclatural Type for the Individual Vouchers. The default value is 0 (Not a Type). See nomenclatural types list.

As a single field biocollection containing an array with each element having the fields above for a single Biocollection: “{ { ‘biocollection_code’ : ‘INPA’, ‘biocollection_number’ : 59786, ‘biocollection_type’ : 0}, { ‘biocollection_code’ : ‘MG’, ‘biocollection_number’ : 34567, ‘biocollection_type’ : 0} }”
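The Individual fields described above can be assembled into a single record before import. The sketch below is only an illustration: all values (names, dataset, coordinates, voucher numbers) are hypothetical, the vouchers use the separate string fields with comma separated lists, and sending the record to the API is not shown.

```python
def join_values(values, sep="|"):
    """Join a sequence of values into one delimited string."""
    return sep.join(str(v) for v in values)


# Hypothetical collectors; the first one is the main collector.
collectors = ["Silva, J.", "Souza, M."]

# Hypothetical vouchers as (biocollection acronym, number, nomenclatural type code).
vouchers = [("INPA", 59786, 0), ("MG", 34567, 0)]

individual = {
    # Person lists use a pipe because names may contain commas.
    "collector": join_values(collectors),
    "tag": "1234",
    "dataset": "My dataset",
    "date": "1888-05-NA",
    "latitude": -2.95,
    "longitude": -59.93,
    "taxon": "Ocotea delicata",
    "identifier": join_values(collectors[:1]),
    # The three voucher lists must have the same number of values.
    "biocollection": join_values((v[0] for v in vouchers), sep=","),
    "biocollection_number": join_values((v[1] for v in vouchers), sep=","),
    "biocollection_type": join_values((v[2] for v in vouchers), sep=","),
}

for field, value in individual.items():
    print(f"{field}: {value}")
```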

POST Individual-locations

The individual-locations endpoint allows importing multiple locations for registered individuals. Designed for occurrences of organisms that move and have multiple locations.

Possible fields are:

individual - the Individual’s id; required
location - the Individual’s location name or id; required unless latitude and longitude are informed
latitude and longitude - geographical coordinates in decimal degrees; required if location is not informed
altitude - the Individual location elevation (altitude) in meters above sea level;
location_notes - any note for the individual location, plain text or data in JSON;
location_date_time - if different than the individual’s date, a complete date or a date+time (hh:mm:ss) value for the individual location; required
x - if location is of Plot type, the x coordinate of the individual in the location;
y - if location is of Plot type, the y coordinate of the individual in the location;
distance - if location is of POINT type (or latitude and longitude are informed), the individual distance in meters from the location;
angle - if location is of POINT type, the individual azimuth (angle) from the location
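For an organism with several recorded positions, one row per position can be built as sketched below. The helper name and the sample values are hypothetical, and the "YYYY-MM-DD hh:mm:ss" layout of location_date_time is an assumption based on the date+time (hh:mm:ss) description above.

```python
from datetime import datetime


def location_row(individual_id, when, latitude, longitude, notes=None):
    """Build one individual-locations record from a position and a timestamp."""
    row = {
        "individual": individual_id,
        "latitude": latitude,
        "longitude": longitude,
        # Assumed layout: date followed by hh:mm:ss.
        "location_date_time": when.strftime("%Y-%m-%d %H:%M:%S"),
    }
    if notes is not None:
        row["location_notes"] = notes
    return row


# Hypothetical track of a registered individual with id 42.
track = [
    location_row(42, datetime(2023, 7, 1, 6, 30, 0), -2.9500, -59.9300),
    location_row(42, datetime(2023, 7, 1, 18, 5, 0), -2.9512, -59.9287, notes="roost"),
]

for row in track:
    print(row)
```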

POST Locations

The locations endpoint interacts with the locations table. Use it to import new locations.

Attention

ODB Locations are stored with a parent-child relationship, which assures validations and facilitates queries. Parents are guessed using the location geometry. If a parent is not informed, the imported location must be completely contained by a registered parent (the SQL ST_WITHIN function is used to detect the parent). However, if a parent is informed, the import may also test whether the geometry fits a buffered version of the parent geometry, thus ignoring minor geometry overlaps and shared borders. Countries can be imported without parent relations. Any other location must be registered within at least a ‘country’ as parent. If the record is marine and falls outside of any registered country polygon, the ‘ismarine’ argument must be indicated to accept the non-spatial parent relationship.

Subplots (a plot location within a plot location) are the only situation in which a geometry is not needed. If not informed, the geometry will be calculated based on the parent plot geometry and the subplot startx and starty coordinates.

Make sure your geometry projection is EPSG:4326 WGS84. Use this standard!

Available POST fields:

name - the location name - required (parent+name must be unique in the database)
adm_level - must be numeric, see location get api - required
geometry - required; use either:
- geom for a WKT representation of the geometry, POLYGON, MULTIPOLYGON, POINT OR LINESTRING allowed;
- lat and long for latitude and longitude in decimal degrees (use negative numbers for south/west).
altitude - in meters
datum - defaults to ‘EPSG:4326-WGS 84’, and you are strongly encouraged to import only data in this projection. You may inform a different projection here;
parent - either the id or name of the parent location. The API will detect the parent based on the informed geometry and the detected parent has priority if informed is different. However, only when parent is informed, validation will also test whether your location falls within a buffered version of the informed parent, allowing to import locations that have a parent-child relationship but their borders overlap somehow (either shared borders or differences in georeferencing);
when location is plot (adm_level=100), optional fields are:
- x and y for the plot dimensions in meters(defines the Cartesian coordinates)
- startx and starty for start position of a subplot in relation to its parent plot location;
notes - any note you wish to add to your location, plain text or data in JSON;
azimuth - applies only to Plots and Transects registered with a POINT geometry; the azimuth will be used to build the geometry. For plots, the point coordinates refer to the 0,0 vertex of the plot polygon, which will be built clockwise starting from the informed point, using the azimuth and the y dimension. For transects, the informed point coordinates are the start point, and a linestring will be built using this azimuth and the x dimension.
ismarine - to permit the import of location records that do not fall within any registered parent location, you may add ismarine=1. Note, however, that this allows you to import misplaced locations. Only use it if your location is really a marine location that falls outside any Country border;

Alternatively, you may just submit a single column named geojson containing a Feature record, with its geometry and having as ‘properties’ at least the tags name and adm_level (or admin_level). See geojson.org. This is useful, for example, to import country political boundaries (https://osm-boundaries.com/).
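The two ways of informing a polygon geometry (a WKT string in geom, or a GeoJSON Feature in geojson) can be sketched as follows. The helper names, the plot name and the coordinates are hypothetical; coordinates are written as longitude then latitude in EPSG:4326, and adm_level=100 is the plot level mentioned above.

```python
import json


def close_ring(ring):
    """Return the ring with the first vertex repeated at the end, as polygons require."""
    return ring if ring[0] == ring[-1] else ring + [ring[0]]


def to_wkt_polygon(ring):
    """Write a list of (longitude, latitude) pairs as a WKT POLYGON."""
    coords = ", ".join(f"{lon} {lat}" for lon, lat in close_ring(ring))
    return f"POLYGON(({coords}))"


# Hypothetical plot corners as (longitude, latitude) in decimal degrees.
ring = [(-59.95, -2.95), (-59.94, -2.95), (-59.94, -2.94), (-59.95, -2.94)]

# Option 1: separate fields, with the geometry as WKT in geom.
wkt_record = {"name": "Plot A", "adm_level": 100, "geom": to_wkt_polygon(ring)}

# Option 2: a single geojson column holding a Feature with name and adm_level.
feature = {
    "type": "Feature",
    "properties": {"name": "Plot A", "adm_level": 100},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[list(point) for point in close_ring(ring)]],
    },
}
geojson_record = {"geojson": json.dumps(feature)}

print(wkt_record["geom"])
print(geojson_record["geojson"])
```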

POST Locations-validation

The locations-validation endpoint allows you to validate geographic coordinates, i.e., it validates the geometry of each point and searches, among the locations registered in an OpenDataBio database, for related geographic areas or areas that contain the point. This is useful, for example, to validate data on the occurrence of individuals prior to an import.

You need the following information:

latitude - in decimal degrees (use negative numbers for south).
longitude - in decimal degrees (use negative numbers for west).

Response

For each unique coordinate, it will return one or more lines with different locations where the point is found, adding the following columns:

withinLocationName - name of the most inclusive location where the point is located
withinLocationID - id of the most inclusive location in the OpenDataBio database
withinLocationParent - name of the parent area of the most inclusive location
withinLocationCountry - name of the country of the most inclusive location
withinLocationHigherGeography - the hierarchy of the locations where the point is located
withinLocationType - the type of location, or administrative level
withinLocationTypeAdmLevel - the numeric value of withinLocationType
searchObs - observations such as ’location not found’
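Because results are returned per unique coordinate, it can help to check ranges and drop repeated points before submitting them. The following is a minimal sketch with a hypothetical helper name and sample points; it only prepares the latitude and longitude columns and does not call the endpoint.

```python
def prepare_points(points):
    """Check coordinate ranges and keep each (latitude, longitude) pair once."""
    seen = set()
    rows = []
    for latitude, longitude in points:
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            raise ValueError(f"coordinate out of range: {latitude}, {longitude}")
        if (latitude, longitude) not in seen:
            seen.add((latitude, longitude))
            rows.append({"latitude": latitude, "longitude": longitude})
    return rows


# Hypothetical occurrence points; negative values mean south and west.
points = [(-2.95, -59.93), (-3.10, -60.02), (-2.95, -59.93)]
rows = prepare_points(points)
print(rows)
```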

POST Measurements

The measurements endpoint allows you to import measurements.

The following fields are allowed in a post API:

dataset=number - the id of the dataset where the measurement should be placed required
date=YYYY-MM-DD - the observation date for the measurement, must be passed as YYYY-MM-DD required
object_type=string - either ‘Individual’,‘Location’,‘Taxon’ or ‘Voucher’, i.e. the object from where the measurement was taken required
object_id=number - the id of the measured object, either (individuals.id, locations.id, taxons.id, vouchers.id) required
person=mixed persons responsible for the measurements. You may inform ‘id’, ‘abbreviation’, ‘full_name’ or ’email’. If multiple persons, separate values in your list with pipe | or ; because commas may be present within names. required
trait_id=number or string - either the id or export_name for the measurement. required
value=number, string list - this will depend on the trait type, see tutorial required, optional for trait type LINK
link_id=number - the id of the linked object for a Trait of type Link; required if trait type is Link. DEPRECATED: replaced by location
location or location_id - the id or name of the Location of a measurement related to a Taxon. Only Taxons can also have a location associated with them. This replaces the Link-type Trait logic.
bibreference=number - the id of the BibReference for the measurement. Should be used when the measurement was taken from a publication
notes - any note you wish. In some cases this is a useful place to store measurement-related information. For example, when measuring 3 leaves of a voucher, you may indicate here to which leaf the measurement belongs (leaf1, leaf2, etc.), allowing you to link measurements from different traits by this field. Plain text or data in JSON;
duplicated - by default, the import API will prevent duplicated measurements for the same trait, object and date. To store a duplicate, specify duplicated=#, where # is the number of records already stored plus 1. For example, if there is already one record, you must inform duplicated=2; if there are two, a third can be stored using duplicated=3.
parent_measurement = number only - the ‘id’ of another measurement which is the parent of the measurement being imported. This creates a relationship between these measurements. The ‘id’ will be validated and must be from a measurement belonging to the same object, same date and different variable. E.g. leaf width may be linked to a measurement of leaf length.
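The duplicated rule can be illustrated with a small helper that adds the field only when records already exist for the same trait, object and date. The helper name, ids and trait name below are hypothetical, and how many records are already stored is assumed to be known by the caller.

```python
def measurement_row(fields, already_stored=0):
    """Copy the fields and add duplicated = stored records + 1 when needed."""
    row = dict(fields)
    if already_stored > 0:
        row["duplicated"] = already_stored + 1
    return row


# Hypothetical measurement of leaf length on a voucher.
base = {
    "dataset": 1,
    "date": "2024-03-15",
    "object_type": "Voucher",
    "object_id": 10,
    "person": "Silva, J.",
    "trait_id": "LeafLength",
    "value": 12.3,
    "notes": "leaf1",
}

first = measurement_row(base)
# A second leaf of the same voucher on the same date is a duplicate for the API.
second = measurement_row({**base, "value": 11.8, "notes": "leaf2"}, already_stored=1)

print(first)
print(second)
```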

POST Media

The media endpoint interacts with the Media table, and the import process works the same way as batch uploading via the web interface.

The API should receive a ZIP file containing the images to be imported, along with one CSV file, which may include the following attributes for each image:

filename – required: the exact name of the image file
object_type – required: must be one of the following values: ['Individual', 'Voucher', 'Location', 'Taxon']
object_id – required: the numeric ID of the object_type related to the image
collector – a list of registered collector numbers or names who will be considered the authors of the image; can be persons.id, abbreviation, full_name, or email. If multiple people are listed, separate values with a vertical bar | or semicolon ; (since names may contain commas).
tags – a list of numeric IDs or names of Tags, i.e., keywords describing the image content; separated by ; or |
date – the date of the image in YYYY-MM-DD format; if not provided, the import date will be used
title_en and/or title_pt – one or more columns with the title of the image in the registered languages
license – one of the following public licenses from CreativeCommons.org:
['CC0', 'CC-BY', 'CC-BY-SA', 'CC-BY-ND', 'CC-BY-NC', 'CC-BY-NC-SA', 'CC-BY-NC-ND']
If not specified, the default will be CC-BY-SA
dataset – the name (acronym) or ID of the dataset to which the media will be linked — the media access permissions will match those of the dataset
project – the name (acronym) or ID of the project to associate with the media
notes – any general note about the file content
location – the ID or name of the Location to link the media to — for media linked to Taxon, you can alternatively use latitude and longitude
longitude – longitude in decimal degrees (negative for West) — for media linked to Taxon
latitude – latitude in decimal degrees (negative for South) — for media linked to Taxon
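A ZIP file with the images and the accompanying CSV can be assembled in memory as sketched below. This is an illustration under assumptions: the CSV file name media.csv, the image bytes and the record values are hypothetical, and only a subset of the attributes listed above is written.

```python
import csv
import io
import zipfile

COLUMNS = ["filename", "object_type", "object_id", "collector", "license", "date"]


def build_media_zip(images, rows, csv_name="media.csv"):
    """Return the bytes of a ZIP holding the image files and one CSV describing them."""
    for row in rows:
        if row["filename"] not in images:
            raise ValueError(f"no image for CSV row: {row['filename']}")
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are left empty
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(csv_name, csv_buffer.getvalue())
        for filename, content in images.items():
            archive.writestr(filename, content)
    return zip_buffer.getvalue()


# Hypothetical image content and metadata.
images = {"leaf_001.jpg": b"fake image bytes"}
rows = [{
    "filename": "leaf_001.jpg",
    "object_type": "Individual",
    "object_id": 42,
    "collector": "Silva, J.|Souza, M.",
    "license": "CC-BY",
    "date": "2024-03-15",
}]

archive_bytes = build_media_zip(images, rows)
print(len(archive_bytes), "bytes")
```

Checking that every CSV row has a matching file before building the archive avoids submitting a batch in which the filename column does not match the exact image names.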

POST Persons

The persons endpoint interacts with the Person table. The following fields are allowed when importing persons using the POST API:

full_name - person full name, required
abbreviation - abbreviated name, as used by the person in publications, as collector, etc. (if left blank, a standard abbreviation will be generated using the full_name attribute; the abbreviation must be unique within an ODB installation);
email - an email address,
institution - to which institution this person is associated;
biocollection - name or acronym of the Biocollection to which this person is associated.

POST Taxons

Use to import new taxon names.

The POST API requires ONLY the full name of the taxon to be imported, i.e. for species or below-species taxons the complete name must be informed (e.g. Ocotea guianensis or Licaria cannela aremeniaca). The script will validate the name, retrieving the remaining required info from the nomenclatural databases using their API services. It will search GBIF and, if applicable, Tropicos, and retrieve the taxon info, its ids in these repositories, and also the full classification path and senior synonyms (if any) up to the point where it finds a valid record name in this ODB database. So, unless you are trying to import unpublished names, just submit the name parameter of the list below.

Possible fields:

name - taxon full name required, e.g. “Ocotea floribunda” or “Pagamea plicata glabrescens”
level - may be the numeric id or a string describing the taxonRank; recommended for unpublished names
parent - the taxon’s parent full name or id - note - if you inform a valid parent and the system detects a different parent through the API to the nomenclatural databases, preference will be given to the informed parent; required for unpublished names
bibreference - a textual reference in which the taxon was published;
bibkey - a registered BibReference - either the numeric id or the bibkey
author - the taxon author’s name;
author_id or person - the registered Person name, abbreviation, email or id, representing the author of unpublished names - required for unpublished names
valid - boolean, true if this taxon name is valid; 0 or 1
mobot - Tropicos.org id for this taxon
ipni - IPNI id for this taxon
mycobank - MycoBank id for this taxon
zoobank - ZOOBANK id for this taxon
gbif - GBIF nubKey for this taxon

POST Traits

When entering few traits, it is strongly recommended that you enter traits one by one using the Web Interface form, which reduces the chance of duplicating trait definitions.

The traits endpoint interacts with the Trait table. The POST method allows you to batch import traits into the database and is designed for transferring data to OpenDataBio from other systems, including Trait Ontologies.

As noted under the Trait Model description, it is important that one really checks whether a needed Trait is not already in the DataBase to avoid multiplication of redundant traits. The Web Interface facilitates this process. Through the API, OpenDataBio only checks for identical export_name, which must be unique within the database. Note, however, that Traits should also be as specific as possible for detailed metadata annotations.
Traits use User Translations for names and descriptions, allowing multiple languages.

Fields allowed for the traits/ post API:

export_name=string - a short name for the Trait, which is used during data exports, is more easily used in trait selection inputs in the web interface, and is also used during data analyses outside OpenDataBio. Export names must be unique and have no translation. Short and CamelCase export names are recommended. Avoid diacritics (accents), special characters, dots and even white-spaces. required
type=number - a numeric code specifying the trait type. See the Trait Model for a full list. required
objects=list - a list of the Core objects the trait is allowed to be measured for. Possible values are ‘Individual’, ‘Voucher’, ‘Location’ and/or ‘Taxon’, singular and case sensitive. Ex: “{‘object’: ‘Individual,Voucher’}”; required
name=json - see translations below; required
description=json - see translations below; required
Trait specific fields:
- unit=string - required for quantitative traits only (the unit of measurement). This must be either a code or a name (in any language) of a unit already stored in the database. Units can only be defined through the web interface.
- range_min=number - optional for quantitative traits; the minimum value allowed for a Measurement.
- range_max=number - optional for quantitative traits; the maximum value allowed for a Measurement.
- categories=json - required for categorical and ordinal traits; see translations below
- wavenumber_min and wavenumber_max - required for spectral traits = minimum and maximum WaveNumber within which the ‘value_length’ absorbance or reflectance values are equally distributed. May be informed in range_min and range_max, priority for prefix wavenumber over range if both informed.
- value_length - required for spectral traits = number of values in spectrum
- link_type- required for Link traits - the class of link type, fullname or basename: eg. ‘Taxon’ or ‘App\Models\Taxon’.
bibreference=mix - the id(s) or bibkey(s) of a BibReference already stored in the database, separated by ‘|’ or ‘;’
parent - id or export_name of another Trait on which the current trait depends. If you indicate a trait here, you add a RESTRICTION on the validation of the measurements. Adding a Measurement for the current trait will DEPEND on the database having a measurement for the trait indicated here, for the same object and same date. For example, you create a trait called POM (point of measurement) for recording the height on a tree where you measure a DBH (diameter at breast height). Adding DBH as the trait on which POM depends means you can only add POM if there is a DBH value for the same tree on the same date.

Translations

Fields name, description must have the following structure to account for User Translations. They should be a list with the language as ‘keys’. For example a name field may be informed as:

using the Language code as keys:

 [
   {"en": "Diameter at Breast Height", "pt-br": "Diâmetro a Altura do Peito"}
 ]

or using the Language ids as keys:

 [
   {"1": "Diameter at Breast Height", "2": "Diâmetro a Altura do Peito"}
 ]

or using the Language names as keys:

 [
   {"English":"Diameter at Breast Height","Portuguese": "Diâmetro a Altura do Peito"}
 ]

Alternatively, you can add the information as separate parameters. Instead of name you can use name.LANGUAGE_CODE_or_ID, for example name.en or name.1 for the name in English and name.pt-br or name.2 for the name in Portuguese. Likewise for description: description.en or description.1, etc.
Field categories must include for each category+rank+lang the following fields:
- lang=mixed - the id, code or name of the language of the translation, required
- name=string - the translated category name required (name+rank+lang must be unique)
- rank=number - the rank for ordinal traits; for non-ordinal, rank is important to indicate the same category across languages, so may just use 1 to number of categories in the order you want. required
- description=string - optional for categories, a definition of the category.
- Example for categories:
```
  [
    {"lang":"en","rank":1,"name":"small","description":"smaller than 1 cm"},
    {"lang":"pt-br","rank":1,"name":"pequeno","description":"menor que 1 cm"},
    {"lang":"en","rank":2,"name":"big","description":"bigger than 10 cm"},
    {"lang":"pt-br","rank":2,"name":"grande","description":"maior que 10 cm"}
  ]
```
Valid languages may be retrieved with the Language API.
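The translation conventions above can be combined into one trait record, as in the sketch below. The helper names, the export name and the texts are hypothetical, the type value is a placeholder that must be replaced with the real numeric code from the Trait Model, and passing categories as a JSON string is an assumption.

```python
import json

# Placeholder only: take the real numeric code from the Trait Model list.
PLACEHOLDER_TYPE_CODE = 0


def expand_translations(field, translations):
    """Turn {"en": "..."} into separate parameters such as name.en."""
    return {f"{field}.{lang}": text for lang, text in translations.items()}


def check_categories(categories):
    """Raise ValueError if a name+rank+lang combination is repeated."""
    seen = set()
    for category in categories:
        key = (category["name"], category["rank"], category["lang"])
        if key in seen:
            raise ValueError(f"duplicated category: {key}")
        seen.add(key)
    return categories


# The same rank links the same category across languages.
categories = [
    {"lang": "en", "rank": 1, "name": "small", "description": "smaller than 1 cm"},
    {"lang": "pt-br", "rank": 1, "name": "pequeno", "description": "menor que 1 cm"},
    {"lang": "en", "rank": 2, "name": "big", "description": "bigger than 10 cm"},
    {"lang": "pt-br", "rank": 2, "name": "grande", "description": "maior que 10 cm"},
]

trait = {
    "export_name": "LeafSizeClass",
    "type": PLACEHOLDER_TYPE_CODE,
    "objects": "Individual,Voucher",
    "categories": json.dumps(check_categories(categories), ensure_ascii=False),
}
trait.update(expand_translations("name", {"en": "Leaf size class", "pt-br": "Classe de tamanho da folha"}))
trait.update(expand_translations("description", {"en": "Size class of a leaf", "pt-br": "Classe de tamanho de uma folha"}))

for field, value in trait.items():
    print(f"{field}: {value}")
```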

POST Vernaculars

The vernaculars endpoint interacts with the Vernacular table, which contains information on common names to be related to Taxon and/or Individual records.

The following fields are allowed in the data import:

name - required - the common name to be recorded;
language - required - the name, code or id of the language related to the common name. See languages endpoint
taxons = list - scientific names or ids of taxons. See taxons endpoint
individuals = list - fullnames, ids or uuids of individuals. See individuals endpoint
citations = list - one or more bibliographic citations, which may have the following fields:
citation = string - the text of the citation
bibreference = mixed - the id or bibkey of the bibliographic reference that must be previously registered See bibreferences endpoint
notes - any annotation
parent - the id or name of another Vernacular to inform variants of popular names
type - one of ‘use’ or ‘generic’ or ’etimology’

POST Vouchers

The vouchers endpoint interacts with the Voucher table.