Getting data with OpenDataBio-R

Getting data using the OpenDataBio R client

    The OpenDataBio-R package was created to allow users to interact with an OpenDataBio server, either to obtain (GET) data from the database or to import (POST) data into it. This tutorial is a basic example of how to get data.

    Set up the connection

    1. Set up the connection to the OpenDataBio server using the odb_config() function. The most important parameters for this function are base_url, which should point to the API URL of your OpenDataBio server, and token, which is the access token used to authenticate your user.
    2. The token is only needed to get data from datasets that have one of the restricted access policies. Data from public-access datasets can be retrieved without specifying a token.
    3. Your token is available in your profile in the web interface.
    library(opendatabio)
    base_url="https://opendb.inpa.gov.br/api"
    token ="GZ1iXcmRvIFQ"
    cfg = odb_config(base_url=base_url, token = token)
    

    More advanced configuration involves setting a specific API version, a custom User Agent, or other HTTP headers, but this is not covered here.

    Test your connection

    The function odb_test() may be used to check if the connection was successful, and whether your user was correctly identified:

    odb_test(cfg)
    #will output
    Host: https://opendb.inpa.gov.br/api/v0
    Versions: server 0.9.1-alpha1 api v0
    $message
    [1] "Success!"
    
    $user
    [1] "admin@example.org"
    

    As an alternative, you can specify these parameters as environment variables. Before starting R, set them in your shell (or add these lines to the end of your .bashrc file):

    export ODB_TOKEN="YourToken"
    export ODB_BASE_URL="https://opendb.inpa.gov.br/api"
    export ODB_API_VERSION="v0"
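
    From inside R, base R's Sys.getenv() can be used to confirm that these variables are visible to the current session. The sketch below is only a sanity check; the helper name odb_env_status is made up for this example and is not part of the package. Setting the variables with Sys.setenv() is done here just so the example runs on its own; whether the package picks up values set after R has started is an assumption this tutorial does not confirm.

```r
# Hypothetical helper (not part of the opendatabio package):
# report which OpenDataBio environment variables are visible to this R session.
odb_env_status <- function(vars = c("ODB_BASE_URL", "ODB_TOKEN", "ODB_API_VERSION")) {
  values <- Sys.getenv(vars, unset = "")   # "" for variables that are not set
  data.frame(
    variable = vars,
    is_set   = unname(nzchar(values)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Set two variables for this session only, then check what is visible.
Sys.setenv(ODB_BASE_URL = "https://opendb.inpa.gov.br/api", ODB_API_VERSION = "v0")
status <- odb_env_status()
print(status)
```

    A variable reported as not set here was not exported in the shell that started R.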
    

    GET Data

    See the GET API Quick Reference for a complete list of endpoints and request parameters. Also see the generic parameters, especially save_job which is important for downloading large datasets.

    For publicly accessible data the token is optional. Below are some examples; the other endpoints follow the same pattern. See the R package help for all available odb_get_{endpoint} functions.

    Getting Taxon names

    See GET API Taxon Endpoint request parameters and a list of response fields.

    base_url="https://opendb.inpa.gov.br/api"
    cfg = odb_config(base_url=base_url)
    #get id for a taxon
    mag.id = odb_get_taxons(params=list(name='Magnoliidae',fields='id,name'),odb_cfg = cfg)
    #use this id to get all descendants of this taxon
    odb_taxons = odb_get_taxons(params=list(root=mag.id$id,fields='id,scientificName,taxonRank,parent_id,parentName'),odb_cfg = cfg)
    head(odb_taxons)
    

    If the server was set up with the provided seed data and its default language is Portuguese, the result will be:

      id scientificName taxonRank parent_id  parentName
    1 25    Magnoliidae     Clado        20 Angiosperms
    2 43     Canellales     Ordem        25 Magnoliidae
    3 62       Laurales     Ordem        25 Magnoliidae
    4 65    Magnoliales     Ordem        25 Magnoliidae
    5 74      Piperales     Ordem        25 Magnoliidae
    6 93  Chloranthales     Ordem        25 Magnoliidae
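
    Because the result carries both id and parent_id, the hierarchy can be explored locally once the data frame is in R. The sketch below rebuilds the six rows printed above as a data frame, so it runs without a server, and lists the direct children of a taxon. The helper name children_of is illustrative only.

```r
# Rebuild the six rows printed above so the example runs without a server.
odb_taxons <- data.frame(
  id             = c(25, 43, 62, 65, 74, 93),
  scientificName = c("Magnoliidae", "Canellales", "Laurales",
                     "Magnoliales", "Piperales", "Chloranthales"),
  taxonRank      = c("Clado", rep("Ordem", 5)),
  parent_id      = c(20, 25, 25, 25, 25, 25),
  parentName     = c("Angiosperms", rep("Magnoliidae", 5)),
  stringsAsFactors = FALSE
)

# Illustrative helper: names of the direct children of a taxon, found via parent_id.
children_of <- function(taxons, name) {
  parent <- taxons$id[taxons$scientificName == name]
  taxons$scientificName[taxons$parent_id %in% parent]
}

orders <- children_of(odb_taxons, "Magnoliidae")
print(orders)

# Count the taxa of each rank in the result.
rank_counts <- table(odb_taxons$taxonRank)
print(rank_counts)
```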
    

    Getting Locations

    See GET API Location Endpoint request parameters and a list of response fields.

    List selected fields for all Conservation Units (adm_level==99) registered on the server:

    base_url="https://opendb.inpa.gov.br/api"
    cfg = odb_config(base_url=base_url)
    odblocais = odb_get_locations(params = list(fields='id,name,parent_id,parentName',adm_level=99),odb_cfg = cfg)
    head(odblocais)
    

    If the server was set up with the provided seed data and its default language is Portuguese, the result will be:

        id                                                           name parent_id parentName
    1 5628                              Estação Ecológica Mico-Leão-Preto         6  São Paulo
    2 5698          Área de Relevante Interesse Ecológico Ilha do Ameixal         6  São Paulo
    3 5700 Área de Relevante Interesse Ecológico da Mata de Santa Genebra         6  São Paulo
    4 5703     Área de Relevante Interesse Ecológico Buriti de Vassununga         6  São Paulo
    5 5707                                Reserva Extrativista do Mandira         6  São Paulo
    6 5728                                   Floresta Nacional de Ipanema         6  São Paulo
    

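    Get the plots (adm_level==100) imported in the import locations tutorial. The geometry of each location comes back as WKT text in the footprintWKT field; to turn it into a spatial object in R, use the readWKT function of the rgeos package.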

    library(rgeos)
    library(opendatabio)
    base_url="https://opendb.inpa.gov.br/api"
    cfg = odb_config(base_url=base_url)
    
    locais = odb_get_locations(params=list(adm_level=100),odb_cfg = cfg)
    locais[,c('id','locationName','parentName')]
    colnames(locais)
    for(i in seq_len(nrow(locais))) {
      geom = readWKT(locais$footprintWKT[i])
      if (i==1) {
        #first plot: draw it and add the axes
        plot(geom,main=locais$locationName[i],cex.main=0.8)
        axis(side=1,cex.axis=0.5)
        axis(side=2,cex.axis=0.5,las=2)
      } else {
        #remaining plots: overlay them on the same figure
        plot(geom,main=locais$locationName[i],add=TRUE,col='red')
      }
    }
    

    Figure generated:
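
    If rgeos is not available in your R installation, simple polygons can also be read from the footprintWKT text with base R alone. The sketch below is a minimal parser for single-ring POLYGON strings only; the helper name wkt_polygon_coords and the coordinates are invented for illustration. For anything more complex (multi-part geometries, holes), a dedicated spatial package such as sf, whose st_as_sfc() function accepts WKT text, is likely the better choice.

```r
# Illustrative helper: parse a simple WKT polygon with a single outer ring,
# e.g. "POLYGON((x1 y1, x2 y2, ...))", into a two-column numeric matrix.
# Multi-part geometries and polygons with holes are not handled.
wkt_polygon_coords <- function(wkt) {
  stopifnot(grepl("^\\s*POLYGON", wkt, ignore.case = TRUE))
  # strip the leading "POLYGON((" and the trailing "))"
  inner <- gsub("^[^(]*\\(\\s*\\(|\\)\\s*\\)\\s*$", "", wkt)
  if (grepl("[()]", inner)) stop("only single-ring polygons are supported")
  pairs <- strsplit(trimws(strsplit(inner, ",")[[1]]), "\\s+")
  coords <- do.call(rbind, lapply(pairs, function(p) as.numeric(p[1:2])))
  colnames(coords) <- c("x", "y")
  coords
}

# Invented square, not a real plot from the database.
wkt <- "POLYGON((-59.98 -2.40, -59.97 -2.40, -59.97 -2.39, -59.98 -2.39, -59.98 -2.40))"
xy <- wkt_polygon_coords(wkt)
print(xy)

# To draw it with base graphics:
# plot(xy, type = "n", asp = 1); polygon(xy, border = "red")
```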

    Getting Individual Data

    See GET API Individual Endpoint for the full list of search parameters and response fields.

    library(opendatabio)
    base_url = "https://opendb.inpa.gov.br/api"
    token = "YOUR TOKEN HERE"
    
    # Set the connection configuration
    cfg = odb_config(base_url = base_url, token = token)
    
    # DIRECT DOWNLOAD – if you want to download a small amount of data
    inds = odb_get_individuals(params = list(limit = 100), odb_cfg = cfg)
    
    # PREPARE FILE ON SERVER – if your query will return a large number of records
    # Download all records you have access to or public ones
    # Save the process, since the result is likely to be large
    jobid = odb_get_individuals(params = list(save_job = TRUE), odb_cfg = cfg)
    # Check the status of the job
    odb_get_jobs(params = list(id = jobid$job_id), odb_cfg = cfg)
    # When it finishes, get the data here (or download the file via the web interface)
    all_inds = odb_get_jobs(params = list(id = jobid$job_id), odb_cfg = cfg)
    
    # FETCHING SPECIFIC DATA
    
    # All individuals identified as taxon X
    params = list(taxon = "Licaria cannela tenuicarpa")
    licarias = odb_get_individuals(params = params, odb_cfg = cfg)
    
    # All individuals identified as taxon X or its descendants
    params = list(taxon_root = "Licaria")
    licarias = odb_get_individuals(params = params, odb_cfg = cfg)
    
    # All individuals from dataset X
    params = list(dataset = "MyDataset name or id")
    inds = odb_get_individuals(params = params, odb_cfg = cfg)
    # Or use save_job above if the dataset is large
    
    # You can view the list of available datasets
    datasets = odb_get_datasets(odb_cfg = cfg)
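
    When save_job is used, the job status has to be checked repeatedly until the file is ready. One way to avoid doing this by hand is a small polling loop. The sketch below is generic: wait_until is a made-up helper, not part of the package, and the status check is passed in as a function so that no assumption is made about which field of the job record signals completion. A stand-in check is used here so the example runs without a server.

```r
# Illustrative helper (not part of the opendatabio package): call `is_done()`
# repeatedly until it returns TRUE or `max_tries` checks have been made.
wait_until <- function(is_done, interval = 5, max_tries = 60) {
  for (attempt in seq_len(max_tries)) {
    if (isTRUE(is_done())) return(attempt)   # number of checks that were needed
    Sys.sleep(interval)                      # pause before checking again
  }
  stop("gave up after ", max_tries, " checks")
}

# Stand-in for a real status check: reports "done" on the third call.
calls <- 0
fake_check <- function() {
  calls <<- calls + 1
  calls >= 3
}

attempts <- wait_until(fake_check, interval = 0.1)
print(attempts)
```

    With a real server, is_done would wrap the odb_get_jobs call shown above and inspect the returned job record; check the output on your own server to see how a finished job is reported.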
    

    Getting Measurements

    See GET API Measurement Endpoint for the complete list of query parameter options and response fields.

    Use the odb_get_measurements function.

    library(opendatabio)
    base_url="https://opendb.inpa.gov.br/api"
    token="YOUR TOKEN HERE"
    
    #set the connection configuration
    cfg = odb_config(base_url=base_url, token = token)
    
    #first 100 measurements of dataset X (id=10)
    measurements = odb_get_measurements(params=list(dataset=10,limit=100),odb_cfg=cfg)
    
    #first 100 measurements of dataset X (id=10) for the variable (trait) whose export_name is treeDbh
    measurements = odb_get_measurements(params=list(trait="treeDbh",dataset=10,limit=100),odb_cfg=cfg)
    
    #all measurements of dataset X (id=10) for the variable (trait) whose export_name is treeDbh,
    #restricted to Lauraceae
    measurements = odb_get_measurements(params=list(trait="treeDbh",dataset=10,taxon_root="Lauraceae"),odb_cfg=cfg)
    
    #link the measurements to the individuals they were taken on
    laurels = odb_get_individuals(params=list(dataset=10,taxon_root="Lauraceae"),odb_cfg=cfg)
    filter = grep("Individu",measurements$measured_type) #optional, depends on what is in measurements
    g = match(measurements$measured_id[filter],laurels$id)
    measurements$location = NA
    measurements$location[filter] = laurels$locationName[g]
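
    The linking step relies on match(), which returns, for each measured_id, the row position of the same id in the individuals table. The sketch below repeats that step on two small mock data frames so the logic can be checked without a server. The column names are those used above; all values, including the measured_type labels, are invented.

```r
# Mock data using the column names from the example above (values are invented).
laurels <- data.frame(
  id           = c(101, 102, 103),
  locationName = c("Plot A", "Plot B", "Plot C"),
  stringsAsFactors = FALSE
)
measurements <- data.frame(
  measured_id   = c(102, 101, 999, 103),
  measured_type = c("Individual", "Individual", "Voucher", "Individual"),
  value         = c(12.5, 30.1, NA, 18.0),
  stringsAsFactors = FALSE
)

# Rows that refer to individuals, then the position of each one in `laurels`.
is_ind <- grepl("Individu", measurements$measured_type)
pos <- match(measurements$measured_id[is_ind], laurels$id)

# Copy the location across; rows that are not individuals keep NA.
measurements$location <- NA_character_
measurements$location[is_ind] <- laurels$locationName[pos]
print(measurements)
```

    Unlike merge(), this approach keeps the measurements in their original row order and leaves non-individual rows in place with a missing location.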
    

    Getting Media

    See GET API Media Endpoint for the complete list of query parameter options and response fields.

    Use the odb_get_media function from the R package.

    library(opendatabio)
    base_url="https://opendb.inpa.gov.br/api"
    token="YOUR TOKEN HERE"
    
    #set the connection configuration
    cfg = odb_config(base_url=base_url, token = token)
    
    #the first 50 media files of a dataset that has images
    imgs = odb_get_media(params=list(dataset=97,limit=50),odb_cfg=cfg)
    
    #see this metadata
    head(imgs)
    
    #from this metadata, download the media files
    #create a function for this:
    getImagesByURL <- function(url,downloadFolder='img') {
      #create the destination folder if it does not exist yet
      dir.create(downloadFolder,showWarnings = FALSE)
      #the file name is the last component of the URL
      fn = basename(url)
      nname = file.path(downloadFolder,fn)
      #download the file and write its raw bytes to disk
      img = httr::GET(url=url)
      writeBin(httr::content(img, "raw"), nname)
    }
    #use the function to download images to a folder
    sapply(imgs$file_url,getImagesByURL,downloadFolder='testeImgsFromOdb') 
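
    Each downloaded file is named after the last component of its URL. The sketch below isolates that naming step so it can be tried without any network access. The helper name local_media_path and the URLs are invented for this example; dropping a query string before taking the file name is an extra precaution, not something the server is known to require.

```r
# Illustrative helper: local path for a media URL inside a download folder.
local_media_path <- function(url, download_folder = "img") {
  clean <- sub("[?#].*$", "", url)          # drop any query string or fragment
  file.path(download_folder, basename(clean))
}

# Invented URLs, only to show the result.
urls <- c("https://example.org/media/12/leaf_001.jpg",
          "https://example.org/media/13/bark_002.png?size=large")
paths <- local_media_path(urls, download_folder = "testeImgsFromOdb")
print(paths)
```

    Checking file.exists() on these paths before downloading is one way to skip files that were already fetched in an earlier run.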
    

    Getting Voucher Data

    See GET API Voucher Endpoint for the full list of search parameter options and response fields.

    Follow the example above, but use the odb_get_vouchers function.

    library(opendatabio)
    base_url="https://opendb.inpa.gov.br/api"
    token="YOUR TOKEN HERE"
    
    #set the connection configuration
    cfg = odb_config(base_url=base_url, token = token)
    
    #first 100 vouchers registered in a biocollection
    vouchers = odb_get_vouchers(params=list(biocollection="INPA",limit=100),odb_cfg=cfg)
    
    #vouchers in location x (id, or name, as registered in the database)
    vouchers = odb_get_vouchers(params=list(location="Reserva Florestal Adolpho Ducke, Parcela PDBFF-100ha",limit=100),odb_cfg=cfg)