RNAget API (1.1.0)

GA4GH RNA-seq Working Group: ga4gh-rnaseq@ga4gh.org License: Apache 2.0

Design principles

This API provides a means of retrieving data from several types of RNA experiments including:

  • Feature-level expression data from RNA-seq type measurements
  • Coordinate-based signal/intensity data similar to a bigwig representation

via a client/server model.

Features of this API include:

  • Support for a hierarchical data model which provides the option for servers to associate expression data for discovery and retrieval
  • Support for accessing subsets of expression data through slicing operations on the expression matrix and/or query filters to specify features to be included
  • Support for accessing signal/intensity data by specifying a range of genomic coordinates to be included

Out of the scope of this API are:

  • A means of retrieving primary (raw) read sequence data. Input samples are identified in expression output and data servers should implement additional API(s) to allow for search and retrieval of raw reads. The htsget API is designed for retrieval of read data.
  • A means of retrieving reference sequences. Servers should implement additional API(s) to allow for search and retrieval of reference base sequences. The refget API is designed for retrieval of reference sequences.
  • A means of retrieving feature annotation details. Expression matrices provide the identity of each mapped feature. Servers should implement additional API(s) to allow for search and retrieval of genomic feature annotation details.

OpenAPI Description

An OpenAPI description of this specification is available and describes the 1.1.0 version. OpenAPI is an independent API description format for describing REST services and is compatible with a number of third party tools.

Compliance

Implementors can check if their RNAget implementations conform to the specification by using our compliance suite.

Protocol essentials

All API invocations are made to a configurable HTTPS endpoint, receive URL-encoded query string parameters and HTTP headers, and return text or other allowed formatting as requested by the user. Queries containing unsafe or reserved characters in the URL, including but not limited to "&", "/", "#", MUST encode all such characters. Successful requests result in HTTP status code 200 and have the appropriate text encoding in the response body as defined for each endpoint. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism.
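
As a minimal sketch of the encoding requirement above, the following Python builds a request URL with the standard library. The base URL, the path and the `tissue` filter name are hypothetical placeholders, not values defined by this specification.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url: str, path: str, params: dict) -> str:
    """Return base_url + path followed by a percent-encoded query string."""
    # quote_via=quote encodes spaces as %20; safe="" ensures that reserved
    # characters such as "&", "/" and "#" are percent-encoded in values.
    query = urlencode(params, quote_via=quote, safe="")
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}?{query}"


url = build_query_url(
    "https://rnaget.example.org",  # hypothetical endpoint
    "/studies",                    # hypothetical path
    {"version": "1", "tissue": "liver & kidney/#1"},  # hypothetical filter
)
print(url)
```

Encoding each value separately, rather than the assembled URL, keeps the "&" and "=" separators of the query string intact.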

HTTP responses may be compressed using RFC 2616 transfer-coding, not content-coding.

An HTTP response may include a 3XX response code and a Location header redirecting the client to retrieve expression data from an alternate location, as specified by RFC 7231. Clients SHOULD be configured to follow redirects. 302, 303 and 307 are all valid response codes to use.

Responses from the server MUST include a Content-Type header containing the encoding for the invoked method and protocol version. Unless negotiated with the client and allowed by the server, the default encoding is:

Content-Type: application/vnd.ga4gh.rnaget.v1.1.0+json; charset=us-ascii

All response objects from the server are expected to be in JSON format, regardless of the response status code, unless otherwise negotiated with the client and allowed by the server.

Object IDs are intended for persistent retrieval of their respective objects. An object ID MUST uniquely identify an object within the scope of a single data server. It is beyond the scope of this API to enforce uniqueness of ID among different data servers. IDs are strings made up of uppercase and lowercase letters, decimal digits, hyphen, period, underscore and tilde [A-Za-z0-9.-_~]. See RFC 3986 § 2.3.
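
The character rule for IDs can be checked with a short regular expression. This is a sketch only: the function name is hypothetical, and rejecting the empty string is an assumption, since the text above does not state a minimum length.

```python
import re

# Unreserved characters from RFC 3986 section 2.3. The hyphen is placed last
# in the class so that it is read literally rather than as a range.
_ID_PATTERN = re.compile(r"[A-Za-z0-9._~-]+")


def is_valid_object_id(object_id: str) -> bool:
    """True when every character of the ID is an RFC 3986 unreserved character."""
    return _ID_PATTERN.fullmatch(object_id) is not None


print(is_valid_object_id("c2fe2aa6ad3043108bd88a30fc0303da"))
print(is_valid_object_id("study/42"))
```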

Endpoints are described as HTTPS GET methods, which will be sufficient for most queries. Queries containing multiple metadata filters may approach or exceed URL length limits. To handle these types of queries, servers SHOULD implement parallel HTTPS POST endpoints accepting the same URL parameters as a UTF-8 encoded JSON key-value dictionary.
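
A client might prepare such a POST request as sketched below. The URL is a placeholder, and two details are assumptions rather than requirements of this specification: list parameters are sent as JSON arrays, and the request carries a `Content-Type: application/json` header.

```python
import json
from urllib.request import Request


def build_post_request(url: str, params: dict) -> Request:
    """Encode the query parameters as a UTF-8 JSON dictionary in the body."""
    body = json.dumps(params).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed header
    )


params = {
    "format": "tsv",
    "studyID": "c4cf910c9ae54832902c954cb439e30c",
    "sampleIDList": ["sample1", "sample2"],  # hypothetical sample IDs
}
request = build_post_request("https://rnaget.example.org/search", params)
```

The request object is only constructed here; sending it (for example with `urllib.request.urlopen`) is left to the caller.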

When processing requests containing multiple filters and filters with lists of items, the data provider MUST use a logical AND for selecting the results to return.

Internet Media Types Handling

When responding to a request, a server MUST use the fully specified media type for that endpoint. When determining if a request is well-formed, a server MUST allow an internet media type to degrade like so:

  • application/vnd.ga4gh.rnaget.v1.1.0+json; charset=us-ascii
  • application/vnd.ga4gh.rnaget.v1.1.0+json
  • application/json
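
One way a server could test an incoming media type against this degradation list is sketched below. The function names are hypothetical; comparing case-insensitively and ignoring whitespace around the `;` separator are implementation choices of this sketch.

```python
ACCEPTED_REQUEST_TYPES = (
    "application/vnd.ga4gh.rnaget.v1.1.0+json; charset=us-ascii",
    "application/vnd.ga4gh.rnaget.v1.1.0+json",
    "application/json",
)


def normalize_media_type(value: str) -> str:
    """Lower-case the value and normalize spacing around ';' separators."""
    parts = [part.strip().lower() for part in value.split(";")]
    return "; ".join(part for part in parts if part)


_NORMALIZED = {normalize_media_type(t) for t in ACCEPTED_REQUEST_TYPES}


def is_acceptable_request_type(value: str) -> bool:
    """True when the value matches one of the three accepted forms."""
    return normalize_media_type(value) in _NORMALIZED
```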

Errors

The server MUST respond with an appropriate HTTP status code (4xx or 5xx) when an error condition is detected. For example, if a client sends an alphanumeric string for a parameter that is specified as an unsigned integer, the server MUST reply with Bad Request. In the case of transient server errors (e.g., 503 and other 5xx status codes), the client SHOULD implement appropriate retry logic.

Error type        HTTP status code   Description
Bad Request       400                Cannot process due to malformed request; the requested parameters do not adhere to the specification
Unauthorized      401                Authorization provided is invalid
Not Found         404                The resource requested was not found
Not Acceptable    406                The requested formatting is not supported by the server
Not Implemented   501                The specified request is not supported by the server
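
The retry behaviour recommended for clients could look like the sketch below. It is one possible policy, not one mandated by this specification: the exponential backoff, the attempt limit and the decision not to retry 501 (Not Implemented) are all assumptions, and `send` stands in for whatever function performs the HTTP call.

```python
import time
from typing import Callable, Tuple


def is_transient(status: int) -> bool:
    """Treat 5xx codes as retryable, except 501 which will not change."""
    return 500 <= status <= 599 and status != 501


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-transient status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if not is_transient(status) or last_attempt:
            return status, body
        # Wait 0.5 s, 1 s, 2 s, ... between attempts (with the defaults).
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Client errors such as 400 or 404 are returned immediately, since repeating the same request would not change the outcome.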

Security

Depending on the implementation, the RNAget API can be used to retrieve potentially sensitive genomic data. Effective security measures are essential to protect the integrity and confidentiality of these data.

Sensitive information transmitted on public networks, such as access tokens and human genomic data, MUST be protected using Transport Layer Security (TLS) version 1.2 or later, as specified in RFC 5246.

If the data holder requires client authentication and/or authorization, then the client's HTTPS API request MUST present an OAuth 2.0 bearer access token as specified in RFC 6750, in the Authorization request header field with the Bearer authentication scheme:

Authorization: Bearer [access_token]
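
A client could assemble this header as in the short sketch below. The function name is hypothetical, and the whitespace check is a defensive assumption rather than a rule taken from this specification.

```python
def bearer_headers(access_token: str) -> dict:
    """Build the Authorization header for an OAuth 2.0 bearer token."""
    if not access_token or any(ch.isspace() for ch in access_token):
        raise ValueError("access token must be non-empty and contain no whitespace")
    return {"Authorization": f"Bearer {access_token}"}


headers = bearer_headers("example-token")  # placeholder token value
```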

Data providers SHOULD verify user identity and credentials. The policies and processes used to perform user authentication and authorization, and the means through which access tokens are issued, are beyond the scope of this API specification. GA4GH recommends the use of the OAuth 2.0 framework (RFC 6749) for authentication and authorization. It is also recommended that implementations of this standard implement and follow the GA4GH Authentication and Authorization Infrastructure (AAI) standard.

CORS

Cross-origin resource sharing (CORS) is an essential technique used to overcome the same-origin policy seen in browsers. This policy restricts a webpage from making a request to another website and leaking potentially sensitive information. However, the same-origin policy is a barrier to using open APIs. GA4GH open API implementers should enable CORS to an acceptable level as defined by their internal policy. Public API implementations should allow requests from any server.

GA4GH is publishing a CORS best practices document, which implementers should refer to for guidance when enabling CORS on public API instances.

Possible Future API Enhancements

  • Allow OR for search filters
  • Report size of download file
  • Matrix slicing with bool array or list of indices

API specification change log

  • 1.1.0 Adds /service-info endpoint
  • 1.0.0 Initial release version

Authentication

rnaget_auth

Security Scheme Type OAuth2
implicit OAuth Flow
Authorization URL: http://ga4gh.org/oauth/dialog
Scopes:
  • read:expression - read expression data
  • read:continuous - read continuous data
  • read:project - read information about projects
  • read:study - read information about studies
  • read:info - read general info

projects

The project is the top level of the model hierarchy and contains a set of related studies. Example projects include:

  • all data submitted by contributor X
  • the local mirror of the European Nucleotide Archive data

Get a single project by ID

Returns the project matching the provided ID

Authorizations:
rnaget_auth (read:project)
path Parameters
projectId
required
string

ID of project to return

Responses

Response samples

Content type
application/json
{
  "id": "c2fe2aa6ad3043108bd88a30fc0303da",
  "version": 1,
  "name": "Demo Project",
  "description": "This is a small project to demo API functions"
}

Returns a list of projects matching filters

Get a list of projects matching filters

Authorizations:
rnaget_auth (read:project)
query Parameters
version
string
Example: version=1

version to filter by

Responses

Response samples

Content type
application/json
[
  { }
]

Returns filters for project searches

To support flexible search this provides a means of discovering the search filters supported by the data provider.

Authorizations:
rnaget_auth (read:project)

Responses

Response samples

Content type
application/json
[
  { }
]

studies

The study is a set of related RNA expression values. It is assumed all samples in a study have been processed uniformly. Example studies include:

  • multiple tissues from all patients enrolled in clinical trial X
  • a collection of liver samples from several sources which have been uniformly reprocessed for differential analysis

Get a single study by ID

Returns the study matching the provided ID

Authorizations:
rnaget_auth (read:study)
path Parameters
studyId
required
string

ID of study to return

Responses

Response samples

Content type
application/json
{
  "id": "c4cf910c9ae54832902c954cb439e30c",
  "version": 1,
  "name": "Demo Study",
  "description": "This study is part of the demo project",
  "parentProjectID": "c2fe2aa6ad3043108bd88a30fc0303da",
  "genome": "human GRCh38"
}

Returns a list of studies matching filters

Get a list of studies matching filters

Authorizations:
rnaget_auth (read:study)
query Parameters
version
string
Example: version=1

version to filter by

Responses

Response samples

Content type
application/json
[
  { }
]

Returns filters for study searches

To support flexible search this provides a means of discovering the search filters supported by the data provider.

Authorizations:
rnaget_auth (read:study)

Responses

Response samples

Content type
application/json
[
  { }
]

expressions

The expression is a matrix of calculated expression values.

Expression metadata

This describes a set of minimal metadata appropriate for several types of RNA experiments. The purpose is to define a common naming scheme for metadata to enable client software to have some expectation of data fields for improved interoperability. These definitions are not intended to be a comprehensive set of metadata and defining such a universal set is beyond the scope of this effort.

Where possible details are incorporated by reference. This is to reduce the final size of matrix files, support existing metadata standards and support server-defined metadata fields.

All field names are presented here in camel case. Parsers should treat field names as case-insensitive and any white space contained in the field names should be ignored:

sampleID == sampleid == Sample ID != sample_id
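
The matching rule above can be expressed as a small normalization function. The function name is hypothetical; the behaviour (ignore case and white space, keep underscores) follows the comparison shown above.

```python
def normalize_field_name(name: str) -> str:
    """Remove all white space and lower-case the field name."""
    return "".join(name.split()).lower()


# The first three spellings compare equal; the underscore variant does not.
print(normalize_field_name("sampleID") == normalize_field_name("Sample ID"))
print(normalize_field_name("sampleID") == normalize_field_name("sample_id"))
```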

All fields are optional. Fields that utilize an ontology term assume both an id and a label. Later implementations will utilize schemablocks and/or Phenopackets as referenced entities.

Metadata Field        Description
sampleID              an identifier for the biological specimen the experiment was conducted on. This ID MUST uniquely identify the sample within the scope of the server
assayType             the type of experiment performed (ex. RNA-seq, ATAC-seq, ChIP-seq, DNase-Hypersensitivity, methylation profiling, histone profiling, microRNA profiling, transcription profiling, WGS)
samplePrepProtocol    reference to a resource or webpage describing the protocol used to obtain and prepare the sample
libraryPrepProtocol   reference to a resource or webpage describing the protocol used to prepare the library for sequencing
annotation            a reference to the specific annotation used for quantifying the reads
analysisPipeline      reference to a resource or webpage describing the analysis protocol. This description should include a full listing of all software used including the exact version and command line options used. If containerized software is used, a reference to the specific containers should be included. The GA4GH Tool Registry Service is a resource for discovering and registering genomic tools and workflows.
cellTypeID            a cell type term ID
cellTypeLabel         a cell type term label from the CL ontology
phenotypeID           phenotype ID applicable to the sample
phenotypeLabel        phenotype term (recommended ontologies: Human Phenotype Ontology, NCIT, or ICD)
sexID                 sex ID of the organism providing the sample
sexTerm               sex label of the organism providing the sample (PATO 47 term)
organismID            organism ID for the sample origin
organismLabel         organism label for the sample origin (NCBITaxon)
tissueID              tissue ID of origin or organism part of origin
tissueLabel           tissue label of origin or organism part of origin (recommended to use Uberon)
cellLineID            ID of cell line
cellLineLabel         label of cell line

For metadata ID values it is recommended that implementors use the id:label CURIE notation as described in Identifiers and CURIEs
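
A CURIE can be split into its two parts as sketched below. The function name and the terms "prefix" and "reference" used in the comments are naming choices of this sketch; `NCBITaxon:9606` serves as sample input.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'prefix:reference' at the first colon; both parts must be non-empty."""
    prefix, separator, reference = curie.partition(":")
    if not separator or not prefix or not reference:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, reference


prefix, reference = split_curie("NCBITaxon:9606")
```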

Example metadata using CURIE

Metadata Field        Value
organismID            NCBITaxon:9606
organismLabel         human

The meaning of zero

Microarray and image-based RNA-seq (Seq-FISH etc.) have a dependency on probes which may not have 100% coverage of the annotation reference. The consequence is that some features which show zero expression may not necessarily have a truly zero expression. This idea can be extended further in the context of submitted data as well as potentially access-restricted data. The result is that a zero value can indicate one of several states:

  1. Not measured - not measured at all and value is not available
  2. Not supplied - measured but not provided to the data repository
  3. Restricted access - measured but requires further authentication to view
  4. Not applicable - measurement does not apply to the sample

If applicable, the NaN value MUST be used to indicate these states.
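
The sketch below illustrates why the distinction matters to a client: a true zero contributes to a summary statistic, while a NaN entry is skipped. The function is a hypothetical helper, not part of the API.

```python
import math
from typing import Iterable


def mean_expression(values: Iterable[float]) -> float:
    """Mean over measured values only; NaN entries are left out."""
    measured = [v for v in values if not math.isnan(v)]
    if not measured:
        return math.nan  # nothing was measured for this feature
    return sum(measured) / len(measured)


# 0.0 is a real measurement and counts; NaN marks a value that is unavailable.
row = [0.0, 4.0, float("NaN")]
print(mean_expression(row))
```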

Get specific expression data ticket

Returns a ticket to download a single specified expression matrix

Authorizations:
rnaget_auth (read:expression)
path Parameters
expressionId
required
string

ID of expression to return

query Parameters
sampleIDList
Array of strings

return only values for listed sampleIDs

featureIDList
Array of strings

return only values for listed feature IDs

featureNameList
Array of strings

return only values for listed features

Responses

Response samples

Content type
application/json
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": { },
  "md5": "string"
}

Get specific expression data file

Returns a single specified expression matrix

Authorizations:
rnaget_auth (read:expression)
path Parameters
expressionId
required
string

ID of expression to return

query Parameters
sampleIDList
Array of strings

return only values for listed sampleIDs

featureIDList
Array of strings

return only values for listed feature IDs

featureNameList
Array of strings

return only values for listed features

Responses

Response samples

Content type
application/json
{
  "message": "string"
}

Get a ticket to download expression data

Returns a download ticket for expression data matching filters

Authorizations:
rnaget_auth (read:expression)
query Parameters
format
required
string

Data format to return

projectID
string
Example: projectID=9c0eba51095d3939437e220db196e27b

project to filter by

studyID
string
Example: studyID=c4cf910c9ae54832902c954cb439e30c

study to filter by

version
string
Example: version=1

version to filter by

sampleIDList
Array of strings

return only values for listed sampleIDs

featureIDList
Array of strings

return only values for listed feature IDs

featureNameList
Array of strings

return only values for listed features

Responses

Response samples

Content type
application/json
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": { },
  "md5": "string"
}

Download expression data matching filters

Returns an expression data file matching filters

Authorizations:
rnaget_auth (read:expression)
query Parameters
format
required
string

Data format to return

projectID
string
Example: projectID=9c0eba51095d3939437e220db196e27b

project to filter by

studyID
string
Example: studyID=c4cf910c9ae54832902c954cb439e30c

study to filter by

version
string
Example: version=1

version to filter by

sampleIDList
Array of strings

return only values for listed sampleIDs

featureIDList
Array of strings

return only values for listed feature IDs

featureNameList
Array of strings

return only values for listed features

Responses

Response samples

Content type
application/json
{
  "message": "string"
}

Get output formats

The response is a list of the supported data formats as a JSON formatted object unless an alternative formatting supported by the server is requested. A data provider may use any internal storage format that they wish with no restrictions from this API. To support development of interoperable clients, it is recommended that data providers MUST support at least 1 of the following common output formats:

  • Tab delimited text (tsv)
  • Loom (loom)
  • anndata (anndata)

A Tab delimited file can have any number of comment lines beginning with # for storing metadata. There should be one header row following the comments. Feature (genes/transcripts) names and/or ID fields should be the first columns of the header row and have the string type. All following columns are for the samples and will have 32-bit float values in each row.

Example .tsv file
# Example tsv file
geneID  geneName  sample1 sample2
ENSG00000000003 TSPAN6  12.4  15.6
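
A reader for this layout might look like the sketch below. It assumes real tab characters between columns and takes the number of leading feature columns as a parameter (`n_feature_columns`, a hypothetical name), because the file itself does not declare how many there are.

```python
from typing import Dict, List, Tuple


def parse_expression_tsv(text: str, n_feature_columns: int = 2):
    """Return (comments, sample names, {feature fields: {sample: value}})."""
    comments: List[str] = []
    header = None
    matrix: Dict[Tuple[str, ...], Dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if header is None and line.startswith("#"):
            comments.append(line[1:].strip())  # metadata comment line
            continue
        fields = line.split("\t")
        if header is None:
            header = fields  # first non-comment row is the header
            continue
        feature = tuple(fields[:n_feature_columns])
        values = [float(v) for v in fields[n_feature_columns:]]
        matrix[feature] = dict(zip(header[n_feature_columns:], values))
    if header is None:
        raise ValueError("no header row found")
    return comments, header[n_feature_columns:], matrix


example = (
    "# Example tsv file\n"
    "geneID\tgeneName\tsample1\tsample2\n"
    "ENSG00000000003\tTSPAN6\t12.4\t15.6\n"
)
comments, samples, matrix = parse_expression_tsv(example)
```

Python's `float()` also accepts the string `NaN`, so unavailable values pass through the parser unchanged.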

A Loom format file will have a 32-bit float matrix for the expression values with samples on the column axis and features on the row axis. Associated metadata can be stored as row and column attributes as described by the loom specification.

An anndata format file will have a 32-bit float matrix for the expression values with samples on the column axis and features on the row axis. Associated metadata can be stored as row and column attributes as described by the anndata specification.

Authorizations:
rnaget_auth (read:expression)

Responses

Response samples

Content type
application/json
[
  "string"
]

Returns filters for expression searches

To support flexible search this provides a means of discovering the search filters supported by the data provider.

Authorizations:
rnaget_auth (read:expression)
query Parameters
type
string

one of feature or sample reflecting which axis to request filters for. If blank, both will be returned

Responses

Response samples

Content type
application/json
[
  { }
]

continuous

Continuous is a matrix of coordinate range based signal data

Get specific continuous data ticket

Returns a ticket to download a single specified continuous matrix

Authorizations:
rnaget_auth (read:continuous)
path Parameters
continuousId
required
string

ID of continuous matrix to return

query Parameters
chr
string
Example: chr=chr10

The reference to which start and end apply in the form chr? where ? is the specific ID of the chromosome (ex. chr1, chrX).

start
integer <int32> >= 0

The start position of the range on the sequence, 0-based, inclusive.

end
integer <int32> >= 0

The end position of the range on the sequence, 0-based, exclusive.

Responses

Response samples

Content type
application/json
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": { },
  "md5": "string"
}

Get specific continuous data file

Returns a single specified continuous matrix

Authorizations:
rnaget_auth (read:continuous)
path Parameters
continuousId
required
string

ID of continuous matrix to return

query Parameters
chr
string
Example: chr=chr10

The reference to which start and end apply in the form chr? where ? is the specific ID of the chromosome (ex. chr1, chrX).

start
integer <int32> >= 0

The start position of the range on the sequence, 0-based, inclusive.

end
integer <int32> >= 0

The end position of the range on the sequence, 0-based, exclusive.

Responses

Response samples

Content type
application/json
{
  "message": "string"
}

Get a ticket to download continuous data

Returns a download ticket for continuous data matching filters

Authorizations:
rnaget_auth (read:continuous)
query Parameters
format
required
string

Data format to return

projectID
string
Example: projectID=9c0eba51095d3939437e220db196e27b

project to filter by

studyID
string
Example: studyID=c4cf910c9ae54832902c954cb439e30c

study to filter by

version
string
Example: version=1

version to filter by

sampleIDList
Array of strings

return only values for listed sampleIDs

chr
string
Example: chr=chr10

The reference to which start and end apply in the form chr? where ? is the specific ID of the chromosome (ex. chr1, chrX).

start
integer <int32> >= 0

The start position of the range on the sequence, 0-based, inclusive.

end
integer <int32> >= 0

The end position of the range on the sequence, 0-based, exclusive.

Responses

Response samples

Content type
application/json
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": { },
  "md5": "string"
}

Download continuous data matching filters

Returns a continuous data file matching filters

Authorizations:
rnaget_auth (read:continuous)
query Parameters
format
required
string

Data format to return

projectID
string
Example: projectID=9c0eba51095d3939437e220db196e27b

project to filter by

studyID
string
Example: studyID=c4cf910c9ae54832902c954cb439e30c

study to filter by

version
string
Example: version=1

version to filter by

sampleIDList
Array of strings

return only values for listed sampleIDs

chr
string
Example: chr=chr10

The reference to which start and end apply in the form chr? where ? is the specific ID of the chromosome (ex. chr1, chrX).

start
integer <int32> >= 0

The start position of the range on the sequence, 0-based, inclusive.

end
integer <int32> >= 0

The end position of the range on the sequence, 0-based, exclusive.

Responses

Response samples

Content type
application/json
{
  "message": "string"
}

Get output formats

The response is a list of the supported data formats as a JSON formatted object unless an alternative formatting supported by the server is requested. A data provider may use any internal storage format that they wish with no restrictions from this API. To support development of interoperable clients, it is recommended that data providers MUST support at least 1 of the following common output formats:

  • Tab delimited text (.tsv)
  • Loom (.loom)

A Tab delimited file can have any number of comment lines beginning with # for storing metadata. The first line of the tsv file will be a tab-delimited list beginning with #labels and containing the labels for text fields in the main matrix. The second line of the tsv file will be a tab-delimited list containing 2 items: #range and the range in the form chr?:start-stop where the start coordinate is zero-based, inclusive and the stop coordinate is zero-based, exclusive. Any additional comments may follow these 2 lines. The data matrix follows the comment block. Sample names and/or ID fields should be the first columns of the header row, be in the same order as listed in the #labels comment and have the string type. All coordinates in the continuous range described in the #range comment will be in the following columns with each base position in its own column. The coordinate columns will contain 32-bit float values in each row corresponding to the measured signal value at that coordinate for the sample corresponding to that row.

Example .tsv file
#labels sampleID  sampleName
#range  chr1:1000000-1000002
# assembly  GRCh38-V29-male
12003-L1  12003-human-liver-4 12.4  15.6
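
The layout described above can be read with a short parser such as the sketch below. It assumes real tab characters between fields; the function name and the shape of the returned dictionary are choices of this sketch. Because the range is zero-based and half-open, each row must carry exactly `end - start` signal values.

```python
def parse_continuous_tsv(text: str) -> dict:
    """Parse the #labels and #range comments plus the per-sample signal rows."""
    labels = None
    region = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0] == "#labels":
            labels = fields[1:]
        elif fields[0] == "#range":
            chrom, _, span = fields[1].rpartition(":")
            start, _, end = span.partition("-")
            region = (chrom, int(start), int(end))
        elif line.startswith("#"):
            continue  # any other comment line is ignored here
        else:
            rows.append(fields)
    if labels is None or region is None:
        raise ValueError("missing #labels or #range comment line")
    chrom, start, end = region
    samples = []
    for fields in rows:
        signal = [float(v) for v in fields[len(labels):]]
        if len(signal) != end - start:  # half-open range: end is exclusive
            raise ValueError("row length does not match the #range width")
        samples.append({"labels": dict(zip(labels, fields)), "signal": signal})
    return {"chr": chrom, "start": start, "end": end, "samples": samples}


example = (
    "#labels\tsampleID\tsampleName\n"
    "#range\tchr1:1000000-1000002\n"
    "# assembly\tGRCh38-V29-male\n"
    "12003-L1\t12003-human-liver-4\t12.4\t15.6\n"
)
parsed = parse_continuous_tsv(example)
```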

A Loom format file will have a 32-bit float matrix for the signal values with coordinates on the column axis and samples on the row axis. Associated metadata can be stored as row and column attributes as described by the loom specification.

Authorizations:
rnaget_auth (read:continuous)

Responses

Response samples

Content type
application/json
[
  "string"
]

Returns filters for continuous searches

To support flexible search this provides a means of discovering the search filters supported by the data provider.

Authorizations:
rnaget_auth (read:continuous)

Responses

Response samples

Content type
application/json
[
  { }
]

Service Info

The GA4GH Service Info specification provides a GA4GH-wide, structured format for describing web services implementing GA4GH API specifications. RNAget implements service info through the standard /service-info API endpoint, and also extends the base model with additional attributes.

RNAget services MUST indicate that they support the RNAget protocol by using an artifact value of rnaget in the service info type property.

{
  ...
  "type": {
    "group": "org.ga4gh",
    "artifact": "rnaget",
    "version": "1.1.0" 
  }
  ...
}
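
A client that has already fetched and decoded the service info document could check this property as follows. The function name is hypothetical, and the sketch inspects only the `artifact` value described above.

```python
def is_rnaget_service(service_info: dict) -> bool:
    """True when the service info 'type' object names the rnaget artifact."""
    service_type = service_info.get("type")
    return isinstance(service_type, dict) and service_type.get("artifact") == "rnaget"


info = {"type": {"group": "org.ga4gh", "artifact": "rnaget", "version": "1.1.0"}}
print(is_rnaget_service(info))
```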

Show information about this RNAget instance

Responses

Response samples

Content type
application/json
{
  "id": "org.ga4gh.myservice",
  "name": "My project",
  "type": { },
  "description": "This service provides...",
  "organization": {},
  "contactUrl": "mailto:support@example.com",
  "documentationUrl": "https://docs.myservice.example.com",
  "createdAt": "2019-06-04T12:58:19Z",
  "updatedAt": "2019-06-04T12:58:19Z",
  "environment": "test",
  "version": "1.0.0",
  "supported": { }
}

The Project Model

id
required
string

A unique identifier assigned to this object

version
string

Version number of the object

name
string

Short, readable name

description
string

Detailed description of the object

{
  "id": "c2fe2aa6ad3043108bd88a30fc0303da",
  "version": 1,
  "name": "Demo Project",
  "description": "This is a small project to demo API functions"
}

The Study Model

id
required
string

A unique identifier assigned to this object

version
string

Version number of the object

name
string

Short, readable name

description
string

Detailed description of the object

parentProjectID
string

ID of the project containing the study

genome
string

Name of the reference genome build used for aligning samples in the study

{
  "id": "c4cf910c9ae54832902c954cb439e30c",
  "version": 1,
  "name": "Demo Study",
  "description": "This study is part of the demo project",
  "parentProjectID": "c2fe2aa6ad3043108bd88a30fc0303da",
  "genome": "human GRCh38"
}

The Filter Model

filter
required
string

A unique name for the filter for use in query URLs

fieldType
string

The dataType (string, float, etc.) of the filter

description
string

Detailed description of the filter

values
Array of strings

List of supported values for the filter

{
  "filter": "tissue",
  "fieldType": "string",
  "description": "tissue of origin",
  "values": [ ]
}

The Ticket Model

version
string

Version number of the object

fileType
string

Type of file. Examples include: loom, tsv

studyID
string

ID of containing study

url
required
string

An https: URL from which to download the file

units
required
string

Units for the values. Examples include: TPM, FPKM, counts

headers
object

For HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply verbatim as headers with any request to the URL. For example, if headers is {"Authorization": "Bearer xxxx"}, then the client must supply the header Authorization: Bearer xxxx with the HTTPS request to the URL.

md5
string

MD5 digest of the file

{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": { },
  "md5": "string"
}
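
A client could act on a ticket roughly as sketched below: build the download request with the supplied headers copied verbatim, then compare the MD5 digest of the received bytes. The function names and the example URL are hypothetical, and treating `md5` as a hexadecimal digest string is an assumption, since the model only says "MD5 digest of the file".

```python
import hashlib
from urllib.request import Request


def request_from_ticket(ticket: dict) -> Request:
    """Build a GET request for the ticket URL with its headers passed verbatim."""
    url = ticket["url"]
    if not url.lower().startswith("https://"):
        raise ValueError("ticket URL must use https")
    return Request(url, headers=dict(ticket.get("headers") or {}))


def md5_matches(payload: bytes, ticket: dict) -> bool:
    """Compare the payload digest with the ticket's md5 field, if present."""
    expected = ticket.get("md5")
    if expected is None:
        return True  # md5 is optional in the Ticket model
    return hashlib.md5(payload).hexdigest() == expected.lower()


ticket = {
    "url": "https://files.example.org/matrix.loom",  # hypothetical URL
    "units": "TPM",
    "headers": {"Authorization": "Bearer xxxx"},
    "md5": hashlib.md5(b"demo payload").hexdigest(),
}
request = request_from_ticket(ticket)
```

The request is only constructed here; the caller would pass it to `urllib.request.urlopen` and then hand the downloaded bytes to `md5_matches`.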