This API provides a means of retrieving data from several types of RNA experiments including:
via a client/server model.
Features of this API include:
Out of the scope of this API are:
An OpenAPI description of this specification is available and describes the 1.0.0 version. OpenAPI is an independent API description format for describing REST services and is compatible with a number of third-party tools.
Implementors can check if their RNAget implementations conform to the specification by using our compliance suite.
All API invocations are made to a configurable HTTPS endpoint, receive URL-encoded query string parameters and HTTP headers, and return text or another allowed format as requested by the user. Queries containing unsafe or reserved characters in the URL, including but not limited to "&", "/" and "#", MUST encode all such characters. Successful requests result in HTTP status code 200 and have the appropriate text encoding in the response body, as defined for each endpoint. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate an HTTP/2 upgrade using the standard mechanism.
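A minimal sketch of the encoding rule, using Python's standard `urllib.parse.urlencode`, which percent-encodes reserved characters such as "&", "/" and "#" inside parameter values. The base URL and the `tissue` value are hypothetical placeholders, not part of this specification.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute your server's configurable HTTPS base URL.
BASE_URL = "https://rnaget.example.org/projects"

def build_query_url(base_url: str, params: dict) -> str:
    """Return base_url with params appended as a URL-encoded query string."""
    # urlencode quotes with safe="" by default, so "/" -> %2F, "#" -> %23,
    # "&" -> %26 inside values; "&" is only used between key=value pairs.
    return f"{base_url}?{urlencode(params)}"

url = build_query_url(BASE_URL, {"version": "1", "tissue": "liver/kidney#1"})
print(url)
```

Without encoding, the "#" would start a URL fragment and everything after it would never reach the server.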
HTTP responses may be compressed using RFC 2616 transfer-coding, not content-coding.
HTTP responses may include a 3XX response code and a Location header redirecting the client to retrieve expression data from an alternate location, as specified by RFC 7231. Clients SHOULD be configured to follow redirects; 302, 303 and 307 are all valid response codes to use.
Responses from the server MUST include a Content-Type header containing the encoding for the invoked method and protocol version. Unless negotiated with the client and allowed by the server, the default encoding is:
Content-Type: application/vnd.ga4gh.rnaget.v1.0.0+json; charset=us-ascii
All response objects from the server are expected to be in JSON format, regardless of the response status code, unless otherwise negotiated with the client and allowed by the server.
Object IDs are intended for persistent retrieval of their respective objects. An object ID MUST uniquely identify an object within the scope of a single data server. It is beyond the scope of this API to enforce uniqueness of IDs across different data servers. IDs are strings made up of uppercase and lowercase letters, decimal digits, hyphen, period, underscore and tilde ([A-Za-z0-9._~-]). See RFC 3986 § 2.3.
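A sketch of validating an object ID against the allowed character set. In a regex character class the hyphen must come first or last; written between "." and "_" it would form a range that also admits characters such as "/" and ":". Rejecting the empty string is an assumption of this sketch.

```python
import re

# RFC 3986 section 2.3 "unreserved" characters. The hyphen is placed last
# so it is matched literally instead of being read as a range operator.
OBJECT_ID_PATTERN = re.compile(r"[A-Za-z0-9._~-]+")

def is_valid_object_id(object_id: str) -> bool:
    """Return True if object_id is non-empty and uses only allowed characters."""
    return OBJECT_ID_PATTERN.fullmatch(object_id) is not None
```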
Endpoints are described as HTTPS GET methods, which will be sufficient for most queries. Queries containing multiple metadata filters may approach or exceed URL length limits. To handle these queries, servers SHOULD implement parallel HTTPS POST endpoints that accept the same parameters as a UTF-8-encoded JSON key-value dictionary.
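A sketch of the POST alternative: the parameters that would otherwise go in the query string are serialized as a UTF-8 JSON object. The URL is a hypothetical placeholder, and sending a `Content-Type: application/json` header on the request is an assumption; the text above only specifies the body encoding. The request is built but not sent.

```python
import json
import urllib.request

def build_post_request(url: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying params as UTF-8 JSON."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},  # assumed
        method="POST",
    )

req = build_post_request(
    "https://rnaget.example.org/expressions/ticket",  # hypothetical URL
    {"format": "tsv",
     "studyID": "c4cf910c9ae54832902c954cb439e30c",
     "sampleIDList": ["sample1", "sample2"]},
)
```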
When processing requests containing multiple filters and filters with lists of items, the data provider MUST use a logical AND for selecting the results to return.
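A minimal sketch of the AND rule for scalar filters: a record is returned only if it satisfies every filter. The record fields and values are hypothetical, and list-valued filters are deliberately left out of this sketch.

```python
def matches_all(record: dict, filters: dict) -> bool:
    """Return True only if record satisfies every filter (logical AND)."""
    return all(record.get(key) == value for key, value in filters.items())

def select(records: list, filters: dict) -> list:
    """Keep the records that match all filters."""
    return [r for r in records if matches_all(r, filters)]

samples = [
    {"sampleID": "s1", "tissue": "liver", "version": "1"},
    {"sampleID": "s2", "tissue": "liver", "version": "2"},
    {"sampleID": "s3", "tissue": "kidney", "version": "1"},
]
print([r["sampleID"] for r in select(samples, {"tissue": "liver", "version": "1"})])  # ['s1']
```

With no filters, `all()` over an empty sequence is True, so every record is returned.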
When responding to a request, a server MUST use the fully specified media type for that endpoint. When determining if a request is well-formed, a server MUST allow an internet media type to degrade as follows:
application/vnd.ga4gh.rnaget.v1.0.0+json; charset=us-ascii
application/vnd.ga4gh.rnaget.v1.0.0+json
application/json
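A simplified sketch of the degradation rule on the server side: any of the three forms above is accepted. It compares single media types case-insensitively and ignores whitespace around ";"; parsing full Accept header lists with q-values is out of scope for this sketch.

```python
# The three allowed forms, from most to least specific.
ACCEPTED_MEDIA_TYPES = (
    "application/vnd.ga4gh.rnaget.v1.0.0+json; charset=us-ascii",
    "application/vnd.ga4gh.rnaget.v1.0.0+json",
    "application/json",
)

def _normalize(media_type: str) -> str:
    # Lower-case each part and rejoin with "; " so "a;charset=X" == "a; charset=x".
    parts = [p.strip().lower() for p in media_type.split(";")]
    return "; ".join(p for p in parts if p)

_ACCEPTED = {_normalize(m) for m in ACCEPTED_MEDIA_TYPES}

def is_acceptable(media_type: str) -> bool:
    """Return True if media_type is one of the allowed degraded forms."""
    return _normalize(media_type) in _ACCEPTED
```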
The server MUST respond with an appropriate HTTP status code (4xx or 5xx) when an error condition is detected. For example, if a client sends an alphanumeric string for a parameter that is specified as an unsigned integer, the server MUST reply with Bad Request (400). In the case of transient server errors (e.g., 503 and other 5xx status codes), the client SHOULD implement appropriate retry logic.
Error type | HTTP status code | Description |
---|---|---|
Bad Request | 400 | Cannot process due to malformed request, the requested parameters do not adhere to the specification |
Unauthorized | 401 | Authorization provided is invalid |
Not Found | 404 | The resource requested was not found |
Not Acceptable | 406 | The requested formatting is not supported by the server |
Not Implemented | 501 | The specified request is not supported by the server |
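The retry logic itself is left to the client. A minimal sketch, assuming exponential backoff and a hypothetical default of three attempts: 4xx responses are returned immediately because repeating the same request will not fix them, and 501 is also not retried since it means the request is unsupported. The `send` callable stands in for whatever HTTP call the client actually makes.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)

def is_retryable(status: int) -> bool:
    # Retry 5xx errors, except 501, which will not go away on retry.
    return status >= 500 and status != 501

def request_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                       base_delay: float = 0.5) -> Response:
    """Call send(), retrying transient 5xx responses with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if not is_retryable(status):
            break
        time.sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
        status, body = send()
    return status, body
```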
The RNAget API can be used to retrieve potentially sensitive genomic data; whether the data are sensitive depends on the implementation. Effective security measures are essential to protect the integrity and confidentiality of these data.
Sensitive information transmitted on public networks, such as access tokens and human genomic data, MUST be protected using Transport Layer Security (TLS) version 1.2 or later, as specified in RFC 5246.
If the data holder requires client authentication and/or authorization, then the client's HTTPS API request MUST present an OAuth 2.0 bearer access token as specified in RFC 6750, in the Authorization request header field with the Bearer authentication scheme:
Authorization: Bearer [access_token]
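A sketch of attaching a bearer token to a request with Python's standard library. The URL and token value are placeholders; how the token is obtained is outside the scope of this API.

```python
import urllib.request

def authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a GET request carrying an OAuth 2.0 bearer token (RFC 6750)."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )

# Hypothetical values for illustration only.
req = authorized_request("https://rnaget.example.org/projects", "abc123")
```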
Data providers SHOULD verify user identity and credentials. The policies and processes used to perform user authentication and authorization, and the means through which access tokens are issued, are beyond the scope of this API specification. GA4GH recommends the use of the OAuth 2.0 framework (RFC 6749) for authentication and authorization. It is also recommended that implementations of this standard implement and follow the GA4GH Authentication and Authorization Infrastructure (AAI) standard.
Cross-origin resource sharing (CORS) is an essential technique used to overcome the same-origin policy enforced by browsers. This policy restricts a webpage from making requests to another website, which could leak potentially sensitive information. However, the same-origin policy is a barrier to using open APIs. GA4GH open API implementers should enable CORS to an acceptable level as defined by their internal policy. Public API implementations should allow requests from any server.
GA4GH is publishing a CORS best practices document, which implementers should refer to for guidance when enabling CORS on public API instances.
Security Scheme Type | OAuth2 |
---|---|
implicit OAuth Flow | Authorization URL: http://ga4gh.org/oauth/dialog Scopes: |
The project is the top level of the model hierarchy and contains a set of related studies. Example projects include:
Returns the project matching the provided ID
Required scope: read:project

Parameter | Type | Description |
---|---|---|
projectId (required) | string | ID of project to return |
{
  "id": "c2fe2aa6ad3043108bd88a30fc0303da",
  "version": 1,
  "name": "Demo Project",
  "description": "This is a small project to demo API functions"
}
Get a list of projects matching filters
Required scope: read:project

Parameter | Type | Description |
---|---|---|
version | string | Version to filter by (example: version=1) |
[
  {
    "id": "c2fe2aa6ad3043108bd88a30fc0303da",
    "version": 1,
    "name": "Demo Project",
    "description": "This is a small project to demo API functions"
  }
]
To support flexible search, this endpoint provides a means of discovering the search filters supported by the data provider.
Required scope: read:project

[
  {
    "filter": "tissue",
    "fieldType": "string",
    "description": "tissue of origin",
    "values": [
      "liver"
    ]
  }
]
The study is a set of related RNA expression values. It is assumed all samples in a study have been processed uniformly. Example studies include:
Returns the study matching the provided ID
Required scope: read:study

Parameter | Type | Description |
---|---|---|
studyId (required) | string | ID of study to return |
{
  "id": "c4cf910c9ae54832902c954cb439e30c",
  "version": 1,
  "name": "Demo Study",
  "description": "This study is part of the demo project",
  "parentProjectID": "c2fe2aa6ad3043108bd88a30fc0303da",
  "genome": "human GRCh38"
}
Get a list of studies matching filters
Required scope: read:study

Parameter | Type | Description |
---|---|---|
version | string | Version to filter by (example: version=1) |
[
  {
    "id": "c4cf910c9ae54832902c954cb439e30c",
    "version": 1,
    "name": "Demo Study",
    "description": "This study is part of the demo project",
    "parentProjectID": "c2fe2aa6ad3043108bd88a30fc0303da",
    "genome": "human GRCh38"
  }
]
To support flexible search, this endpoint provides a means of discovering the search filters supported by the data provider.
Required scope: read:study

[
  {
    "filter": "tissue",
    "fieldType": "string",
    "description": "tissue of origin",
    "values": [
      "liver"
    ]
  }
]
The expression is a matrix of calculated expression values.
This describes a set of minimal metadata appropriate for several types of RNA experiments. The purpose is to define a common naming scheme for metadata to enable client software to have some expectation of data fields for improved interoperability. These definitions are not intended to be a comprehensive set of metadata and defining such a universal set is beyond the scope of this effort.
Where possible, details are incorporated by reference. This reduces the final size of matrix files, supports existing metadata standards, and supports server-defined metadata fields.
All field names are presented here in camel case. Parsers should treat field names as case-insensitive and any white space contained in the field names should be ignored:
sampleID == sampleid == Sample ID != sample_id
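A sketch of the field-name comparison rule: lower-case the name and drop all whitespace before comparing. Underscores are kept, which is why `sample_id` stays distinct. The helper name is hypothetical.

```python
def normalize_field_name(name: str) -> str:
    """Lower-case a metadata field name and remove all whitespace."""
    # str.split() with no argument splits on any run of whitespace.
    return "".join(name.split()).lower()
```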
All fields are optional. Fields that utilize an ontology term assume both an id and a label. Later implementations will utilize schemablocks and/or Phenopackets as referenced entities.
Metadata Field | Description |
---|---|
sampleID | an identifier for the biological specimen the experiment was conducted on. This id MUST uniquely identify the sample within the scope of the server |
assayType | the type of experiment performed (ex. RNA-seq, ATAC-seq, ChIP-seq, DNase-Hypersensitivity, methylation profiling, histone profiling, microRNA profiling, transcription profiling, WGS) |
samplePrepProtocol | reference to a resource or webpage describing the protocol used to obtain and prepare the sample |
libraryPrepProtocol | reference to a resource or webpage describing the protocol used to prepare the library for sequencing |
annotation | a reference to the specific annotation used for quantifying the reads |
analysisPipeline | reference to a resource or webpage describing the analysis protocol. This description should include a full listing of all software used including the exact version and command line options used. If containerized software is used a reference to the specific containers should be included. The GA4GH Tool Registry Service is a resource for discovering and registering genomic tools and workflows. |
cellTypeID | a cell type term ID |
cellTypeLabel | a cell type term label from the CL ontology |
phenotypeID | phenotype ID applicable to the sample |
phenotypeLabel | phenotype term (recommended ontologies: Human Phenotype Ontology, NCIT, or ICD) |
sexID | sex ID of the organism providing the sample |
sexTerm | sex label of the organism providing the sample (PATO 47 term) |
organismID | organism ID for the sample origin |
organismLabel | organism label for the sample origin (NCBITaxon) |
tissueID | tissue ID of origin or organism part of origin |
tissueLabel | tissue label of origin or organism part of origin (recommended to use Uberon) |
cellLineID | ID of cell line |
cellLineLabel | Label of cell line |
For metadata ID values, it is recommended that implementors use the prefix:identifier CURIE notation described in Identifiers and CURIEs, for example:
Metadata Field | Value |
---|---|
organismID | NCBITaxon:9606 |
organismLabel | human |
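A small sketch for splitting a CURIE value, such as the organismID example, into its prefix and local identifier. The helper name is hypothetical; splitting at the first colon only is a choice that keeps any later colons in the local part.

```python
def split_curie(curie: str) -> tuple:
    """Split a CURIE such as "NCBITaxon:9606" into (prefix, local identifier)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id
```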
Microarray and image-based RNA-seq (Seq-FISH etc.) depend on probes that may not cover 100% of the annotation reference. As a consequence, some features that show zero expression may not truly have zero expression. The same idea extends to submitted data and to potentially access-restricted data. The result is that a zero value can indicate one of several states:
If applicable, the NaN value MUST be used to indicate these states.
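A sketch of how a client might keep measured zeros apart from NaN values when reading text output. It assumes NaN is spelled "NaN" in the file; Python's `float()` accepts that spelling, and because NaN never compares equal to anything (itself included), `math.isnan` is needed to detect it.

```python
import math

def classify(raw_values: list) -> dict:
    """Count measured zeros separately from NaN values and nonzero values."""
    values = [float(v) for v in raw_values]  # float() accepts "NaN" and "nan"
    return {
        "zero": sum(1 for v in values if v == 0.0),
        "nan": sum(1 for v in values if math.isnan(v)),
        "nonzero": sum(1 for v in values if v != 0.0 and not math.isnan(v)),
    }

print(classify(["0", "NaN", "12.4", "0.0"]))  # {'zero': 2, 'nan': 1, 'nonzero': 1}
```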
Returns a ticket to download a single specified expression matrix
Required scope: read:expression

Parameter | Type | Description |
---|---|---|
expressionId (required) | string | ID of expression to return |
sampleIDList | array of strings | Return only values for the listed sample IDs |
featureIDList | array of strings | Return only values for the listed feature IDs |
featureNameList | array of strings | Return only values for the listed features |
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": {},
  "md5": "string"
}
Returns a single specified expression matrix
Required scope: read:expression

Parameter | Type | Description |
---|---|---|
expressionId (required) | string | ID of expression to return |
sampleIDList | array of strings | Return only values for the listed sample IDs |
featureIDList | array of strings | Return only values for the listed feature IDs |
featureNameList | array of strings | Return only values for the listed features |
{
  "message": "string"
}
Returns a download ticket for expression data matching filters
Required scope: read:expression

Parameter | Type | Description |
---|---|---|
format (required) | string | Data format to return |
projectID | string | Project to filter by (example: projectID=9c0eba51095d3939437e220db196e27b) |
studyID | string | Study to filter by (example: studyID=c4cf910c9ae54832902c954cb439e30c) |
version | string | Version to filter by (example: version=1) |
sampleIDList | array of strings | Return only values for the listed sample IDs |
featureIDList | array of strings | Return only values for the listed feature IDs |
featureNameList | array of strings | Return only values for the listed features |
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": {},
  "md5": "string"
}
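A sketch of building a query for the filtered ticket request. The endpoint path is not given in this text, so the URL below is a hypothetical placeholder, and joining list parameters with commas is an assumption; check the OpenAPI description for the exact array serialization your server expects. Unset optional filters are simply omitted.

```python
from urllib.parse import urlencode

TICKET_URL = "https://rnaget.example.org/expressions/ticket"  # hypothetical path

def ticket_query(fmt: str, **filters) -> str:
    """Build a ticket query URL; list values are comma-joined (an assumption)."""
    params = {"format": fmt}
    for key, value in filters.items():
        if value is None:
            continue  # omit optional filters that were not set
        params[key] = ",".join(value) if isinstance(value, (list, tuple)) else value
    return f"{TICKET_URL}?{urlencode(params)}"

url = ticket_query("tsv", studyID="c4cf910c9ae54832902c954cb439e30c",
                   sampleIDList=["sample1", "sample2"], version=None)
```

Note that `urlencode` percent-encodes the comma as `%2C`, which servers decode back to ",".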
Returns an expression data file matching filters
Required scope: read:expression

Parameter | Type | Description |
---|---|---|
format (required) | string | Data format to return |
projectID | string | Project to filter by (example: projectID=9c0eba51095d3939437e220db196e27b) |
studyID | string | Study to filter by (example: studyID=c4cf910c9ae54832902c954cb439e30c) |
version | string | Version to filter by (example: version=1) |
sampleIDList | array of strings | Return only values for the listed sample IDs |
featureIDList | array of strings | Return only values for the listed feature IDs |
featureNameList | array of strings | Return only values for the listed features |
{
  "message": "string"
}
The response is a list of the supported data formats as JSON, unless an alternative format supported by the server is requested. A data provider may use any internal storage format it wishes, with no restrictions from this API. To support development of interoperable clients, data providers MUST support at least one of the following common output formats:
A tab-delimited file can have any number of comment lines beginning with # for storing metadata. There should be one header row following the comments. Feature (gene/transcript) names and/or ID fields should be the first columns of the header row and have the string type. All following columns are for the samples and will have 32-bit float values in each row.
# Example tsv file
geneID geneName sample1 sample2
ENSG00000000003 TSPAN6 12.4 15.6
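A minimal parser sketch for this layout, assuming real files separate columns with tab characters. The format does not record how many leading columns hold feature names/IDs, so the hypothetical `n_feature_cols` parameter must be supplied by the caller (2 in the example above). Values are stored in `array("f")`, which holds 32-bit floats.

```python
from array import array

def parse_expression_tsv(text: str, n_feature_cols: int = 2):
    """Parse a tab-delimited expression matrix.

    Returns (sample_names, features, rows): features holds each row's
    string feature columns; rows holds each row's values as 32-bit floats.
    """
    # Drop blank lines and "#" comment lines; the first remaining line is the header.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    header = lines[0].split("\t")
    sample_names = header[n_feature_cols:]
    features, rows = [], []
    for line in lines[1:]:
        cells = line.split("\t")
        features.append(cells[:n_feature_cols])
        rows.append(array("f", (float(c) for c in cells[n_feature_cols:])))
    return sample_names, features, rows
```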
A Loom format file will have a 32-bit float matrix for the expression values, with samples on the column axis and features on the row axis. Associated metadata can be stored as row and column attributes as described by the Loom specification.
Required scope: read:expression

[
  "string"
]
To support flexible search, this endpoint provides a means of discovering the search filters supported by the data provider.
Required scope: read:expression

Parameter | Type | Description |
---|---|---|
type | string | one of |

[
  {
    "filter": "tissue",
    "fieldType": "string",
    "description": "tissue of origin",
    "values": [
      "liver"
    ]
  }
]
Returns a ticket to download a single specified continuous matrix
Required scope: read:continuous

Parameter | Type | Description |
---|---|---|
continuousId (required) | string | ID of continuous matrix to return |
chr | string | The reference to which start and end apply, in the form chr?, where ? is the specific ID of the chromosome (e.g. chr1, chrX). Example: chr=chr10 |
start | integer (int32, >= 0) | The start position of the range on the sequence, 0-based, inclusive |
end | integer (int32, >= 0) | The end position of the range on the sequence, 0-based, exclusive |
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": {},
  "md5": "string"
}
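The start/end parameters describe a 0-based, half-open interval [start, end), so the number of bases is simply end - start. A sketch of checking the stated int32 and >= 0 constraints; treating end < start as invalid and allowing an empty range (start == end) are assumptions of this sketch. The conversion helper shows the equivalent 1-based, inclusive coordinates for non-empty ranges.

```python
INT32_MAX = 2**31 - 1

def range_length(start: int, end: int) -> int:
    """Number of bases in the 0-based, half-open interval [start, end)."""
    if not (0 <= start <= INT32_MAX and 0 <= end <= INT32_MAX):
        raise ValueError("start and end must be int32 values >= 0")
    if end < start:
        raise ValueError("end must not be less than start")  # assumed rule
    return end - start

def to_one_based_closed(start: int, end: int) -> tuple:
    """Convert non-empty [start, end) to 1-based, inclusive (first, last)."""
    if range_length(start, end) == 0:
        raise ValueError("empty range has no 1-based inclusive form")
    return start + 1, end
```

For example, start=1000000 and end=1000002 cover 2 bases, which are positions 1000001 and 1000002 in 1-based terms.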
Returns a single specified continuous matrix
Required scope: read:continuous

Parameter | Type | Description |
---|---|---|
continuousId (required) | string | ID of continuous matrix to return |
chr | string | The reference to which start and end apply, in the form chr?, where ? is the specific ID of the chromosome (e.g. chr1, chrX). Example: chr=chr10 |
start | integer (int32, >= 0) | The start position of the range on the sequence, 0-based, inclusive |
end | integer (int32, >= 0) | The end position of the range on the sequence, 0-based, exclusive |
{
  "message": "string"
}
Returns a download ticket for continuous data matching filters
Required scope: read:continuous

Parameter | Type | Description |
---|---|---|
format (required) | string | Data format to return |
projectID | string | Project to filter by (example: projectID=9c0eba51095d3939437e220db196e27b) |
studyID | string | Study to filter by (example: studyID=c4cf910c9ae54832902c954cb439e30c) |
version | string | Version to filter by (example: version=1) |
sampleIDList | array of strings | Return only values for the listed sample IDs |
chr | string | The reference to which start and end apply, in the form chr?, where ? is the specific ID of the chromosome (e.g. chr1, chrX). Example: chr=chr10 |
start | integer (int32, >= 0) | The start position of the range on the sequence, 0-based, inclusive |
end | integer (int32, >= 0) | The end position of the range on the sequence, 0-based, exclusive |
{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": {},
  "md5": "string"
}
Returns a continuous data file matching filters
Required scope: read:continuous

Parameter | Type | Description |
---|---|---|
format (required) | string | Data format to return |
projectID | string | Project to filter by (example: projectID=9c0eba51095d3939437e220db196e27b) |
studyID | string | Study to filter by (example: studyID=c4cf910c9ae54832902c954cb439e30c) |
version | string | Version to filter by (example: version=1) |
sampleIDList | array of strings | Return only values for the listed sample IDs |
chr | string | The reference to which start and end apply, in the form chr?, where ? is the specific ID of the chromosome (e.g. chr1, chrX). Example: chr=chr10 |
start | integer (int32, >= 0) | The start position of the range on the sequence, 0-based, inclusive |
end | integer (int32, >= 0) | The end position of the range on the sequence, 0-based, exclusive |
{
  "message": "string"
}
The response is a list of the supported data formats as JSON, unless an alternative format supported by the server is requested. A data provider may use any internal storage format it wishes, with no restrictions from this API. To support development of interoperable clients, data providers MUST support at least one of the following common output formats:
A tab-delimited file can have any number of comment lines beginning with # for storing metadata. The first line of the TSV file will be a tab-delimited list beginning with #labels and containing the labels for the text fields in the main matrix. The second line will be a tab-delimited list containing 2 items: #range and the range in the form chr?:start-stop, where the start coordinate is zero-based, inclusive and the stop coordinate is zero-based, exclusive. Any additional comments may follow these 2 lines. The data matrix follows the comment block. Sample names and/or ID fields should be the first columns of the header row, be in the same order as listed in the #labels comment, and have the string type. All coordinates in the continuous range described in the #range comment will be in the following columns, with each base position in its own column. The coordinate columns will contain 32-bit float values in each row, corresponding to the measured signal value at that coordinate for the sample in that row.
#labels sampleID sampleName
#range chr1:1000000-1000002
# assembly GRCh38-V29-male
12003-L1 12003-human-liver-4 12.4 15.6
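A parser sketch for this layout, assuming tab separators and, as in the example above, no separate header row after the comment block. Because the range is half-open, each data row must carry exactly stop - start signal values after its text fields; the parser checks that. The function name and return shape are hypothetical.

```python
import re
from array import array

RANGE_RE = re.compile(r"^(chr[^:]+):(\d+)-(\d+)$")

def parse_continuous_tsv(text: str):
    """Return (labels, (chrom, start, stop), rows) for a continuous TSV file."""
    lines = [ln for ln in text.splitlines() if ln]
    labels = lines[0].split("\t")
    if labels[0] != "#labels":
        raise ValueError("first line must start with #labels")
    range_fields = lines[1].split("\t")
    if range_fields[0] != "#range":
        raise ValueError("second line must start with #range")
    match = RANGE_RE.match(range_fields[1])
    if match is None:
        raise ValueError(f"bad range: {range_fields[1]!r}")
    chrom, start, stop = match.group(1), int(match.group(2)), int(match.group(3))
    n_labels, n_bases = len(labels) - 1, stop - start  # half-open [start, stop)
    rows = []
    for line in lines[2:]:
        if line.startswith("#"):
            continue  # additional comments, e.g. "# assembly ..."
        cells = line.split("\t")
        text_fields, signal = cells[:n_labels], cells[n_labels:]
        if len(signal) != n_bases:
            raise ValueError(f"expected {n_bases} values, got {len(signal)}")
        rows.append((text_fields, array("f", map(float, signal))))
    return labels[1:], (chrom, start, stop), rows
```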
A Loom format file will have a 32-bit float matrix for the signal values, with coordinates on the column axis and samples on the row axis. Associated metadata can be stored as row and column attributes as described by the Loom specification.
Required scope: read:continuous

[
  "string"
]
To support flexible search, this endpoint provides a means of discovering the search filters supported by the data provider.
Required scope: read:continuous

[
  {
    "filter": "tissue",
    "fieldType": "string",
    "description": "tissue of origin",
    "values": [
      "liver"
    ]
  }
]
Field | Type | Description |
---|---|---|
id (required) | string | A unique identifier assigned to this object |
version | string | Version number of the object |
name | string | Short, readable name |
description | string | Detailed description of the object |

{
  "id": "c2fe2aa6ad3043108bd88a30fc0303da",
  "version": 1,
  "name": "Demo Project",
  "description": "This is a small project to demo API functions"
}
Field | Type | Description |
---|---|---|
id (required) | string | A unique identifier assigned to this object |
version | string | Version number of the object |
name | string | Short, readable name |
description | string | Detailed description of the object |
parentProjectID | string | ID of the project containing the study |
genome | string | Name of the reference genome build used for aligning samples in the study |

{
  "id": "c4cf910c9ae54832902c954cb439e30c",
  "version": 1,
  "name": "Demo Study",
  "description": "This study is part of the demo project",
  "parentProjectID": "c2fe2aa6ad3043108bd88a30fc0303da",
  "genome": "human GRCh38"
}
Field | Type | Description |
---|---|---|
filter (required) | string | A unique name for the filter for use in query URLs |
fieldType | string | The data type (string, float, etc.) of the filter |
description | string | Detailed description of the filter |
values | array of strings | List of supported values for the filter |

{
  "filter": "tissue",
  "fieldType": "string",
  "description": "tissue of origin",
  "values": [
    "liver"
  ]
}
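A sketch of using a filters-endpoint response, shaped like the object above, to check a query before sending it. The helper name is hypothetical, and treating an absent or empty `values` list as "any value allowed" is an assumption of this sketch.

```python
def check_filters(requested: dict, supported: list) -> list:
    """Return problems with requested filters, given a filters-endpoint response."""
    by_name = {f["filter"]: f for f in supported}
    problems = []
    for name, value in requested.items():
        spec = by_name.get(name)
        if spec is None:
            problems.append(f"unsupported filter: {name}")
        elif spec.get("values") and value not in spec["values"]:
            problems.append(f"unsupported value for {name}: {value}")
    return problems

supported = [{"filter": "tissue", "fieldType": "string",
              "description": "tissue of origin", "values": ["liver"]}]
print(check_filters({"tissue": "liver"}, supported))  # []
```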
Field | Type | Description |
---|---|---|
version | string | Version number of the object |
fileType | string | Type of file. Examples include: loom, tsv |
studyID | string | ID of containing study |
url (required) | string | An |
units (required) | string | Units for the values. Examples include: TPM, FPKM, counts |
headers | object | For HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply verbatim as headers with any request to the URL. For example, if headers is |
md5 | string | MD5 digest of the file |

{
  "version": 1,
  "fileType": "loom",
  "studyID": "c4cf910c9ae54832902c954cb439e30c",
  "url": "string",
  "units": "string",
  "headers": {},
  "md5": "string"
}
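A sketch of consuming a ticket like the one above: build a request for `url` that carries every `headers` entry, then check the downloaded bytes against the optional `md5` field. The ticket values in any usage are placeholders, and the network call itself is left out. Note that `urllib.request.Request` normalizes the capitalization of header names; header values are passed through unchanged, and HTTP header names are case-insensitive.

```python
import hashlib
import urllib.request

def ticket_request(ticket: dict) -> urllib.request.Request:
    """Build a GET request for ticket["url"], adding the ticket's headers."""
    return urllib.request.Request(ticket["url"], headers=dict(ticket.get("headers") or {}))

def md5_matches(data: bytes, ticket: dict) -> bool:
    """Check downloaded bytes against the ticket's md5 field, if present."""
    expected = ticket.get("md5")
    return expected is None or hashlib.md5(data).hexdigest() == expected.lower()
```

A client would pass the request to `urllib.request.urlopen`, read the body, and discard the file if `md5_matches` returns False.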