Search public, human, RNA-seq experiments by cell, tissue type, and other features | Indexing files
Purpose
Annotations of cell and tissue types for experiments submitted to the SRA are often inconsistent or missing entirely. The purpose of this webservice is to provide a way to search and select human RNA-seq datasets by their tissue and cell type. Through a series of steps, from careful data cleaning to automatic tissue assignment, we are able to accurately derive tissue and cell type annotations for a large fraction of these datasets.
The data SHARQ uses
Metadata for the indexed NCBI SRA files is obtained from NCBI's FTP servers as well as from text versions of abstracts available from PubMed (example). A future version of SHARQ will use metadata from SRAdb and will include metadata from EBI and DDBJ. The vocabulary for cell types and tissues was derived from the UniProt controlled vocabulary of tissues as well as a controlled vocabulary of cell types and tissues from ENCODE. SHARQ is currently a beta release. The vocabulary for cell types and tissues will be updated with improvements, and each weekly update will carry a dated version number that can be used for citation and data provenance purposes. It is also important to note that many SRA files do not have cell type annotations, but do have tissue annotations. We therefore currently focus on the accuracy of tissue annotations; cell type information is provided for additional filtering and reference. We also currently focus on publicly available sequence data, so we do not annotate dbGaP submissions, for example.
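As a rough illustration of how a controlled vocabulary can be matched against free-text metadata, here is a minimal Python sketch. It is not SHARQ's annotation pipeline: the tiny `TISSUE_VOCAB` set and the `match_tissues` helper are hypothetical, and the real vocabularies (UniProt, ENCODE) are far larger.

```python
import re

# Hypothetical mini-vocabulary for illustration only; SHARQ's actual
# vocabulary is derived from UniProt and ENCODE and is not reproduced here.
TISSUE_VOCAB = {"brain", "liver", "lung", "heart", "skeletal muscle"}


def match_tissues(text, vocab=TISSUE_VOCAB):
    """Return the sorted vocabulary terms that occur as whole words in text."""
    lowered = text.lower()
    hits = set()
    for term in vocab:
        # Word boundaries keep "heart" from matching inside "hearth".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.add(term)
    return sorted(hits)
```

A description field such as "Total RNA from human Liver and skeletal muscle biopsies" would yield the terms "liver" and "skeletal muscle" under this scheme. Plain whole-word matching misses plurals and synonyms, which is one reason a real pipeline needs careful data cleaning first.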
Selecting a tissue
To select experiments that were performed on a particular tissue, browse the list of tissues (top chart) and click on the tissues you would like to include in your target list. The count of matching runs in the top right corner will update.
To deselect a tissue, click on its bar again.
To cancel all tissue type selections, use the general "Reset all" button located at the top right corner next to the count of matched runs.
Selecting a single tissue, for example, "brain", will update all other widgets to only display distributions of read length, sequencing platform, etc. among the matching runs.
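The linked behavior described above (a selection in one chart narrows the distributions shown in the others) can be sketched in a few lines of Python. This is only an illustration of the idea, not the service's implementation; the `RUNS` records, their field names, and the `distribution` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical run records with placeholder identifiers; the field names
# are assumptions, not SHARQ's schema.
RUNS = [
    {"run": "RUN_A", "tissue": "brain", "platform": "ILLUMINA"},
    {"run": "RUN_B", "tissue": "brain", "platform": "ILLUMINA"},
    {"run": "RUN_C", "tissue": "brain", "platform": "LS454"},
    {"run": "RUN_D", "tissue": "liver", "platform": "ILLUMINA"},
]


def distribution(runs, field, **selected):
    """Count the values of `field` among runs that pass every active filter.

    Each keyword argument maps a field name to the set of accepted values,
    mirroring the bars currently selected in that field's chart.
    """
    matching = [
        run for run in runs
        if all(run[name] in accepted for name, accepted in selected.items())
    ]
    return Counter(run[field] for run in matching)
```

Calling `distribution(RUNS, "platform", tissue={"brain"})` counts platforms only among the brain runs, while calling it with no filters counts across all runs, which corresponds to the state after "Reset all".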
Additional filters
To select a cell type, click on the bar representing that cell in the cell type chart.
To select runs by the average read length, click on the bars that represent your desired read lengths.
To select runs by the sequencing platform used to produce the data, click on all applicable platform types. If you do not see your target sequencing platform in the list, it may be bundled into the "Other" category because too few runs were produced using that technology.
To select between paired and single end reads, click the appropriate bar.
To select sequencing runs from a specific date range, drag your mouse across the "Submission date" chart. This will render a selection widget that you can resize by dragging its left and right handles. To move the selection widget, position the mouse over its transparent portion (the crosshair cursor will turn into a move icon) and drag it to the desired position.
To reset the date selection, click anywhere in the chart outside of the selection widget.
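For readers who prefer to think of the date selection in code, the range filter amounts to keeping runs whose submission date falls inside an inclusive interval. The sketch below assumes hypothetical records with a `submitted` field; it does not reflect how the service stores dates.

```python
from datetime import date

# Hypothetical records with placeholder identifiers and made-up dates.
RUNS = [
    {"run": "RUN_A", "submitted": date(2012, 3, 14)},
    {"run": "RUN_B", "submitted": date(2013, 7, 1)},
    {"run": "RUN_C", "submitted": date(2014, 11, 30)},
]


def in_date_range(runs, start, end):
    """Keep runs submitted between start and end, both inclusive."""
    return [run for run in runs if start <= run["submitted"] <= end]
```

Resizing the selection widget corresponds to changing `start` or `end`; clearing the selection corresponds to dropping the filter altogether.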
Download matches
To download SRR IDs for all sequencing runs matching your filters, click on the "Download IDs" button at the top of the page.
To download SRR IDs and additional information about the runs, click on the "Download all fields" button at the top of the page. The columns include: SRR IDs, associated experiment IDs, date of data submission, tissue type assigned by our inference algorithm, sequencing platform, average read length (in base pairs), library layout (paired/single end), and dataset size in megabytes.
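Once the full table is downloaded, it can be filtered further offline. The sketch below is one possible approach in Python and rests on assumptions: the file format is not specified here, so a tab-separated file with a header row is assumed, and the column names `run_id`, `avg_read_length`, and `layout` are hypothetical. Adjust them to match the actual download.

```python
import csv
import io


def select_runs(handle, min_read_length=100, layout="paired"):
    """Return run IDs with at least the given read length and library layout.

    Assumes a tab-separated table with a header row and the hypothetical
    columns run_id, avg_read_length, and layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["run_id"]
        for row in reader
        if float(row["avg_read_length"]) >= min_read_length
        and row["layout"].lower() == layout.lower()
    ]


# Made-up sample rows with placeholder identifiers, standing in for a download.
SAMPLE_ROWS = [
    ["run_id", "tissue", "avg_read_length", "layout"],
    ["RUN_A", "brain", "101", "PAIRED"],
    ["RUN_B", "brain", "50", "PAIRED"],
    ["RUN_C", "liver", "150", "SINGLE"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

selected = select_runs(io.StringIO(SAMPLE))
```

With a real file, pass an open file handle in place of `io.StringIO(SAMPLE)`, for example `open(path, newline="")`.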
Recommended system configuration
SHARQ works best in Chrome and Safari browsers. There are known issues when running the service in Firefox 34 and we are working on them.
Download original data
Download original metadata collected from Short Read Archive: [tar.gz] (coming soon)
Code for metadata collection, tissue annotation, and this webservice is available on GitHub: https://github.com/kingsfordgroup/sharq (coming soon)
Contact
If you experience a bug or have questions about the service, please post them at the project's GitHub page: https://github.com/kingsfordgroup/sharq (coming soon)
Direct any further questions to Carl Kingsford.