Data Descriptor: Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform

Zsolt Balázs, Dóra Tombácz, Attila Szucs, Michael Snyder, Z. Boldogkői

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files and, combined with other datasets, can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.

Original language: English
Article number: 170194
Journal: Scientific data
Volume: 4
DOI: 10.1038/sdata.2017.194
Publication status: Published - Dec 19 2017

Fingerprint

RNA
Sequencing
Descriptors
Transcription
Fibroblasts
Reverse
Genes
Cells
Molecules
Cell
Lung
Preparation
Genome
Human
Libraries
Messenger RNA

ASJC Scopus subject areas

  • Statistics and Probability
  • Information Systems
  • Education
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences

Cite this

Data Descriptor: Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform. / Balázs, Zsolt; Tombácz, Dóra; Szucs, Attila; Snyder, Michael; Boldogkői, Z.

In: Scientific data, Vol. 4, 170194, 19.12.2017.

@article{5a50ecbb21a84fc99ae3717b8ca21a79,
title = "Data Descriptor: Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform",
abstract = "Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files and, combined with other datasets, can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.",
author = "Zsolt Bal{\'a}zs and D{\'o}ra Tomb{\'a}cz and Attila Szucs and Michael Snyder and Z. Boldogk{\H{o}}i",
year = "2017",
month = "12",
day = "19",
doi = "10.1038/sdata.2017.194",
language = "English",
volume = "4",
journal = "Scientific data",
issn = "2052-4463",
publisher = "Nature Publishing Group",
}

TY - JOUR

T1 - Data Descriptor: Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform

AU - Balázs, Zsolt

AU - Tombácz, Dóra

AU - Szucs, Attila

AU - Snyder, Michael

AU - Boldogkői, Z.

PY - 2017/12/19

Y1 - 2017/12/19

AB - Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files and, combined with other datasets, can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.

UR - http://www.scopus.com/inward/record.url?scp=85038830229&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85038830229&partnerID=8YFLogxK

U2 - 10.1038/sdata.2017.194

DO - 10.1038/sdata.2017.194

M3 - Article

C2 - 29257134

AN - SCOPUS:85038830229

VL - 4

JO - Scientific data

JF - Scientific data

SN - 2052-4463

M1 - 170194

ER -