INSERT-seq
help page
INSERT-seq is a bioinformatics pipeline for analysing the landscape of genomic integrations in samples sequenced after the INSERT-seq library preparation protocol.
A command-line version of the pipeline is available for download from a Bitbucket repository.
Website usage
Read structure
After the INSERT-seq library preparation protocol and Oxford Nanopore sequencing, the reads have the following structure.
Sequencing adapters are ligated at both ends.
The UMI and barcode adapter used for PCR amplification is present at one end and consists of three main parts:
- A. The primer binding site (complementary to the sequencing adapter) and the barcode sequence.
- UMI. The unique molecular identifier sequence.
- B. The target binding site (complementary between the PCR 1 and PCR 2 adapters).
This adapter is joined to the genomic sequence, and the integration site lies at the other end of the read.
- C. The sequence of the payload (or integration site) immediately adjacent to the junction. The analysis pipeline requires this sequence, entered in the "Ontarget primer sequence" section of the input.
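To make the layout concrete, here is a minimal Python sketch that splits a read into the parts listed above. It is not the INSERT-seq implementation: the sequences for A, B and C and the 12-nt UMI length are hypothetical placeholders, it assumes the parts appear in the order A, UMI, B with C further along the same strand, and it uses exact matching, which real Nanopore reads (with their sequencing errors) would not tolerate. Both strands are tried because a Nanopore read can originate from either strand.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical placeholder sequences: replace them with the real adapter parts and
# on-target primer of your experiment. None of these values come from INSERT-seq.
PART_A = "AAGGTTCCAG"   # primer binding site + barcode (assumed)
PART_B = "CTGACTGAGC"   # target binding site (assumed)
PART_C = "TGCATGCCAT"   # "Ontarget primer sequence" next to the junction (assumed)
UMI_LEN = 12            # assumed fixed UMI length

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# A, then exactly UMI_LEN bases (captured as the UMI), then B; exact matching only.
_ADAPTER = re.compile(re.escape(PART_A) + "([ACGTN]{%d})" % UMI_LEN + re.escape(PART_B))


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


class ReadParts(NamedTuple):
    umi: str
    flank: str  # sequence between B and C (assumed to be the genomic side of the junction)


def split_read(read: str) -> Optional[ReadParts]:
    """Find A-UMI-B near one end and C further along, trying both strands.

    Returns None if any part is missing. Real reads would need
    error-tolerant matching instead of exact string search.
    """
    read = read.upper()
    for seq in (read, reverse_complement(read)):
        m = _ADAPTER.search(seq)
        if m is None:
            continue
        c = seq.rfind(PART_C, m.end())  # last occurrence of C after the adapter
        if c == -1:
            continue
        return ReadParts(umi=m.group(1), flank=seq[m.end():c])
    return None
```

For a read containing all parts, `split_read` returns the UMI and the flanking sequence; for a read missing any part it returns `None`, which is roughly the kind of read a real pipeline would discard.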
Results
When a run ends, a compressed file containing the results is generated for download.
Results are organized in folders.
- FILTERED. Contains quality filtered and trimmed reads.
- MAPPED. Contains SAM and BAM files for read alignment inspection. Alignments can be visualized with any genome browser program such as IGV or Geneious.
- PEAKS.
- *_peaks.bed BED file containing significant insertions. The columns correspond to chromosome, start, end, number of reads per insertion, and DHD distance to a reference peak.
- *_filtered_peakSite.bed BED file containing significant insertions with only the predicted insertion point instead of a range.
- *_prePeakCalling_sorted_insertions.bed BED file listing all genomic regions with mapped reads, including significant insertions, non-significant insertions and noise.
- STATS. Contains one folder per sample with histograms of raw read length and Nanopore sequencing quality.
- UMI. Contains the UMIs extracted from each read and the consensus reads after UMI clustering.
- OUTPUT. Contains plots and tables of gene feature annotations and unanchored peak calling.
- *_unanchored_peaks.bed BED file containing unanchored insertions.
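The SAM files in the MAPPED folder can also be summarized without a genome browser. As an illustration (not part of the pipeline), the sketch below counts primary mapped reads per reference sequence using the standard SAM FLAG bits: 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). BAM files are binary, so they would first need converting to text, for example with `samtools view`. The file name in the comment is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Standard SAM FLAG bits.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapped_reads_per_contig(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference name (RNAME, column 3)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts


# Example (hypothetical file name):
# with open("sample.sam") as fh:
#     print(mapped_reads_per_contig(fh).most_common(5))
```

Skipping secondary and supplementary records means each read is counted at most once, at its primary alignment.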
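The *_peaks.bed files in the PEAKS folder can be loaded for downstream analysis. The following Python sketch reads the five columns described above and ranks insertions by read count. It assumes the file is tab-separated, that start and end follow the usual BED convention (0-based start, exclusive end) and that the read-count column holds integers; the DHD distance is kept as text because its format is not described on this page. The file name in the comment is hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Peak:
    chrom: str
    start: int                   # assumed 0-based (BED convention)
    end: int                     # assumed exclusive (BED convention)
    reads: int                   # number of reads supporting the insertion
    dhd_distance: Optional[str]  # kept as text; format not specified here


def read_peaks(lines: Iterable[str]) -> List[Peak]:
    """Parse a tab-separated *_peaks.bed file into Peak records."""
    peaks = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and optional BED header lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        dhd = row[4] if len(row) > 4 else None
        peaks.append(Peak(row[0], int(row[1]), int(row[2]), int(row[3]), dhd))
    return peaks


def top_insertions(peaks: List[Peak], n: int = 10) -> List[Peak]:
    """Return the n insertions with the most supporting reads."""
    return sorted(peaks, key=lambda p: p.reads, reverse=True)[:n]


# Example (hypothetical file name):
# with open("sample_peaks.bed") as fh:
#     for p in top_insertions(read_peaks(fh), 5):
#         print(p.chrom, p.start, p.end, p.reads)
```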
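The read-length and quality histograms in the STATS folder can be reproduced or extended from any FASTQ file. The sketch below is an illustration rather than the pipeline's own code: it parses well-formed four-line FASTQ records, bins read lengths and computes a per-read mean quality from Phred+33 scores. The mean is taken over error probabilities rather than raw Phred values, a convention commonly used for Nanopore reads; whether INSERT-seq uses the same convention is an assumption.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n")[1:].split()[0], seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality, averaged in error-probability space."""
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def length_histogram(lengths: Iterable[int], bin_size: int = 1000) -> Counter:
    """Count reads per length bin; keys are the lower bound of each bin."""
    return Counter((n // bin_size) * bin_size for n in lengths)


# Example (hypothetical file name):
# with open("sample.fastq") as fh:
#     records = list(fastq_records(fh))
# print(sorted(length_histogram(len(s) for _, s, _ in records).items()))
```

Averaging error probabilities lets a few very low-quality bases pull the mean down, so a read with one Q10 base and one Q40 base scores about Q13 rather than Q25.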
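The general idea behind the UMI folder (grouping reads by UMI, then building one consensus per group) can be illustrated with a short sketch. It is not necessarily the method INSERT-seq uses: here each UMI is greedily assigned to the most abundant UMI within a small mismatch distance, and a consensus is taken by per-position majority vote. The majority vote only makes sense for sequences that are already aligned and of equal length; real Nanopore consensus building needs alignment-aware tools, and Nanopore UMIs also contain indels, which the simple mismatch distance below does not model.

```python
from collections import Counter
from typing import Dict, List


def mismatch_distance(a: str, b: str) -> int:
    """Positional mismatches plus any length difference (no indel handling)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def cluster_umis(umis: List[str], max_dist: int = 1) -> Dict[str, str]:
    """Greedy clustering: map each UMI to the most abundant centre within max_dist."""
    centres: List[str] = []
    assignment: Dict[str, str] = {}
    for umi, _count in Counter(umis).most_common():  # most abundant first
        for centre in centres:
            if mismatch_distance(umi, centre) <= max_dist:
                assignment[umi] = centre
                break
        else:
            centres.append(umi)  # no nearby centre: start a new cluster
            assignment[umi] = umi
    return assignment


def majority_consensus(seqs: List[str]) -> str:
    """Column-wise majority vote over pre-aligned, equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
```

Processing UMIs from most to least abundant means rare UMIs, which are more likely to carry sequencing errors, are absorbed into nearby abundant ones rather than the other way round.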