CRUK-CI 10x Sequencing Guide

The Genomics and Bioinformatics Cores now support Single Cell Gene Expression (3'), Single Cell ATAC and Single Cell Immune Profiling (5') sequencing using 10x Genomics' instruments (with or without Feature Barcoding Technology), as well as Visium Spatial Gene Expression sequencing.

Links to 10x Genomics' web site for the tools can be found at the bottom of this page.

Sample Submission for Library Prep and Sequencing

To submit cells or nuclei for library prep and sequencing, please fill in the 10x Submission Form. The 10x Single Cell workflow requires cells or nuclei to be processed as soon as possible after collection, so please contact the Genomics Helpdesk to co-ordinate cell collection with library preparation by Genomics staff.

Genomics offers a library preparation service using the following reagents:

Chromium™ Next GEM Single Cell 3ʹ Reagent Kits v3.1 (including CRISPR screening, Cell Surface Protein and/or hashtags)
Chromium™ Next GEM Single Cell ATAC Reagent Kits v1.1
Chromium™ Next GEM Single Cell 5' Reagent Kits v2.0 (including V(D)J enrichment, Antigen Specificity, Cell Surface Protein, ECCITE-seq, and/or hashtags)
Chromium™ Next GEM Single Cell Multiome ATAC + Gene Expression Reagent Kits

Sample Submission for Sequencing Only

Sample submission through Clarity is unchanged. There are a few index types available:

10x SITT/SITN/SINT - dual indexes compatible with libraries generated with the 3' v3.1 kit, the 5' v2 kit, the Visium kit and the RNA part of the Multiome workflow. SITT indexes are mostly used for Gene Expression libraries, while SITN and SINT are used for Feature Barcoding technology (for example CITE-seq).
10x SINA - single indexes compatible with the 10x Single Cell ATAC v1.1 kit and the ATAC part of the Multiome workflow.
Historically there was also 10x SIGA - single indexes used for the 3' and 5' kits. These are no longer used in our facility.

In 10x SINA & 10x SIGA, each index is a mix of four different sequences to balance across all four nucleotides for each label. You must submit with one label per sample: we will take care of the technicalities.
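As a rough illustration of what "a mix of four different sequences" means, the sketch below maps one label to four sequences and assigns an index read to that label if it matches any of the four. The sequences are invented for the example (they are not real 10x oligos), and the function names are hypothetical.

```python
# Hypothetical sequences for illustration only; these are NOT real 10x oligos.
# One label stands for a mix of four 8-base index sequences.
SINGLE_INDEX_SETS = {
    "SINAG9": ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG"],
}


def is_balanced(sequences):
    """Return True if every cycle (position) sees all four nucleotides."""
    return all(set(column) == set("ACGT") for column in zip(*sequences))


def sample_for_index(index_read, index_sets=SINGLE_INDEX_SETS):
    """Return the label whose mix contains index_read, or None if no mix does."""
    for label, sequences in index_sets.items():
        if index_read in sequences:
            return label
    return None
```

This is why one label per sample is enough on the submission form: all four sequences resolve to the same label.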

SITT/SITN/SINT index plates include unique dual indexing (UDI) sample indexes. This means that there is a unique sample index barcode in both the i7 and i5 index reads. When demultiplexing flowcells where both index reads have been sequenced, bcl2fastq requires that both index sequences match the expected sequence for a read to be assigned to that sample. This solves the "index hopping" issue present on Illumina patterned flowcell sequencers.
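The effect of unique dual indexing can be sketched in a few lines. This is a simplified illustration, not how bcl2fastq is implemented: it uses exact matching only (bcl2fastq can also tolerate a configurable number of mismatches), and the sample sheet contents are made-up sequences.

```python
# Hypothetical sample sheet: label -> (i7 sequence, i5 sequence).
# The sequences are invented for this example.
SAMPLE_SHEET = {
    "SITTA1": ("AAACCCGGGT", "TTTGGGCCCA"),
    "SITTB11": ("GATTACAGAT", "CTGACTGACT"),
}


def assign_sample(i7_read, i5_read, sample_sheet=SAMPLE_SHEET):
    """Assign a read to a sample only if BOTH index reads match that sample."""
    for label, (i7, i5) in sample_sheet.items():
        if i7_read == i7 and i5_read == i5:
            return label
    # A read whose i7 and i5 belong to different samples (a "hopped" read)
    # matches no pair and is left unassigned.
    return None
```

A read carrying the i7 of SITTA1 and the i5 of SITTB11 is left unassigned rather than being credited to either sample.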

There are 96 barcodes in each of the SITT, SITN and SINT index sets. A list of them can be found here:

The labels are of the form SI-TT-<well>, SI-NT-<well>, SI-TN-<well>. These labels are recorded in our Clarity system without the hyphens, so when you fill out the submission form please leave the hyphens out. For example SI-TT-B11 becomes SITTB11 and so forth.

There are 96 barcodes in the 10x Single Index Kit T set A (SIGA) and 96 in the 10x Single Index Kit N set A (SINA). The SIGA labels are of the form SI-GA-<well>; the SINA labels are of the form SI-NA-<well>. These labels are recorded in our Clarity system without the hyphens, so when you fill out the submission form please leave the hyphens out. So SI-GA-C10 becomes SIGAC10, SI-NA-G7 becomes SINAG7 and so forth (see note 1 at the end of this page).
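If you keep the 10x labels in a spreadsheet or script, the conversion to the form used in Clarity is easy to automate. The helper below is a hypothetical sketch, not a facility tool. It assumes wells run from A1 to H12 (a 96-well plate) without zero padding, as in the examples above.

```python
import re

# Index sets named in this guide; wells are assumed to be A1..H12.
_LABEL = re.compile(r"^SI-?(TT|TN|NT|NA|GA)-?([A-H])(1[0-2]|[1-9])$")


def clarity_label(label):
    """Convert a 10x label such as 'SI-TT-B11' to the hyphen-free 'SITTB11'."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognised 10x index label: {label!r}")
    kit, row, column = match.groups()
    return f"SI{kit}{row}{column}"
```

Labels that are already hyphen-free pass through unchanged, and anything that does not look like a 10x label raises an error rather than being silently accepted.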

FASTQ Production

Your sequencing data will be made available in the standard FASTQ file format. We also provide some smaller files giving information about those files, and a report for each lane of sequencing. The sequenced run folders are processed using Illumina's bcl2fastq program.

10x sc 3' mRNA v3.1, 10x Visium and the RNA part of the Multiome workflow libraries are run with a bases mask of:

28 regular cycles of which 16 cycles are 10x barcode and 12 cycles are the unique molecular index (UMI)
10 index cycles
10 index cycles
90 regular cycles

10x ATAC and the ATAC part of the Multiome workflow libraries are run with a bases mask of:

50 regular cycles
8 index cycles
16 index cycles (10x barcode)
50 regular cycles

10x V(D)J libraries are run with a bases mask of:

26 regular cycles, of which 16 cycles are 10x barcode and 10 cycles are the UMI
10 index cycles
10 index cycles
90 regular cycles

Older 10x V(D)J experiments were run using PE150 parameters. However, the Cell Ranger algorithm was updated in September 2019, and 150bp reads are no longer necessary for TCR/BCR libraries.
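The first-read layouts listed above (28 cycles for 3' v3.1, 26 cycles for V(D)J) can be made concrete with a small sketch that splits a read 1 sequence into its 10x barcode and UMI. The pipelines do this for you; the function and dictionary names here are hypothetical, and the sketch assumes the barcode comes first and the UMI follows immediately.

```python
# (barcode length, UMI length) for read 1, taken from the cycle counts above.
READ1_LAYOUT = {
    "3prime_v3.1": (16, 12),  # 28 cycles in total
    "vdj": (16, 10),          # 26 cycles in total
}


def split_read1(sequence, library_type):
    """Split a read 1 sequence into (10x barcode, UMI).

    Assumes the barcode occupies the first cycles and the UMI follows it.
    """
    barcode_len, umi_len = READ1_LAYOUT[library_type]
    if len(sequence) < barcode_len + umi_len:
        raise ValueError(
            f"read 1 has {len(sequence)} bases, "
            f"expected at least {barcode_len + umi_len}"
        )
    barcode = sequence[:barcode_len]
    umi = sequence[barcode_len:barcode_len + umi_len]
    return barcode, umi
```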

We produce a FASTQ file for the index read as well as the regular reads for 10x technologies (normally the index is only in the read headers).

Data Delivery

Please refer to the help page for Sequencing Data for how you will be receiving your 10x sequencing data.

Running 10x Pipelines

As one can tell from the above, it is not possible to run the Cell Ranger and Space Ranger pipelines in the simplest form, as the run folder structure is not as it would be if it were created by the 10x demultiplexing pipeline. Instead, one needs to follow the instructions for using the downstream pipelines provided by 10x Genomics:

Running Cell Ranger with bcl2fastq FASTQs (last section).
Running Cell Ranger ATAC with bcl2fastq FASTQs (last section).
Running Cell Ranger V(D)J with bcl2fastq FASTQs (last section).
Running Cell Ranger ARC with bcl2fastq FASTQs (last section).
Running Space Ranger with bcl2fastq FASTQs (last section).

As an example, consider running Cell Ranger on the files mentioned in the previous section. First, one would create an analysis directory for the work, including a directory fastq into which the FASTQ files will be put.

% mkdir -p /data/singlecellanalysis/fastq
% cd /data/singlecellanalysis/fastq

One would then fetch the FASTQ files using the download tool or from the FTP site. (There are too many ways of doing this to give examples here, so we will assume that the files have arrived in the directory just created.)

% ls -l
-rw-r--r-- 1 me users 1320878652 Nov 19 08:26 SLX-54321.SINAG9.THEFCDRXX.s_2.i_1.fq.gz
-rw-r--r-- 1 me users        300 Nov 19 08:28 SLX-54321.SINAG9.THEFCDRXX.s_2.md5sums.txt
-rw-r--r-- 1 me users 6890932946 Nov 19 08:26 SLX-54321.SINAG9.THEFCDRXX.s_2.r_1.fq.gz
-rw-r--r-- 1 me users 3310544457 Nov 19 08:26 SLX-54321.SINAG9.THEFCDRXX.s_2.r_2.fq.gz
-rw-r--r-- 1 me users 6773557068 Nov 19 08:26 SLX-54321.SINAG9.THEFCDRXX.s_2.r_3.fq.gz

Make sure the file transfer hasn't resulted in errors (optional but very much recommended if the files are fetched from the FTP site).

% md5sum -c SLX-54321.SINAG9.THEFCDRXX.s_2.md5sums.txt
SLX-54321.SINAG9.THEFCDRXX.s_2.i_1.fq.gz: OK
SLX-54321.SINAG9.THEFCDRXX.s_2.r_1.fq.gz: OK
SLX-54321.SINAG9.THEFCDRXX.s_2.r_2.fq.gz: OK
SLX-54321.SINAG9.THEFCDRXX.s_2.r_3.fq.gz: OK
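Where md5sum is not available, the same check can be approximated in Python. This is a minimal sketch with hypothetical function names. It assumes the checksum file uses the usual md5sum layout (checksum, whitespace, file name) and that the files sit in the same directory as the checksum file.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(md5sums_file):
    """Return {file name: True/False} for each entry in an md5sum-style file."""
    md5sums_file = Path(md5sums_file)
    results = {}
    for line in md5sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        actual = md5_of(md5sums_file.parent / name)
        results[name] = actual == expected.lower()
    return results
```

Calling verify_md5sums("SLX-54321.SINAG9.THEFCDRXX.s_2.md5sums.txt") in the fastq directory returns one entry per file, with False for any file whose checksum does not match.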

Rename the files to the names expected by Cell Ranger and its variants. The script crukci_to_illumina.py (see the sequencing data page) will change the names of the files to the correct pattern, or one can rename the files by hand.

% python3 /opt/scripts/crukci_to_illumina.py
SLX-54321.SINAG9.THEFCDRXX.s_2.i_1.fq.gz -> SINAG9_S1_L002_I1_001.fastq.gz
SLX-54321.SINAG9.THEFCDRXX.s_2.r_1.fq.gz -> SINAG9_S1_L002_R1_001.fastq.gz
SLX-54321.SINAG9.THEFCDRXX.s_2.r_2.fq.gz -> SINAG9_S1_L002_R2_001.fastq.gz
SLX-54321.SINAG9.THEFCDRXX.s_2.r_3.fq.gz -> SINAG9_S1_L002_R3_001.fastq.gz

% ls -l
-rw-r--r-- 1 me users 1320878652 Nov 19 08:26 SINAG9_S1_L002_I1_001.fastq.gz
-rw-r--r-- 1 me users 6890932946 Nov 19 08:26 SINAG9_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 me users 3310544457 Nov 19 08:26 SINAG9_S1_L002_R2_001.fastq.gz
-rw-r--r-- 1 me users 6773557068 Nov 19 08:26 SINAG9_S1_L002_R3_001.fastq.gz
-rw-r--r-- 1 me users        300 Nov 19 08:28 SLX-54321.SINAG9.THEFCDRXX.s_2.md5sums.txt
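For anyone renaming by hand, the mapping shown in the listings above can be expressed as a short function. This is not the crukci_to_illumina.py script itself, only a sketch of the naming pattern inferred from the example files. The function name is hypothetical, and the sample number is assumed to be 1, as in the example.

```python
import re

# Pattern inferred from the example: SLX id, barcode, flowcell, lane, read.
_CRUKCI = re.compile(
    r"^(?P<slx>SLX-\d+)\.(?P<barcode>[^.]+)\.(?P<flowcell>[^.]+)"
    r"\.s_(?P<lane>\d+)\.(?P<kind>[ir])_(?P<read>\d+)\.fq\.gz$"
)


def illumina_name(crukci_name, sample_number=1):
    """Map a CRUK-CI FASTQ name to the Illumina-style name, or return None.

    Files that are not FASTQ reads (for example the md5sums file) give None.
    """
    match = _CRUKCI.match(crukci_name)
    if match is None:
        return None
    return "{barcode}_S{sample}_L{lane:03d}_{kind}{read}_001.fastq.gz".format(
        barcode=match["barcode"],
        sample=sample_number,
        lane=int(match["lane"]),
        kind=match["kind"].upper(),  # r -> R (regular read), i -> I (index)
        read=match["read"],
    )
```

The result can be passed to os.rename or Path.rename for each file in the fastq directory, skipping any name for which the function returns None.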

Run Cell Ranger for this sample. Here, Cell Ranger 5.0.1 is installed in /opt/10x/cellranger-5.0.1 and the reference data is in /opt/10x/refdata-gex-GRCh38-2020-A. There are many ways to run Cell Ranger, and the commands below are not a complete example, but they give an idea of how to get started.

% cd /data/singlecellanalysis
% /opt/10x/cellranger-5.0.1/cellranger count \
  --id="MyId" \
  --transcriptome="/opt/10x/refdata-gex-GRCh38-2020-A" \
  --fastqs="/data/singlecellanalysis/fastq"
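When several samples need processing, the command above can be assembled in a script. The sketch below only builds the argument list, using the same three options as the example. The function name is hypothetical, and the paths are those assumed in this guide.

```python
import shlex


def cellranger_count_command(run_id, transcriptome, fastqs,
                             cellranger="/opt/10x/cellranger-5.0.1/cellranger"):
    """Build the argument list for a 'cellranger count' run (does not run it)."""
    return [
        cellranger,
        "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
    ]


if __name__ == "__main__":
    command = cellranger_count_command(
        "MyId",
        "/opt/10x/refdata-gex-GRCh38-2020-A",
        "/data/singlecellanalysis/fastq",
    )
    # Print the command in a form that can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in command))
```

The list can be handed to subprocess.run(command, check=True, cwd="/data/singlecellanalysis") to launch the run from the analysis directory.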

Further Information

Refer to 10x Genomics' website for instructions on using the downstream pipelines:

Single-Library Analysis (Cell Ranger).
Aggregating Multiple Libraries (Cell Ranger).
Customized Secondary Analysis (Cell Ranger).

1. Hyphens are removed from barcode labels because the label forms part of the name of the FASTQ file for that sample in its pool. Elsewhere in our sequencing service the hyphen is used as a separator for dual index kits, where (normally) twelve first indexes and eight second indexes are paired to give ninety-six combinations. Including the hyphen in 10x barcodes would cause confusion with the rest of the service.