Chapter 2 Associations QC
2.1 cCREs per BC
Goal: Assess BC promiscuity
Input file: associations_before_promiscuity
Evaluated metrics: Specificity
Legend: A histogram of the number of cCREs per BC
Interpretation: The successful example showcases a highly specific experiment, where most of the BCs are associated with only one cCRE. The unsuccessful example showcases a problem where many BCs are associated with more than one cCRE, suggesting a problem in specificity.
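The histogram described above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes the associations have already been parsed into (barcode, cCRE) pairs, and the function name ccres_per_bc and the toy barcodes are hypothetical.

```python
from collections import Counter, defaultdict

def ccres_per_bc(pairs):
    """Return {number of distinct cCREs: number of BCs} from (BC, cCRE) pairs."""
    ccres_by_bc = defaultdict(set)
    for bc, ccre in pairs:
        ccres_by_bc[bc].add(ccre)  # repeated pairs collapse in the set
    return Counter(len(ccres) for ccres in ccres_by_bc.values())

# Toy input (assumed layout, not real data): ACCA is a promiscuous barcode.
pairs = [
    ("AAAC", "cCRE_1"), ("AAAC", "cCRE_1"),
    ("AAGT", "cCRE_2"),
    ("ACCA", "cCRE_1"), ("ACCA", "cCRE_3"),
]
hist = ccres_per_bc(pairs)
specific_fraction = hist[1] / sum(hist.values())  # share of BCs with exactly one cCRE
```

A specific_fraction close to 1 corresponds to the successful example; a heavy tail at 2 or more cCREs per BC corresponds to the unsuccessful one.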
2.2 Reads per association
Goal: Assess confidence in associations
Input file: associations_before_minimum_observations
Evaluated metrics: Specificity
Legend: The number of reads supporting each cCRE-BC pairing
Interpretation: The successful example showcases a robust experiment, where BC-cCRE associations are supported by many reads. In the unsuccessful example, over half of the associations are supported by only a single read.
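As a rough sketch of this metric, the snippet below counts the reads behind each BC-cCRE pairing and reports the share of pairings seen only once. It assumes one (barcode, cCRE) tuple per read; the function names and toy values are hypothetical.

```python
from collections import Counter

def reads_per_association(observations):
    """Map each (BC, cCRE) pairing to the number of reads supporting it."""
    return Counter(observations)

def singleton_fraction(read_counts):
    """Fraction of pairings supported by exactly one read."""
    singletons = sum(1 for n in read_counts.values() if n == 1)
    return singletons / len(read_counts)

# Toy input (not real data): one tuple per read.
observations = (
    [("AAAC", "cCRE_1")] * 3
    + [("AAGT", "cCRE_2")]
    + [("ACCA", "cCRE_3")] * 2
    + [("ACGG", "cCRE_3")]
)
read_counts = reads_per_association(observations)
frac_single = singleton_fraction(read_counts)
```

A singleton fraction above 0.5 would match the unsuccessful example described above.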
2.3 BCs per cCRE
Goal: Assess BC replicability
Input file: final_associations, cCRE_fasta
Evaluated metrics: Complexity (multiplicity & uniformity)
Legend: An empirical cumulative distribution function (eCDF) of BCs per cCRE
Interpretation: The successful example showcases many BCs per cCRE and a low percentage of cCREs with fewer than 10 BCs, suggesting high complexity. The unsuccessful example showcases a problem where most cCREs have few BCs.
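The eCDF can be sketched as below. The snippet assumes the final associations are available as a barcode-to-cCRE mapping and that the designed cCRE names come from the FASTA headers; counting designed cCREs without any BC as zero is an assumption of this sketch, and all names are hypothetical.

```python
from collections import Counter

def bcs_per_ccre(associations, designed_ccres):
    """Count BCs per designed cCRE; cCREs without any BC get 0."""
    counts = Counter(associations.values())
    return {ccre: counts.get(ccre, 0) for ccre in designed_ccres}

def ecdf(values):
    """Return sorted (x, fraction of values <= x) points."""
    n = len(values)
    counts = Counter(values)
    points, cumulative = [], 0
    for x in sorted(counts):
        cumulative += counts[x]
        points.append((x, cumulative / n))
    return points

# Toy input (not real data).
associations = {"bc1": "A", "bc2": "A", "bc3": "A", "bc4": "B", "bc5": "C", "bc6": "C"}
designed = ["A", "B", "C", "D"]
per_ccre = bcs_per_ccre(associations, designed)
curve = ecdf(list(per_ccre.values()))
```

Reading the curve at x = 9 gives the fraction of cCREs with fewer than 10 BCs, the quantity referred to in the interpretation.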
2.4 Retained cCREs
Goal: Assess BC replicability
Input file: final_associations, cCRE_fasta
Evaluated metrics: Complexity (multiplicity & uniformity)
Legend: Retained cCREs per increasing cutoffs of BC number per cCRE
Interpretation: The successful example showcases a high-complexity experiment, in which most cCREs are retained even at stringent minimum barcode thresholds. The unsuccessful example showcases a low-complexity experiment, in which the majority of cCREs are lost when applying high BC-per-cCRE filters.
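A minimal sketch of the retention curve, assuming BC counts per cCRE have already been computed (with 0 for designed cCREs that have no BC). The cutoff values and the function name are illustrative, not taken from the pipeline.

```python
def retained_fraction(bc_counts, cutoffs):
    """Fraction of cCREs with at least `k` BCs, for each cutoff k."""
    total = len(bc_counts)
    return {
        k: sum(1 for n in bc_counts.values() if n >= k) / total
        for k in cutoffs
    }

# Toy input (not real data): BC counts per designed cCRE.
bc_counts = {"A": 120, "B": 45, "C": 8, "D": 0}
retention = retained_fraction(bc_counts, cutoffs=[1, 10, 50, 100])
```

In a high-complexity experiment the values stay close to 1 as the cutoff grows; a steep drop matches the unsuccessful example.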
2.5 cCRE retention by sequencing depth
Goal: Assess whether sequencing depth is sufficient
Input file: associations_downsampling_path, associations_downsampling_file_name, cCRE_fasta
Evaluated metrics: Complexity (multiplicity & uniformity)
Legend: Sequencing depth vs retained cCREs. Sequencing data is downsampled to assess whether the percentage of retained cCREs has reached saturation. The resulting curve is then extrapolated to predict whether additional sequencing would improve the results.
Interpretation: The successful example showcases a high-complexity experiment: additional sequencing is not predicted to improve the percentage of retained cCREs. The unsuccessful example showcases an experiment where sequencing depth was suboptimal: additional sequencing is predicted to improve the percentage of retained cCREs.
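The downsampling and extrapolation steps can be sketched as follows. The source does not state which extrapolation model is used, so this sketch swaps in a simple saturation curve y = a*x / (b + x), fitted by least squares on its reciprocal form; that choice, the function names, and the min_bcs parameter are all assumptions.

```python
import random

def retained_at_fraction(reads, designed_ccres, fraction, min_bcs, seed=0):
    """Downsample reads (one (BC, cCRE) tuple per read) and return the
    fraction of designed cCREs that keep at least `min_bcs` distinct BCs."""
    rng = random.Random(seed)
    sample = rng.sample(reads, round(len(reads) * fraction))
    bcs = {}
    for bc, ccre in sample:
        bcs.setdefault(ccre, set()).add(bc)
    kept = sum(1 for c in designed_ccres if len(bcs.get(c, ())) >= min_bcs)
    return kept / len(designed_ccres)

def fit_saturation(fractions, retained):
    """Fit y = a*x / (b + x) via linear least squares on 1/y = 1/a + (b/a)/x."""
    pts = [(1 / x, 1 / y) for x, y in zip(fractions, retained) if x > 0 and y > 0]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in pts)
             / sum((u - mean_u) ** 2 for u, _ in pts))
    intercept = mean_v - slope * mean_u
    a = 1 / intercept   # plateau: predicted retention at infinite depth
    b = slope * a       # depth (as a fraction) at which half the plateau is reached
    return a, b

def predict(a, b, x):
    """Predicted retained fraction at relative depth x (1.0 = current depth)."""
    return a * x / (b + x)

# Toy curve (illustrative values, not real data).
fractions = [0.25, 0.5, 0.75, 1.0]
retained = [0.60, 0.75, 0.82, 0.86]
a, b = fit_saturation(fractions, retained)
gain_if_doubled = predict(a, b, 2.0) - retained[-1]
```

A gain_if_doubled near zero suggests saturation, as in the successful example; a clearly positive value suggests that additional sequencing would help.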
2.6 BCs per cCRE by sequencing depth
Goal: Assess whether sequencing depth is sufficient
Input file: associations_downsampling_path, associations_downsampling_file_name, cCRE_fasta
Evaluated metrics: Complexity (multiplicity & uniformity)
Legend: A box plot of the number of BCs per cCRE (y axis) as a function of the downsampling parameter (x axis), i.e., the fraction of sequencing data used.
Interpretation: In the successful example, the number of BCs per cCRE levels off as the fraction of sequencing data approaches 1, indicating that sequencing depth was sufficient to capture library complexity. In the unsuccessful example, the number of BCs per cCRE keeps increasing substantially with the fraction of data used, indicating that the library has not yet been sequenced to saturation.
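One way to summarize this plot numerically is sketched below: a five-number summary per downsampling fraction (the values a box plot draws) and the relative change in the median between the two deepest fractions. The input layout and function names are hypothetical, and the saturation check is an illustrative heuristic rather than the report's own criterion.

```python
from statistics import median, quantiles

def box_stats(values):
    """Five-number summary for one box of the plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

def median_gain(counts_by_fraction):
    """Relative increase in median BCs per cCRE between the two deepest fractions."""
    fractions = sorted(counts_by_fraction)
    prev = median(counts_by_fraction[fractions[-2]])
    last = median(counts_by_fraction[fractions[-1]])
    return float("inf") if prev == 0 else (last - prev) / prev

# Toy input (not real data): BCs per cCRE at each downsampling fraction.
counts_by_fraction = {
    0.5: [10, 20, 30, 40, 50],
    1.0: [11, 21, 33, 42, 52],
}
half_depth_box = box_stats(counts_by_fraction[0.5])
gain = median_gain(counts_by_fraction)
```

A small gain between the last two fractions is consistent with saturation; a large gain indicates that deeper sequencing is still recovering new barcodes.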
2.7 PCR bias - GC
Goal: Assess GC content bias in PCR amplification
Input file: final_associations, cCRE_fasta
Evaluated metrics: Complexity (multiplicity & uniformity)
Legend: The number of reads (yellow box plots) and cCREs (green line) at various levels of GC content. Data was binned in fixed sizes of 5%.
Interpretation: The successful example showcases a relatively consistent number of reads per GC content, as well as PCR conditions that are optimized for the most common GC content levels (peaks are close to one another). The unsuccessful example showcases both a strong amplification bias and suboptimal PCR conditions for the GC content levels of most cCREs.
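The 5% binning can be sketched as follows. The snippet assumes cCRE sequences parsed from the FASTA file into a name-to-sequence mapping and a read count per cCRE; placing 100% GC in the last bin is a design choice of this sketch, and all names are hypothetical.

```python
from collections import defaultdict

def gc_percent(seq):
    """GC content of a sequence, in percent."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)

def gc_bin(seq, width=5):
    """Lower edge of the fixed-width GC bin; 100% GC falls in the last bin."""
    return min(int(gc_percent(seq) // width) * width, 100 - width)

def summarize_by_gc(sequences, reads_per_ccre, width=5):
    """Per GC bin: number of cCREs and the read counts of those cCREs."""
    reads_by_bin = defaultdict(list)
    for name, seq in sequences.items():
        reads_by_bin[gc_bin(seq, width)].append(reads_per_ccre.get(name, 0))
    return {
        b: {"n_ccres": len(r), "reads": sorted(r)}
        for b, r in sorted(reads_by_bin.items())
    }

# Toy input (not real data).
sequences = {"A": "ATGCATGCAT", "B": "GGCCGGCCAT", "C": "ATATGCGCAT"}
reads_per_ccre = {"A": 100, "B": 10, "C": 120}
summary = summarize_by_gc(sequences, reads_per_ccre)
```

The n_ccres values trace the green line and the reads lists feed the box plots; comparing where each peaks shows whether PCR conditions suit the most common GC levels.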
2.8 PCR bias - G-stretches
Goal: Assess G-stretch bias in PCR amplification
Input file: final_associations, cCRE_fasta
Evaluated metrics: Complexity (multiplicity & uniformity)
Legend: The number of reads (yellow box plots) and cCREs (green line) at various lengths of G-stretches.
Interpretation: The successful example showcases a relatively consistent number of reads per G-stretch length. The unsuccessful example showcases both a strong amplification bias and suboptimal PCR conditions for the G-stretch lengths of most cCREs (peaks do not overlap).
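A minimal sketch of the grouping behind this plot. The source does not define how G-stretch length is measured; this sketch assumes it is the longest run of consecutive G bases in each cCRE, and the function names and toy sequences are hypothetical.

```python
import re
from collections import defaultdict

def longest_g_stretch(seq):
    """Length of the longest run of consecutive G bases (0 if none)."""
    runs = re.findall(r"G+", seq.upper())
    return max((len(run) for run in runs), default=0)

def summarize_by_g_stretch(sequences, reads_per_ccre):
    """Per G-stretch length: number of cCREs and their read counts."""
    reads_by_len = defaultdict(list)
    for name, seq in sequences.items():
        reads_by_len[longest_g_stretch(seq)].append(reads_per_ccre.get(name, 0))
    return {
        length: {"n_ccres": len(r), "reads": sorted(r)}
        for length, r in sorted(reads_by_len.items())
    }

# Toy input (not real data).
sequences = {"A": "ATGGGGCAGG", "B": "ATATCCAT", "C": "TTGGGGAA"}
reads_per_ccre = {"A": 12, "B": 90, "C": 20}
summary = summarize_by_g_stretch(sequences, reads_per_ccre)
```

Read counts that fall sharply as the stretch length grows would indicate the amplification bias described for the unsuccessful example.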