countGenomicOverlaps {GenomicRanges}R Documentation

Count Read Hits in Genomic Features

Description

Count read hits per exon or transcript and resolve multi-hit reads.

Usage

  ## S4 method for signature 'GRangesList,GRangesList'
countGenomicOverlaps(
    query, subject, 
    type = c("any", "start", "end", "within", "equal"),
    resolution = c("none", "divide", "uniqueDisjoint"),
    ignore.strand = FALSE, ...) 

Arguments

query A GRangesList, GRanges, or GappedAlignments object. The query is intended to be a GRangesList where each list element is a read. When the length of a list element is greater than 1, it is assumed to represent a spliced read. Both GRanges and GappedAlignments are coerced to a GRangesList with each row as a single list element. If the cigar in the GappedAlignments has gaps, the read will be represented in the corresponding GRangesList object as a split read (i.e., list element of length 2 or greater).
subject A GRangesList, or a GRanges object. The subject is expected to be a list of genomic features, specifically genes, with each row representing a feature where the feature could be an exon or transcript. If a GRanges object is provided, it will be coerced to a GRangesList with each original range representing as a single list element (i.e., as a single gene).
type See findOverlaps in the IRanges package for a description of this argument.
resolution A character(1) string of "none", "divide", or "uniqueDisjoint". These rule sets are used to distribute read hits when multiple subjects are hit by the same query.

  • "none" : No conflict resolution is performed. All queries that hit more than 1 subject are dropped.
  • "divide" : The hit from a single query is divided equally among all subjects that were hit. If a query hit 4 subjects each subject is assigned 1/4 of a hit.
  • "uniqueDisjoint" : Subjects hit by a common query are partitioned into disjoint intervals. Any regions that are shared between the subjects are discarded. If the read overlaps one of these remaining unique disjoint regions the hit is assigned to that feature. If the read overlaps both or none of the regions, no hit is assigned. Therefore, unlike the divide option, uniqueDisjoint does not resolve multi-hit conflict in all situations.
ignore.strand A logical value indicating if strand should be considered when matching.
... Additional arguments, perhaps used by methods defined on this generic.

Details

The countGenomicOverlaps methods use the findOverlaps function in conjunction with a resolution method to identify overlaps and resolve queries that match multiple subjects. The usual type argument of findOverlaps is used to specify the type of overlap. The resolution argument is used to select a method to resolve the conflict when a query hits more than 1 subject. Here the term ‘hit’ means an overlap identified by findOverlaps.

The primary difference in the handling of split reads vs simple reads (i.e., no gap in the CIGAR) is the portion of the read hit each split read fragment has to contribute. All reads, whether simple or split, have an overall value of 1 to contribute to a subject they hit. In the case of the split reads, this value is further divided by the number of fragments in the read. For example, if a split read has 3 fragments (i.e., two gaps in the CIGAR) each fragment has a value of 1/3 to contribute to the subject they hit. As with the simple reads, depending upon the resolution chosen the value may be divided, fully assigned or discarded.

More detailed examples can be found in the countGenomicOverlaps vignette.

Value

A GRangesList object with an additional metadata column specifying the number of hits.

Author(s)

Valerie Obenchain and Martin Morgan

Examples

rng1 <- function(s, w)
GRanges(seq="chr1", IRanges(s, width=w), strand="+")

rng2 <- function(s, w)
GRanges(seq="chr2", IRanges(s, width=w), strand="+")

subj <- GRangesList(A=rng1(1000, 500),
                    B=rng2(2000, 900),
                    C=rng1(c(3000, 3600), c(500, 300)),
                    D=rng2(c(7000, 7500), c(600, 300)),
                    E1=rng1(4000, 500), E2=rng1(c(4300, 4500), c(400, 400)),
                    F=rng2(3000, 500),
                    G=rng1(c(5000, 5600), c(500, 300)),
                    H1=rng1(6000, 500), H2=rng1(6600, 400))

query <- GRangesList(a=rng1(1400, 500),
                     b=rng2(2700, 100),
                     c=rng1(3400, 300),
                     d=rng2(7100, 600),
                     e=rng1(4200, 500),
                     f=rng2(c(3100, 3300), 50),
                     g=rng1(c(5400, 5600), 50),
                     h=rng1(c(6400, 6600), 50))

## Overlap type = "any"
none <- countGenomicOverlaps(query, subj, type="any", resolution="none")
divide <- countGenomicOverlaps(query, subj, type="any", resolution="divide")
uniqueDisjoint <- countGenomicOverlaps(query, subj, type="any", 
    resolution="uniqueDisjoint")
data.frame(none = values(unlist(none))[["hits"]], 
           divide = values(unlist(divide))[["hits"]],
           uniqDisj = values(unlist(uniqueDisjoint))[["hits"]])

## Split read with 4 fragments :
## - 2 fragments hit the same subject 
## - 1 fragment hits two different subjects
## - 1 fragment hits a single subject 
splitreads <- GRangesList(c(rng1(c(3000, 3200, 4000), 100), rng1(5400, 300)))
split_none <- countGenomicOverlaps(splitreads, subj, type="any")
split_divide <- countGenomicOverlaps(splitreads, subj, type="any", 
    resolution="divide")
data.frame(none = values(unlist(split_none))[["hits"]],
           divide = values(unlist(split_divide))[["hits"]])


[Package GenomicRanges version 1.4.6 Index]