countGenomicOverlaps {GenomicRanges} | R Documentation |
Count read hits per exon or transcript and resolve multi-hit reads.
## S4 method for signature 'GRangesList,GRangesList' countGenomicOverlaps( query, subject, type = c("any", "start", "end", "within", "equal"), resolution = c("none", "divide", "uniqueDisjoint"), ignore.strand = FALSE, ...)
query |
A GRangesList, GRanges, or GappedAlignments
object. The query is intended to be a
GRangesList where each list element is a read. When
the length of a list element is greater than 1, it is assumed to
represent a spliced read. Both GRanges and
GappedAlignments are coerced to a GRangesList with
each row as a single list element. If the cigar in the
GappedAlignments has gaps, the read will be represented in
the corresponding GRangesList object as a split read (i.e.,
list element of length 2 or greater).
|
subject |
A GRangesList , or a
GRanges object.
The subject is expected to be a list of genomic features,
specifically genes, with each row representing a feature
where the feature could be an exon or transcript. If a
GRanges
object is provided, it will be coerced to a
GRangesList
with each original range representing as a single list element
(i.e., as a single gene).
|
type |
See findOverlaps in the IRanges package for
a description of this argument.
|
resolution |
A character(1) string of "none", "divide", or "uniqueDisjoint".
These rule sets are used to distribute read hits when
multiple subjects are hit by the same query.
|
ignore.strand |
A logical value indicating if strand should be considered when matching. |
... |
Additional arguments, perhaps used by methods defined on this generic. |
The countGenomicOverlaps
methods use the findOverlaps
function in conjunction with a resolution method to identify overlaps
and resolve queries that match multiple subjects.
The usual type
argument of findOverlaps
is used to specify the type of overlap. The resolution
argument is used to select a method to resolve the conflict
when a query hits more than 1 subject. Here the
term ‘hit’ means an overlap identified by findOverlaps
.
The primary difference in the handling of split reads vs
simple reads (i.e., no gap in the CIGAR) is the portion of
the read hit each split read fragment has to contribute.
All reads, whether simple or split, have an overall value
of 1 to contribute to a subject they hit. In the case of the
split reads, this value is further divided by the number of
fragments in the read. For example, if a split read has 3
fragments (i.e., two gaps in the CIGAR) each
fragment has a value of 1/3 to contribute to the subject
they hit. As with the simple reads, depending upon the
resolution
chosen the value may be divided, fully
assigned or discarded.
More detailed examples can be found in the countGenomicOverlaps
vignette.
A GRangesList object with an additional metadata column specifying the number of hits.
Valerie Obenchain and Martin Morgan
rng1 <- function(s, w) GRanges(seq="chr1", IRanges(s, width=w), strand="+") rng2 <- function(s, w) GRanges(seq="chr2", IRanges(s, width=w), strand="+") subj <- GRangesList(A=rng1(1000, 500), B=rng2(2000, 900), C=rng1(c(3000, 3600), c(500, 300)), D=rng2(c(7000, 7500), c(600, 300)), E1=rng1(4000, 500), E2=rng1(c(4300, 4500), c(400, 400)), F=rng2(3000, 500), G=rng1(c(5000, 5600), c(500, 300)), H1=rng1(6000, 500), H2=rng1(6600, 400)) query <- GRangesList(a=rng1(1400, 500), b=rng2(2700, 100), c=rng1(3400, 300), d=rng2(7100, 600), e=rng1(4200, 500), f=rng2(c(3100, 3300), 50), g=rng1(c(5400, 5600), 50), h=rng1(c(6400, 6600), 50)) ## Overlap type = "any" none <- countGenomicOverlaps(query, subj, type="any", resolution="none") divide <- countGenomicOverlaps(query, subj, type="any", resolution="divide") uniqueDisjoint <- countGenomicOverlaps(query, subj, type="any", resolution="uniqueDisjoint") data.frame(none = values(unlist(none))[["hits"]], divide = values(unlist(divide))[["hits"]], uniqDisj = values(unlist(uniqueDisjoint))[["hits"]]) ## Split read with 4 fragments : ## - 2 fragments hit the same subject ## - 1 fragment hits two different subjects ## - 1 fragment hits a single subject splitreads <- GRangesList(c(rng1(c(3000, 3200, 4000), 100), rng1(5400, 300))) split_none <- countGenomicOverlaps(splitreads, subj, type="any") split_divide <- countGenomicOverlaps(splitreads, subj, type="any", resolution="divide") data.frame(none = values(unlist(split_none))[["hits"]], divide = values(unlist(split_divide))[["hits"]])