Pedigree-based estimation of human mobile element retrotransposition rates

  1. Lynn B. Jorde1
  1. 1Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA;
  2. 2USTAR Center for Genetic Discovery, Salt Lake City, Utah 84112, USA;
  3. 3Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
  • Corresponding author: lbj{at}genetics.utah.edu
  • Abstract

    Germline mutation rates in humans have been estimated for a variety of mutation types, including single-nucleotide and large structural variants. Here, we directly measure the germline retrotransposition rate for the three active retrotransposon elements: L1, Alu, and SVA. We used three tools for calling mobile element insertions (MEIs) (MELT, RUFUS, and TranSurVeyor) on blood-derived whole-genome sequence (WGS) data from 599 CEPH individuals, comprising 33 three-generation pedigrees. We identified 26 de novo MEIs in 437 births. The retrotransposition rate estimate for Alu elements, one in 40 births, is roughly half the rate estimated using phylogenetic analyses, a difference in magnitude similar to that observed for single-nucleotide variants. The L1 retrotransposition rate is one in 63 births and is within range of previous estimates (1:20–1:200 births). The SVA retrotransposition rate, one in 63 births, is much higher than the previous estimate of one in 900 births. Our large, three-generation pedigrees allowed us to assess parent-of-origin effects and the timing of insertion events in either gametogenesis or early embryonic development. We find a statistically significant paternal bias in Alu retrotransposition. Our study represents the first in-depth analysis of the rate and dynamics of human retrotransposition from WGS data in three-generation human pedigrees.

    Non–long terminal repeat (non-LTR) retrotransposons have played a major role in shaping the human genome by creating structural variation and influencing gene expression (Elbarbary et al. 2016; Bourque et al. 2018). In addition, there are at least 130 known instances of retrotransposition events associated with human disease (Hancks and Kazazian 2016; Kazazian and Moran 2017). These retrotransposons mobilize through a “copy-and-paste” mechanism using an mRNA intermediate that is reverse-transcribed into the genome. There are three currently active non-LTR retrotransposons in humans: the autonomous long interspersed element 1 (L1); and two nonautonomous elements, the Alu short interspersed elements (SINE), and the composite element SINE-R-VNTR-Alu (SVA). These three retrotransposon families alone account for >25% of the human genome, and younger copies are polymorphic for their presence or absence in humans (Cordaux and Batzer 2009). There are more than 1.5 million non-LTR retrotransposons in the human genome (Cordaux and Batzer 2009), and a small fraction of them are active and still capable of creating new mobile element insertions (MEIs) in germline and somatic tissue. L1 elements, for example, can be active and have been extensively studied in the human brain (for review, see Faulkner and Billon 2018) and in tumors (for review, see Burns 2017).

    Inherited retrotransposition events occur either in the parental gametes or in early embryogenesis of the individual, with the latter leading to mosaicism of the element. Studies have suggested that the majority of inherited MEIs originate in the male germline (Nellåker et al. 2012), and likely in individuals with compromised control of retrotransposition (Newkirk et al. 2017). A few de novo Alu and L1 elements in humans have been traced to either the germline (Kazazian et al. 1988; Wallace et al. 1991; Richardson et al. 2017) or early embryogenesis (van den Hurk et al. 2007). L1 retrotransposition studies in mice indicate that retrotransposition primarily occurs in early embryogenesis (Kano et al. 2009; Richardson et al. 2017; for review, see Richardson and Faulkner 2018). The timing of Alu and SVA element insertions remains largely unknown.

    Alu, L1, and SVA germline retrotransposition rates have been estimated through phylogenetic and disease-based studies. It is estimated that one de novo Alu insertion occurs in about every 20 births and a de novo L1 insertion event occurs once in about every 150 live human births (Deininger and Batzer 1999; Kazazian 1999; Li et al. 2001; Cordaux et al. 2006; Xing et al. 2009b; Ewing and Kazazian 2010; Huang et al. 2010; Hormozdiari et al. 2011; Hancks and Kazazian 2012). There are only a few thousand SVA elements in the human genome, and the current estimate for the rate of new SVA insertion events is one in roughly every 900 live human births (Xing et al. 2009b). Although previous studies have identified de novo Alu, L1, and SVA insertions in large cohorts using whole-genome sequencing (WGS) (Werling et al. 2018) and whole-exome sequencing (WES) (Gardner et al. 2018), there has not yet been a rigorous empirical study of heritable retrotransposition and retrotranspositional timing in multigenerational pedigrees. Moreover, it is unknown whether human germline retrotransposition is affected by the parent's age or sex, or whether retrotransposition rates differ among pedigrees.

    We undertook WGS of 599 members of 33 three-generation Utah Centre d'Etude du Polymorphisme Humain (CEPH) pedigrees (Dausset et al. 1990) because of the historical significance of this cohort in human genetic research and because of the unique research opportunities offered by these large multigenerational pedigrees. The Utah CEPH pedigrees were used to help establish the human linkage map (White et al. 1985), and trios from these pedigrees (CEPH from Utah [CEU]) were an important component of the International HapMap Project (The International HapMap Consortium 2003, 2007) and the 1000 Genomes Project (The 1000 Genomes Project Consortium 2010). The pedigrees were drawn from a population of predominantly northern European descent that has experienced very low consanguinity (Jorde 1989), no evidence of founder effect (McLellan et al. 1984), and heterozygosity similar to that of other populations of European ancestry (Xing et al. 2009a). A previous study identified several related pairs of individuals in the Utah and non-Utah CEPH pedigrees (Stevens et al. 2012), yet only one mating pair used in our study had detectable consanguinity, with a coefficient of relationship of 0.001. Here, we present our analysis of de novo L1, SVA, and Alu retrotransposition events in these pedigrees using three MEI-calling tools: MELT (Gardner et al. 2017), RUFUS (https://github.com/jandrewrfarrell/RUFUS) (Ostrander et al. 2018), and TranSurVeyor (Rajaby and Sung 2018).

    Results

    Analysis of de novo MEIs in three-generation pedigrees

    Blood-derived DNA samples from 599 individuals in 33 three-generation pedigrees were whole-genome sequenced at an average depth of ∼30× using Illumina paired-end technology (Supplemental Table S1). In these pedigrees, we denote the grandparents as generation 1, their offspring as generation 2, and their grandchildren as generation 3. A companion study (Sasani et al. 2019) presents an analysis of single-nucleotide variants (SNVs) and small indels in these pedigrees.

    To maximize sensitivity (at the expense of specificity), we used liberal criteria for initial MEI detection in the three MEI-calling tools. This resulted in a large number of false-positive calls that were subsequently identified by Integrative Genomics Viewer (IGV) assessment (Robinson et al. 2011; Thorvaldsdóttir et al. 2013). MELT identified 907 candidate de novo loci from 12,594 called Alu, SVA, and L1 loci. These candidates were evaluated in IGV for characteristic signatures of MEIs, including a target site duplication (TSD), a poly(A) tail, and split/discordant reads with pairs that mapped to a retrotransposon family (Methods; Supplemental Data S1). Nineteen loci met these criteria and were absent in the parents, and all were validated via PCR and Sanger sequencing (Supplemental Figs. S1 and S2). TranSurVeyor identified 21 de novo loci from 86,649 breakpoints, including 14 of the 19 identified by MELT and an additional six loci not found via MELT (Supplemental Data S2). The RUFUS algorithm called 23 de novo loci from 44,190 breakpoints (Supplemental Data S3), including 22 called by MELT or TranSurVeyor, and one additional de novo locus. In total, we identified and PCR-validated 26 de novo MEIs, including eight L1, seven SVA, and 11 Alu insertions in 16 of 33 CEPH pedigrees (Table 1; Fig. 1; Supplemental Table S2; Supplemental Figs. S1, S2). PCR validation showed that every locus with preliminary evidence of a MEI event was a true-positive de novo insertion.

    Figure 1.

    Distribution of de novo MEIs throughout the genome. (A) Genomic map of de novo MEIs using HumanIdiogramLibrary (https://zenodo.org/record/1210245#.XVhePuhKiUk). The numbers to the right of the triangles indicate the ID number of each element listed in Table 1. (B) RepeatMasker (UCSC Genome Browser) context of de novo MEIs (Kent et al. 2002). (C) Genic context of de novo MEIs (UCSC Genome Browser) (Kent et al. 2002). The genomic context in B and C was determined using the TSD location for each locus.

    Table 1.

    Characteristics of 26 de novo MEIs identified in 437 births

    Twenty-four of 26 de novo MEIs contain all of the hallmarks of L1-mediated retrotransposition: a poly(A) tail, a TSD, and the endonuclease cleavage site motif (5′-TTTT/AA-3′) (for review, see Cordaux and Batzer 2009; Hancks and Kazazian 2016). The insertion sites of the remaining two loci, Alu #4 and L1 #1, do not fit the canonical pattern. Alu #4 is full-length but has a 1.7-kb deletion at its 5′ flanking region, which may have occurred during the insertion event, and thus does not have a TSD. Alu #4 is de novo in individual 8327 (NA07355) but is also present at low levels in IGV and PCR in sibling 8439 (NA07351) (Supplemental Fig. S1). Amplification of a nearby SNP showed that there is low-level sample contamination of 8327 (NA07355) in 8439 (NA07351), but this has no effect on the results in this study (Supplemental Fig. S1). L1 #1 is 5′ and 3′ truncated, does not have hallmarks of retrotransposition, and contains a deletion of an “A” at the insertion site. This indicates a nonclassical L1 insertion event, which is hypothesized to play a role in double-stranded break repair (Morrish et al. 2002; Sen et al. 2007). Because L1 #1 was not inserted through retrotransposition, we excluded it from the retrotransposition rate estimates. With 437 trios in this data set, we estimate retrotransposition rates of one Alu event in 39.7 births (95% CI 22.2–79.4), one L1 in 62.5 births (95% CI 30.6–153.8), and one SVA in 62.5 births (95% CI 30.6–153.8) (Methods).
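
    The rate arithmetic can be illustrated with a short sketch. It assumes the intervals are exact (Clopper-Pearson) binomial intervals; the text only states that binomial 95% CIs were used, so the bounds computed here may differ slightly from the published ones. The event counts (11 Alu, seven L1 after excluding L1 #1, seven SVA) and the 437 births come from the text; the function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def solve_monotonic(func, target, increasing):
    """Bisection on [0, 1] for func(p) == target, with func monotonic in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(events, births, alpha=0.05):
    """Clopper-Pearson interval for the per-birth event probability (assumed method)."""
    lower = 0.0 if events == 0 else solve_monotonic(
        lambda p: 1.0 - binom_cdf(events - 1, births, p), alpha / 2.0, True)
    upper = 1.0 if events == births else solve_monotonic(
        lambda p: binom_cdf(events, births, p), alpha / 2.0, False)
    return lower, upper


def births_per_event(events, births):
    """Return (point estimate, low bound, high bound) in births per event; events > 0."""
    lower, upper = exact_binomial_ci(events, births)
    return births / events, 1.0 / upper, 1.0 / lower


# Counts from the text: 11 Alu, 7 L1 (L1 #1 excluded), 7 SVA in 437 births.
for family, events in {"Alu": 11, "L1": 7, "SVA": 7}.items():
    point, low, high = births_per_event(events, 437)
    print(f"{family}: 1 in {point:.1f} births (95% CI {low:.1f}-{high:.1f})")
```

    Reporting the rate as births per event inverts the probability, so the upper probability bound becomes the lower bound on births per event.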

    The genomic context of the 26 de novo MEIs is shown in Figure 1. Detailed information on each breakpoint is provided in Supplemental Figure S2. The MEIs are randomly distributed across the genome (Fig. 1A). Forty-two percent of the loci inserted outside of repetitive DNA regions (Fig. 1B). Nearly all of the MEIs inserted in intergenic or intronic regions (Fig. 1C). L1 #7 inserted 25 bp away from exon 4 in PM20D2 (Supplemental Fig. S2). L1 #5 inserted within the 3′ UTR of PGRMC2 and created a 628-bp TSD (Supplemental Fig. S2). As expected for a nondisease cohort and this number of MEIs, we did not find any de novo MEIs in exons.

    Subfamily analysis of the de novo MEIs

    We performed subfamily characterization for the de novo MEIs using MELT's CALU tool and Repbase (Bao et al. 2015; Gardner et al. 2017). The 11 Alu elements belong to seven subfamilies. Alu elements #1 and #5 are exact matches to the Yb8 subfamily, whereas Alu #8 and #9 belong to the Ya5 subfamily. Alu #10 is truncated by >250 bp and may possibly belong to many Y (or the older S) subfamilies (Kryatova et al. 2017). Sequence alignment and FASTA files for the 11 Alu elements are presented in Supplemental Figure S3 and Supplemental Data S4. We matched the full-length L1 #2 to the young L1Ta1d subfamily. We did not obtain sequence information for the other full-length L1 (L1 #7), and the other six elements are too truncated to classify. SVA #3–5 and #7 contain part of the 5′ transduction of MAST2 exon 1 and therefore belong to the SVA_F1 subfamily (Bantysh and Buzdin 2009; Damert et al. 2009; Hancks et al. 2009). SVA #4 also contains a 3′ transduction of an AluSp, which is present in the SVA_F1 master element H10_1 (Damert et al. 2009; Hancks et al. 2009). The sequences of the SINE-R regions of SVA #1–2 and #6 align to the other known active subfamilies, D-F. The subfamily assignment for each element is in Supplemental Table S2.

    Several de novo MEIs have hallmarks of retrotransposition activity

    To determine whether any of the de novo Alu elements are capable of further retrotransposition, we examined each element for its potential capacity for retrotransposition activity. Hallmarks of active Alu elements include intact box A and B internal RNA polymerase III (Pol III) promoters (Mills et al. 2007; Bennett et al. 2008; Comeaux et al. 2009), intact SRP9/14 sites, a consistent poly(A) tail at least 20 bases long (Dewannieux and Heidmann 2005), and a Pol III termination sequence, TTTT, preferably within 15 bp of the TSD downstream from the poly(A) tail (Comeaux et al. 2009). In addition, there are 124 conserved nucleotides in active Alu elements, and multiple mutations in these nucleotides may affect retrotransposition competence (Bennett et al. 2008). Alu elements #1 and #8 contain all of these hallmarks and therefore may remain active (Supplemental Data S4).

    To identify potentially active L1/SVA elements, we focused on the full-length de novo elements in our data set. L1 #2 is potentially active because it is not truncated relative to its source element and has two intact open reading frames (ORFs 1 and 2) as determined by L1Base2 (Penzkofer et al. 2017). L1 #7 is full length, but we were unable to sequence the ORFs to determine activity potential. The other six L1 elements are 5′ truncated and so cannot be active. SVA #2 is the only element with the CCCTCT hexamer promoter and may be active, although we were unable to sequence through the VNTR region. SVA #5 and #7 are de novo SVA_F1 elements with the full MAST2 promoter and therefore could be active. The other SVA elements do not contain the CCCTCT hexamer but may be transcribed if they inserted downstream from a promoter.

    Identification of source elements

    We used the human reference genome (hg19) and recovered FASTA files from the MELT output to identify potential source elements for the de novo MEIs (Methods; Fig. 1). Alu #2 and #4 each had a unique match to a reference Alu element (hg19 Chr 3: 190,156,698–190,156,966 and Chr 1: 246,470,713–246,471,020, respectively). Alu #3 is 40 bp truncated but uniquely matches a full-length polymorphic Alu element identified by MELT that was paternally transmitted (hg19 Chr 2: 185,125,618). SVA #1 is identical to a reference SVA_D element (hg19 Chr 17: 42,314,401–42,316,970) except for a 725-bp deleted region as a result of splicing (Supplemental Fig. S2). SVA #5 contains a 22-bp deletion within the MAST2 promoter, which is unique to a reference SVA_F1 element (hg19 Chr 3: 48,251,893–48,254,907) (Damert et al. 2009). There were too many potential source elements to pinpoint the candidate source element for the remaining eight Alu and five SVA loci.

    We identified the unique source elements for the three L1 elements with 3′ transductions (Figs. 1, 2). L1 #2 contains an 82-bp 3′ transduction that maps to an active L1 on Chr4q25. This source element was paternally transmitted. L1 #4 contains an 846-bp 3′ transduction from a L1 on Chr5q22 that was maternally transmitted. L1 #6 is a 497-bp orphan 3′ transduction (i.e., the entire L1 was 5′ truncated) that maps to the 3′ end of a ∼2 kb 3′ transduction from Chr13q21.2. We identified four additional 3′ transduction events from these source elements in our data set by examining the source loci in IGV (Fig. 2A; Supplemental Table S3). Two loci were present in a single parent, one locus was polymorphic in one pedigree, and the other locus was polymorphic in 17 pedigrees (Supplemental Table S3). All three source elements were nonreference insertions that are polymorphic across nearly all of the major population groups in the Simons Genome Diversity Project (Fig. 2B; Supplemental Fig. S4; Mallick et al. 2016). These three source elements have also been previously found to produce somatic 3′ transductions in cancer genomes (Tubio et al. 2014).

    Figure 2.

    Three source L1 elements identified by 3′ transductions. (A) Circlize plot of L1 elements with identifiable offspring elements in the CEPH data set (Gu et al. 2014). Source elements are highlighted with a star. (B) Minor allele frequency (MAF) of the three source elements in the Simons Genome Diversity Project (Mallick et al. 2016). Genotypes were manually typed from IGV screenshots (Supplemental Table S4).

    Estimation of parental origin of MEIs

    We used the three-generation pedigree structure to infer the stage at which the retrotransposition event occurred during development for all de novo MEIs in generation 2 (three Alu, one L1, and two SVA). These six second generation MEIs were all found in females, which is statistically significant (exact binomial test P-value <0.0313), but this bias was not seen in the 20 third generation MEIs (n = 10/18, exact binomial test P-value >0.814). Using the haplotypes of the offspring who inherited the de novo MEI, all six second generation insertions were phased to the maternal grandfather's chromosome (Fig. 3; Supplemental Tables S5–S10). We reason that de novo MEIs that are inherited in Mendelian ratios (50:50) in generation 3 and are cotransmitted with the grandfathers’ haplotype likely originated in the grandfathers’ germline. In contrast, de novo MEIs that are inconsistently associated with the grandfathers’ haplotype provide evidence that the MEIs arose during early embryogenesis in the mother, making her cells mosaic for the MEI.
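
    The exact binomial tests for sex bias can be reproduced with a few lines of standard-library Python. This is a minimal sketch that assumes a two-sided test against a 50:50 null, with the P-value defined as the total probability of all outcomes no more likely than the observed one; the function name is illustrative.

```python
from math import comb


def exact_binomial_p(successes, trials):
    """Two-sided exact binomial P-value under a 50:50 null hypothesis."""
    pmf = [comb(trials, k) / 2**trials for k in range(trials + 1)]
    observed = pmf[successes]
    # Sum every outcome that is no more probable than the observed one.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Six of six generation 2 MEIs in females; 10 of 18 in generation 3.
print(exact_binomial_p(6, 6))
print(exact_binomial_p(10, 18))
```

    Under these assumptions, six of six gives P = 0.03125 and 10 of 18 gives P of about 0.81, in line with the bounds reported in the text.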

    Figure 3.

    Tracking de novo retrotransposition in multigenerational pedigrees. The maternal grandfather's haplotype is shown in light blue, and the maternal grandmother's haplotype is shown in light red. An individual with the de novo MEI is in black.

    The three Alu elements that arose in generation 2 were transmitted to generation 3 at Mendelian ratios (χ² test with one degree of freedom, two-tailed P-value >0.05), and the Alu insertions were always cotransmitted with the maternal grandfathers’ haplotype (Fig. 3). This suggests that the Alu elements originated during the development of the maternal grandfathers’ germline, rather than in early embryogenesis in the mothers. In contrast, L1 #4 and SVA #2 and #5 are not transmitted at the expected ratios (χ² test with one degree of freedom, two-tailed P-value <0.02). These MEIs were only transmitted to one offspring each, and there were multiple offspring in each pedigree who inherited the maternal grandfathers’ haplotypes but not the MEIs (Fig. 3). Further, because the source element for L1 #4 was maternally transmitted but L1 #4 inserted on the paternal chromosome, L1 #4 was inserted post-zygotically. Both the transmission ratios and haplotype inconsistencies indicate that the three L1/SVA insertions are somatic/germline (gonosomal) mosaic in the second generation.
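
    A χ² goodness-of-fit test against a 50:50 transmission ratio can be sketched as follows. The offspring counts used here are hypothetical placeholders, because the per-pedigree counts are given in Figure 3 and the Supplemental Tables rather than in the text. For one degree of freedom, the upper-tail probability equals erfc(√(χ²/2)), which avoids any dependency beyond the standard library.

```python
from math import erfc, sqrt


def mendelian_chi2(carriers, offspring):
    """Chi-square test (1 df) of carrier counts against a 50:50 expectation."""
    expected = offspring / 2.0
    non_carriers = offspring - carriers
    stat = ((carriers - expected) ** 2 + (non_carriers - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical sibships: one balanced, one with a single carrier among nine.
for carriers, offspring in [(5, 10), (1, 9)]:
    stat, p_value = mendelian_chi2(carriers, offspring)
    print(f"{carriers}/{offspring}: chi2 = {stat:.2f}, P = {p_value:.3f}")
```

    With sibships of this size the χ² approximation is rough, so an exact binomial test would be a reasonable cross-check.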

    Another approach for determining whether each de novo MEI is mosaic or nonmosaic in an individual is to calculate the breakpoint allele frequency (BAF), which is the percent of reads that support the MEI breakpoint (Supplemental Figs. S5, S6). We chose the highest BAF of the two breakpoints for each locus, but this may still be a slight underestimate (i.e., split reads may have mapped elsewhere). BAFs for the 22 third generation individuals who inherited a de novo MEI ranged from 25% to 58%. Thus, we used a threshold of 25% to call heterozygosity. BAFs of the gonosomal mosaic generation 2 L1 and SVA elements ranged from 12% to 21%. This is in the reported range of allelic imbalances from SNV/SV gonosomal mosaicism in parents (Campbell et al. 2014; Acuna-Hidalgo et al. 2015; Rahbari et al. 2016; Jónsson et al. 2018). In contrast, BAFs for the three second generation individuals who have a de novo Alu element were 38%–50%, which is within the range of the inherited de novo MEIs and likely reflects retrotransposition in the parental germline. BAFs of all of the third generation Alu elements were in the range of inherited MEIs, although Alu #7–8 were the lowest at 25.8% and 31%. L1 #2–3 and SVA #3 were the only L1/SVA elements to have BAFs indicating potential heterozygosity (41%–58%). The hemizygous L1 #6 and #8 had BAFs of 100%. These results support the hypothesis indicated by the multigenerational analysis that Alu retrotransposition generally occurs in the germline.
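
    The BAF calculation and the 25% heterozygosity threshold can be expressed as a small sketch. The read counts below are hypothetical, and the function names and labels are illustrative; only the definition of BAF, the choice of the higher of the two breakpoints, and the 25% threshold are taken from the text.

```python
def breakpoint_allele_frequency(supporting_reads, total_reads):
    """Fraction of reads spanning a breakpoint that support the insertion."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return supporting_reads / total_reads


def locus_baf(breakpoints):
    """Highest BAF across the (supporting, total) pairs of a locus's breakpoints."""
    return max(breakpoint_allele_frequency(s, t) for s, t in breakpoints)


def classify(baf, het_threshold=0.25):
    """Label a BAF using the 25% threshold; hemizygous loci need separate handling."""
    if baf >= het_threshold:
        return "consistent with heterozygous"
    return "possible mosaic"


# Hypothetical read counts for the 5' and 3' breakpoints of two loci.
examples = {"locus_1": [(12, 30), (10, 31)], "locus_2": [(4, 28), (5, 30)]}
for name, breakpoints in examples.items():
    baf = locus_baf(breakpoints)
    print(f"{name}: BAF = {baf:.1%} ({classify(baf)})")
```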

    We identified the parental origin of the gamete for half of the de novo generation 3 MEIs using sex chromosome hemizygosity and SNP-based phasing approaches, including two SVA, two L1, and six Alu insertions (Methods; Supplemental Table S2). SVA #4 inserted on the maternal chromosome, and SVA #1 inserted on the paternal chromosome, although these insertions are likely mosaic in the individuals as indicated by their BAFs. The hemizygous L1 #6 and L1 #8 in two male individuals inserted on the maternal (Chr X) and paternal (Chr Y) chromosomes. We identified the parental chromosome for six of the eight third generation de novo Alu elements, except for Alu #6–7. Including both second and third generation Alu elements, we find that eight Alu elements were transmitted on the paternal chromosome, and one element was transmitted on the maternal chromosome (exact binomial test, P-value <0.04). We conclude that assuming these elements occurred during gametogenesis, there is a statistically significant paternal sex bias with respect to Alu retrotransposition, whereas L1/SVA retrotransposition appears to generally occur post-zygotically. We did not find statistical support for a paternal age effect on Alu retrotransposition (P-value = 0.26), though the sample size is small (Supplemental Fig. S7).

    Evaluation and comparison of the three MEI-calling tools

    We used three tools with different approaches to identify MEIs to maximize the possibility of finding all de novo MEIs. MELT uses a transposon reference file to identify and characterize nonreference MEIs for each transposon family (Gardner et al. 2017). In contrast, RUFUS and the recently published TranSurVeyor identify breakpoints irrespective of the transposon family, each producing tens of thousands of false-positive breakpoints (https://github.com/jandrewrfarrell/RUFUS) (Ostrander et al. 2018; Rajaby and Sung 2018). MELT missed the orphan transduction (L1 #6) as well as six other MEIs. MELT preliminarily identified SVA #6 in the individual but then misgenotyped it as homozygous reference (BAF 9.4%). We hypothesize that the other MEIs were missed by not aligning to the transposon families by either having too many differences or a lack of split reads outside of the poly(A) tail. RUFUS missed the two SVA elements in the parents and SVA #6, which may be attributed to the low BAFs in these MEIs. TranSurVeyor did not detect five MEIs, but we could not identify a pattern that explained why these were missed. Our results show the importance of utilizing different tools for MEI detection (Ewing 2015; Rishishwar et al. 2016; Goerner-Potvin and Bourque 2018), because one half of the validated de novo MEIs were detected by all three tools, and 12% of the de novo MEIs were detected by a single tool (Supplemental Table S2).

    With our three-generation pedigrees, we were able to identify obligate carriers of a MEI in generation 2 as individuals whose parent (generation 1) carried the MEI and whose offspring (generation 3) inherited the MEI (Supplemental Fig. S8). This allowed us to estimate MELT's sensitivity. MELT's unfiltered call set has a sensitivity of 68% for all MEIs, whereas MELT's filtered call set has a sensitivity of 94%, because it excludes many incorrect calls. For our identified de novo MEIs, we estimate sensitivity for MELT, RUFUS, and TranSurVeyor calls as ∼73%, 88%, and 77%, respectively. Using only loci that passed MELT's filters (i.e., “PASS”) would have reduced the de novo candidate list from 907 to 217 loci, but 42% (8 of 19) of the de novo loci would have been missed. Nevertheless, even using three tools, we may have missed additional de novo MEIs because of low sequencing depth or their location in regions of high repeat content. Therefore, our retrotransposition rate estimates should be viewed as lower limits.
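
    The obligate-carrier logic can be sketched in a few lines. This is an illustration under simplifying assumptions, not the procedure used in the study: the pedigree is a plain dictionary, sample names are hypothetical, and a generation 2 parent is treated as an obligate carrier only when a grandparent and a child are called carriers and the spouse is not (so the child could not have inherited the MEI from the other side of the family).

```python
def obligate_carriers(parents, called):
    """Individuals who must carry the MEI given calls in their parent and child.

    parents maps an individual to a (father, mother) tuple; called is the set
    of individuals genotyped as carriers at one locus.
    """
    obligate = set()
    for child, (father, mother) in parents.items():
        if child not in called:
            continue
        for parent, spouse in ((father, mother), (mother, father)):
            grandparents = parents.get(parent)
            if grandparents is None or spouse in called:
                continue
            if any(gp in called for gp in grandparents):
                obligate.add(parent)
    return obligate


def sensitivity(parents, calls_by_locus):
    """Fraction of obligate carriers that were themselves called as carriers."""
    detected = total = 0
    for called in calls_by_locus.values():
        for person in obligate_carriers(parents, called):
            total += 1
            detected += person in called
    return detected / total if total else float("nan")


# Hypothetical pedigree fragment and calls at two loci.
pedigree = {
    "mother": ("grandfather", "grandmother"),
    "child_1": ("father", "mother"),
    "child_2": ("father", "mother"),
}
calls = {
    "locus_A": {"grandfather", "mother", "child_1"},
    "locus_B": {"grandmother", "child_2"},  # mother is a missed obligate carrier
}
print(sensitivity(pedigree, calls))
```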

    Discussion

    With rapid advances in high-throughput sequencing technologies, a large number of human pedigrees have been sequenced, and many studies have directly estimated the single-nucleotide de novo mutation rate (Roach et al. 2010; Jónsson et al. 2017). New technology also affords an opportunity to estimate the rate of de novo retrotransposition, which generates genomic variation through an entirely different mutational mechanism. From 437 births, we estimate an Alu retrotransposition rate of about 1:39.7 births (95% CI 22.4–79.4), a SVA rate of about 1:62.5 births (95% CI 30.6–153.8), and a L1 rate of about 1:62.5 births (95% CI 30.6–153.8) (Fig. 4). MELT was used previously to identify de novo MEI transmission in 519 quartets in the Simons Simplex Collection (SSC) (Werling et al. 2018). Using these published data, we estimated comparative retrotransposition rates for Alu, L1, and SVA elements (Fig. 4). The Alu retrotransposition rate in SSC is nearly identical to the estimate in this study, but our L1 and SVA retrotransposition rates are 2.4× and 5.5× higher but do not differ significantly (Fig. 4, 95% CIs; Werling et al. 2018). The latter differences reflect in part our use of multiple MEI-calling tools, which showed that MELT detects 91% of de novo Alu elements detected by TranSurVeyor and RUFUS, but only 75% of the L1 and 43% of the SVA elements detected by the latter tools (Supplemental Table S2). These two data sets both estimate an Alu retrotransposition rate that is twofold lower than previous phylogenetic and disease-based estimates. Given MELT's high sensitivity for Alu detection (Gardner et al. 2017) as well as our use of multiple MEI-calling tools, it is unlikely that our lower rate is caused by false-negative calls, although we could be missing calls in highly repetitive regions. Instead, it is probable that the phylogenetically estimated rate is affected by assumptions about the divergence time of humans and chimpanzees, the effective population size of the human-chimpanzee ancestral population, and retrotransposition rate variation over time (Cordaux and Batzer 2009; Roach et al. 2010; Campbell and Eichler 2013; Ségurel et al. 2014).

    Figure 4.

    Estimated retrotransposition rates. Estimated retrotransposition rates from previous studies are listed (Deininger and Batzer 1999; Cordaux et al. 2006; Xing et al. 2009b; Ewing and Kazazian 2010; Huang et al. 2010). Confidence intervals are shown if available from the study. Rates and binomial 95% CI were determined for Werling et al. (2018) and this study. Alu element rates are shown in red, L1 in green, and SVA in blue.

    Although preliminary, our findings suggest there may be differences in retrotransposition timing among the non-LTR retrotransposon families. All of the de novo Alu elements appear heterozygous in WGS, and all three Alu elements in generation 2 conform to Mendelian expectations and cosegregate with the maternal grandfather's chromosome, indicating retrotransposition in the germline. Further, there appears to be a paternal sex bias in de novo Alu retrotransposition, which is similar to the paternal transmission biases seen in SNVs and short tandem repeats (Jónsson et al. 2017; Willems et al. 2017). We found evidence of L1 retrotransposition timing in both the germline (the hemizygous elements, L1 #6 and #8) and early embryogenesis (L1 #4) (Fig. 3), which corroborates previous findings (van den Hurk et al. 2007; Richardson et al. 2017). The two second generation SVA elements appear to be mosaic in the germline and somatic tissue in the mothers and likely arose during early embryogenesis. Inheritance of gonosomal mosaic L1 and SVA elements in large pedigree analyses has thus far only been seen in females: three in this study and four in a recent mouse study (Richardson et al. 2017). The observation of likely post-zygotic SVA element insertions suggests that SVA elements may be underreported in studies of somatic or tumor cells.

    Our data allow us to identify the subfamily distribution of active mobile element subfamilies. Yb8 and Ya5 subfamilies accounted for 72% of 322 polymorphic Alu elements in a previous study (Konkel et al. 2015), yet only 36% of de novo Alu elements identified here belong to the Ya5 or Yb8 subfamilies (Fisher's exact test, P-value <0.02). Our identification of only two Yb8 elements corroborates our pilot ME-Scan study of Yb8/9 elements in the CEPH data set, in which we initially discovered Alu #1 (the individual with Alu #5 was not included in the study) (Supplemental Methods; Supplemental Tables S11, S12). Indeed, the variety of de novo Alu sequences detected here corroborates the “stealth model” hypothesis of Alu amplification, in which there are multiple active subfamilies that proliferate, rather than one large, active subfamily/locus (Deininger et al. 1992; Deininger and Batzer 1999; Han et al. 2005; Konkel et al. 2015). We detected insertions of all active SVA subfamilies, with the recent SVA_F1 subfamily (Bantysh and Buzdin 2009; Damert et al. 2009; Hancks et al. 2009) accounting for 57% of the de novo SVA elements. Our data show that there are many active AluY subfamilies, and the most recent SVA subfamily, SVA_F1, may currently be one of the most active SVA subfamilies.
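
    The subfamily comparison can be sketched as a two-sided Fisher's exact test on a 2×2 table. The de novo counts (four Ya5/Yb8 elements among 11) follow from the text; the count of 232 Ya5/Yb8 elements among the 322 polymorphic elements is an assumption obtained by rounding 72% of 322, so the resulting P-value is only approximate. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x first-column items in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    tail = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, tail)


# Rows: de novo (this study) and polymorphic (assumed 232 of 322 from "72%").
# Columns: Ya5/Yb8 versus all other subfamilies.
p_value = fisher_exact_two_sided(4, 7, 232, 90)
print(f"P = {p_value:.4f}")
```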

    In addition to the three non-LTR retrotransposon families, there are other substrates of retrotransposition in the human genome. Processed pseudogene insertions occur when processed mRNA is inserted into the genome by the L1 machinery (Esnault et al. 2000; Abyzov et al. 2013; Ewing et al. 2013; Schrider et al. 2013). There are also multiple polymorphic HERV-K (HML-2) elements in humans, including at least one potentially active insertion (Wildschutte et al. 2016). We searched for HERV-K (HML-2) elements using MELT and did not identify any candidate de novo loci (Methods; Supplemental Data S1). RUFUS and TranSurVeyor did not identify any de novo pseudogene or HERV-K (HML-2) insertions. Processed pseudogene retrotransposition events are rare (Ewing et al. 2013; Gardner et al. 2018), and tools specific to identifying these events in WGS would allow for retrotransposition rate estimation in pedigrees.

    Retrotranspositional activity may differ across pedigrees and populations (Chaisson et al. 2019), similar to how polymorphic PRDM9 variants affect recombination hotspot activity (Baudat et al. 2010; Kong et al. 2010). It is predicted that every human contains 80–100 active L1 elements (Brouha et al. 2003), and this may influence variation in retrotransposition rates among humans. The three source L1 elements in this study are present in all major regional groups in the SGDP (Mallick et al. 2016), which suggests that these elements may also be active in non-European populations (Fig. 2; Supplemental Fig. S4). However, we did not examine any polymorphic internal variants that may affect the “hotness” of the source element (Seleme et al. 2006). We identified an overabundance (six) of de novo MEIs in pedigree 1331; siblings 8549 (NA07033) and 8310 (NA07023) were also the only individuals with more than one de novo MEI in the data set (Alu #1 and SVA #4 in 8549 [NA07033], and L1 #5 and SVA #1 in 8310 [NA07023]). Preliminary review of the pedigree did not reveal any pathogenic SNPs in a gene list of proteins that restrict retrotransposition activity (Goodier 2016). Future studies of retrotransposition in large pedigree-based cohorts may help to clarify variants and genetic factors involved in the regulation of L1-mediated retrotransposition activity.

    Methods

    CEPH individuals

    Blood-derived DNA samples from 599 individuals, containing 454 trios within larger pedigrees, were collected from either the original CEPH cohort (Dausset et al. 1990) or the Utah Genetic Reference Project (Prescott et al. 2008). These samples were whole-genome sequenced to ∼30× coverage (Supplemental Methods) and aligned to the GRCh37 reference genome using BWA-MEM v0.7.15 (Li and Durbin 2009). BAMs were not realigned to the updated GRCh38 because de novo MEIs are by definition not found in the reference sequence. SAMBLASTER was used to de-duplicate the aligned BAM files (Faust and Hall 2014). GATK v3.50 was used to realign regions containing potential short insertions and deletions and for base quality score recalibration (DePristo et al. 2011). Alignment quality metrics for the BAM files were calculated by running samtools stats and flagstat (Li et al. 2009). Approximate coverage estimates for all BAM files were calculated using the covstats tool (goleft v0.1.17; https://github.com/brentp/goleft) (Supplemental Table S1). Box plots for coverage of each pedigree are shown in Supplemental Figure S9. Evaluation by peddy identified nine individuals with a het_ratio > 0.2 who were also declared duplicates, indicating potential sample contamination before sequencing (Pedersen and Quinlan 2017). All 17 trios with these individuals were removed from the rate estimate after IGV evaluation. Therefore, 437 children were used in the rate estimates. All sampled individuals provided informed consent. All ascertainment was performed under University of Utah institutional review board approvals.

    Identification of MEIs in the CEPH data set

    We used three complementary approaches to identify de novo MEIs in this data set. All 599 individuals were joint-called with the MELT-Split protocol in MELT (v2.14) for detection of Alu, L1, SVA, and HERV-K (HML-2) elements using the consensus transposon files provided by MELT (Gardner et al. 2017). Coverage estimates for each BAM file were rounded down for the IndivAnalysis step. To increase sensitivity, loci were not filtered using the filtering criteria provided by MELT. To identify de novo MEIs in generations 2 and 3 together, the Genotype Query Tools (GQT) package (Layer et al. 2016) was used to detect loci that were private to a single CEPH pedigree and homozygous reference in generation 1.

    All 454 trios were processed by RUFUS, and all structural variant breakpoints were extracted for detection of L1-associated retrotransposition events (https://github.com/jandrewrfarrell/RUFUS) (Ostrander et al. 2018). RUFUS was unable to process trios 1788, 2020, and 4877 successfully. These trios were not among the 17 removed after peddy analysis.

    Each sample was individually processed using TranSurVeyor, and unfiltered breakpoints with fewer than four discordant reads for support were removed (Rajaby and Sung 2018). Then, we merged overlapping breakpoints in each individual using the BEDtools merge command (Quinlan and Hall 2010) and merged samples into three BED files: children, parents, and grandparents. We next used BEDtools intersect to identify MEIs that are present in the parents and absent in the grandparents, and child MEIs that are absent in the parent and grandparent BED files.
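
    The merge-then-subtract logic described above can be sketched in plain Python. This is an illustration only, not the BEDtools commands used in the study; the function names and the toy coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps_any(interval, others):
    """True if the half-open interval overlaps any interval in `others`."""
    chrom, start, end = interval
    return any(c == chrom and s < end and start < e for c, s, e in others)


def absent_from(candidates, *backgrounds):
    """Keep candidates that overlap nothing in any background interval set."""
    return [iv for iv in candidates
            if not any(overlaps_any(iv, bg) for bg in backgrounds)]


# Hypothetical toy breakpoints (BED-style, half-open coordinates).
child = merge_intervals([("chr1", 100, 150), ("chr1", 140, 200), ("chr2", 500, 560)])
parents = [("chr2", 550, 600)]
grandparents = [("chr3", 10, 20)]

# Child breakpoints with no overlapping call in parents or grandparents.
child_only = absent_from(child, parents, grandparents)
print(child_only)
```

    The same `absent_from` helper, called with the parent intervals as candidates and the grandparent intervals as background, gives the generation 2 candidates.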

    We created a BED file that contained each candidate locus and included the sample BAM ID in the fourth column. This was processed by a custom Python script that generated an IGV batch script. Scripts to generate IGV images in one individual and a trio are available (https://github.com/julieefeusier/IGV-Batch-Script-Generator-for-bed-files) (Supplemental Code S1). Each candidate locus was visualized for each trio in IGV to identify candidate de novo MEIs (Robinson et al. 2011; Thorvaldsdóttir et al. 2013). Any image with evidence of a structural variant (but not small indels) or MEI was flagged for further examination. These criteria include the presence of one or more features: discordant read pairs, split reads, clean breakpoints, TSDs, and poly(A) tails. Breakpoints were then further investigated in IGV and in BLAT to rule out non-MEI SVs. Candidate loci that passed this initial step were then locally reassembled for PCR validation. TranSurVeyor took 4.5 h on average per individual. For RUFUS, k-mer counting took on average 2 h per sample, and each trio run took 6 h using 40 cores. MELT took about 1 wk per mobile element family with 10 threads for the individual steps.
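
    As a rough illustration of how a BED file with the sample ID in the fourth column can be turned into an IGV batch script, a minimal sketch follows. It is not the script from the repository above. The `<bam_dir>/<sample>.bam` file layout, the 500-bp flank, and the snapshot naming are assumptions made for the example; only standard IGV batch commands (`snapshotDirectory`, `new`, `load`, `goto`, `snapshot`, `exit`) are used.

```python
def igv_batch_script(bed_lines, bam_dir, snapshot_dir, flank=500):
    """Build IGV batch commands from BED lines whose 4th column is a sample ID."""
    commands = [f"snapshotDirectory {snapshot_dir}"]
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, sample = line.split()[:4]
        # BED starts are 0-based; IGV loci are 1-based.
        view_start = max(1, int(start) + 1 - flank)
        view_end = int(end) + flank
        commands += [
            "new",
            f"load {bam_dir}/{sample}.bam",  # assumed file layout
            f"goto {chrom}:{view_start}-{view_end}",
            f"snapshot {sample}_{chrom}_{start}_{end}.png",
        ]
    commands.append("exit")
    return "\n".join(commands) + "\n"


# Hypothetical candidate locus; the paths are placeholders.
script = igv_batch_script(["chr1\t1000\t1300\tNA07033"], "/data/bams", "/tmp/igv")
print(script)
```

    For a trio, the same loop would load three BAM files before `goto`, so that the child and both parents appear in one image.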

    Local reassembly of candidate MEIs for primer design

    After IGV evaluation, the de novo TE insertion breakpoints provided by the three tools were further analyzed by extracting the reads mapped to a 250-bp region flanking the breakpoint in each individual. Discordant reads mapped to this 500-bp window were identified, and mates of those discordant reads mapped elsewhere in the BAM file were collected (http://broadinstitute.github.io/picard/) (Li et al. 2009). A local de novo assembly of all the extracted reads was performed (Huang and Madan 1999) for each breakpoint in each individual. The assembled contigs were further explored for the presence of TEs. These steps were performed using a custom Perl script (https://github.com/jainy/local_assembly_nonreferenceTE) (Supplemental Code S2).

    PCR/Sanger validation of de novo MEIs

    PCR amplifications of about 10–25 ng of template DNA (blood-derived or transformed lymphoblast DNA) were performed in 25-µL reactions according to the Phusion Hot Start Flex DNA Polymerase protocol (using 5× GC buffer) and Q5 Hot Start DNA Polymerase (using GC Enhancer). The thermocycler conditions were initial denaturation for 30 sec at 98°C, 40 cycles of denaturation for 10 sec at 98°C, 30 sec at the optimal annealing temperature (58°C–68°C), a 30 sec–3 min extension at 72°C, and a final extension for 5 min at 72°C. Every primer set reaction was performed on the pedigree with the candidate de novo MEI, a positive control, and sterile water. PCR amplicons were run on a 1%–2% gel containing 0.12 mg/mL ethidium bromide for 75–90 min at 120 V. Gels were imaged using a Fotodyne Analyst Investigator Eclipse machine. Bands were cut out and purified for Sanger sequencing using the Qiagen QIAquick Gel Extraction Kit. Primer sets for Alu elements are in Supplemental Table S13, and primer sets for L1/SVA elements are in Supplemental Table S14.

    L1 and SVA elements were amplified using the Thermo Fisher Scientific Platinum SuperFi DNA polymerase and cloned using Thermo Fisher Scientific Zero Blunt TOPO II/4 kits. We followed the Platinum SuperFi PCR setup for 25-µL reactions using 2 µL template DNA (∼5 ng/µL). For the PCR procedure, each reaction was denatured for 30 sec at 98°C and then amplified for 35 cycles (10 sec at 98°C, annealing for 10 sec, and an extension for 30 sec or longer at 72°C based on amplicon size). Annealing temperatures were estimated for each primer pair based on Thermo Fisher Scientific calculations. A final extension was performed for 5 min at 72°C. The Invitrogen PureLink Quick Plasmid Miniprep Kit was used to extract DNA from the clones. Clones were Sanger sequenced through the whole length of the fragment (Supplemental Table S15). We used several internal primers from previous research (Scott et al. 2016; Feusier et al. 2017). The three generation 2 L1/SVA elements were analyzed in generation 3 because of the availability of DNA.

    Retrotransposition rate estimates

    The retrotransposition rate and 95% confidence intervals were calculated using an exact binomial confidence interval estimate with x = number of de novo Alu, L1, or SVA elements and N = 437 births. We dropped L1 #1 from the number of L1 elements because this insertion likely did not occur through retrotransposition. This rate was also calculated for the SSC data set using the identified 23 Alu, seven L1, and three SVA elements in 1038 births (Werling et al. 2018). We included all listed MEIs in the rate estimates, including one SVA element that did not have orthogonal support (PCR, microarray, or liWGS) (Werling et al. 2018). The estimates and confidence intervals are listed in Supplemental Table S16.
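
    For readers who want to reproduce this kind of estimate, the sketch below computes an exact (Clopper-Pearson) binomial interval with the Python standard library, solving for the bounds by bisection. It is an illustration under that assumption, not the authors' code, and the function names are hypothetical. The counts in the example are the SSC counts quoted above.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect(f, target, increasing, iters=60):
    """Solve f(p) = target on (0, 1) for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, conf=0.95):
    """Clopper-Pearson interval for x events observed in n births."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


# SSC counts from the text: 23 Alu, seven L1, and three SVA in 1038 births.
for family, x in [("Alu", 23), ("L1", 7), ("SVA", 3)]:
    low, high = exact_binomial_ci(x, 1038)
    print(f"{family}: 1 in {1038 / x:.0f} births "
          f"(95% CI: 1 in {1 / high:.0f} to 1 in {1 / low:.0f})")
```

    Bisection keeps the example dependency-free; a statistics library that provides the beta quantile function would give the same bounds in closed form.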

    Investigation of source elements

    MELT lists differences from the consensus for each MEI locus in the DIFF section of the INFO column (Gardner et al. 2017). These differences were extracted and converted to FASTA format using the MELT consensus transposon FASTA file as the reference. A custom Python script was used for this step (https://github.com/julieefeusier/MEI-VCF-to-FASTA) (Supplemental Code S3). Each de novo MEI sequence was compared to the MELT FASTA file using the “grep” command to identify potential source elements. The de novo MEIs were also compared to the hg19 reference genome using BLAT (Kent 2002; Beaten et al. 2002).

    Source elements in the Simons Genome Diversity Project and CEPH

    Paired-end BAM DNA sequences (hg19) for 279 individuals of the Simons Genome Diversity Project (Mallick et al. 2016) were downloaded from the European Nucleotide Archive at the European Bioinformatics Institute (PRJEB9586). DNA samples were remapped to hg38. The locations of the three source elements were converted to hg38 using liftOver in the UCSC Genome Browser (Kent et al. 2002). Individuals were genotyped using IGV screenshots for each of the three source elements (Supplemental Table S4).

    We used IGV views of the source elements to identify additional 3′ transduction events in CEPH. IGV screenshots of the source element and 2 kb downstream from the MEI were generated in each CEPH individual for the three source elements. These 3′ transduction events were discovered by identifying reads downstream whose mate pairs mapped to a retrotransposition event elsewhere in the genome.

    Parental origin investigation

    For haplotype phasing of de novo MEIs in the parents (second generation), we extracted SNPs in a 200-kb window surrounding the MEI's position. We filtered for SNPs that were heterozygous in the parent and the parental grandparents and present in the other parent (Supplemental Tables S5–S10). The children were assigned a grandparent haplotype based on the transmission of the SNPs from the grandparent individuals. Then, the transmission of the de novo MEI was placed on the children's grandparent haplotype to determine the parental origin. There was no evidence of recombination between the de novo MEI and the markers on the grandparent haplotype.

    Parental origin of the chromosome of de novo MEIs in generation 3 was analyzed using sex chromosome hemizygous status and SNP phasing (Supplemental Table S2). We considered a hemizygous insertion on a sex chromosome to be a retrotransposition event on the parental chromosome in the germline. For SNP phasing, we considered informative SNPs to be either heterozygous in one parent and the child, or heterozygous in the child and homozygous ref/alt in the parents. Paired-end reads that connected the MEI and a nearby informative SNP determined parental origin. For Alu elements with SNPs <5 kb away, we designed primers that amplified the Alu-SNP region and confirmed the SNPs of the children and their parents through Sanger sequencing (primers listed in Supplemental Table S13).
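
    The informative-SNP logic can be written out as a small decision function. This is a sketch, not the authors' procedure: genotypes are encoded as alternate-allele counts, the function names are hypothetical, and it assumes that the non-heterozygous parent in the first case is homozygous, which the text does not state explicitly.

```python
def alt_allele_origin(father, mother, child):
    """Return which parent transmitted the child's alternate allele, or None.

    Genotypes are alternate-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
    """
    if child != 1:
        return None  # only heterozygous children are informative
    if father == mother:
        return None  # both het is ambiguous; identical homozygotes are inconsistent
    if father == 0 or mother == 2:
        return "mother"
    if mother == 0 or father == 2:
        return "father"
    return None


def mei_parent_of_origin(father, mother, child, linked_allele):
    """Assign the MEI to a parent from the SNP allele seen on the same read pair."""
    if linked_allele not in ("ref", "alt"):
        raise ValueError("linked_allele must be 'ref' or 'alt'")
    alt_from = alt_allele_origin(father, mother, child)
    if alt_from is None:
        return None
    if linked_allele == "alt":
        return alt_from
    # The MEI sits on the reference-allele haplotype, so it came from the other parent.
    return "father" if alt_from == "mother" else "mother"


# Father het, mother hom ref, child het: the alternate allele is paternal.
print(mei_parent_of_origin(1, 0, 1, "alt"))
# Parents are opposite homozygotes; the MEI is linked to the reference allele.
print(mei_parent_of_origin(0, 2, 1, "ref"))
```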

    Estimating BAF in de novo MEIs

    We manually estimated BAF in each individual with a de novo MEI, including the children (third generation) who inherited de novo MEIs. In IGV, we counted the number of split/discordant reads supporting the MEI at the position that was 1 bp outside of the TSD. Hard-clipped reads were counted as supporting evidence of the breakpoint. We counted the total reads at that position and excluded any reads that could not reliably distinguish the MEI from the reference sequence. We performed these steps for both breakpoints of the TSD. Alu #4 and L1 #1 did not have TSDs, and the L1 #5 TSD was 628 bp, so the BAFs for these loci were calculated using the two positions 1 bp before the start of the breakpoint. Then, the number of reads supporting the MEI was divided by the total number of reads at the position for each breakpoint. We used the highest BAF of the two breakpoints for each MEI. The BAFs and the average BAF are included in Supplemental Table S17.
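
    The arithmetic behind the per-locus BAF is simple enough to state as code. The sketch below assumes the read counts have already been tallied by eye in IGV as described; the function names and the example counts are hypothetical.

```python
def breakpoint_baf(supporting_reads, total_reads):
    """Fraction of informative reads at one breakpoint that support the MEI."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= supporting_reads <= total_reads:
        raise ValueError("supporting_reads must lie between 0 and total_reads")
    return supporting_reads / total_reads


def mei_baf(breakpoint_1, breakpoint_2):
    """Highest BAF of the two breakpoints, each given as (supporting, total)."""
    return max(breakpoint_baf(*breakpoint_1), breakpoint_baf(*breakpoint_2))


# Hypothetical counts: 12 of 30 reads at one breakpoint, 9 of 30 at the other.
print(mei_baf((12, 30), (9, 30)))
```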

    MELT sensitivity analysis

    We used the MELT genotype output to estimate its sensitivity in the three-generation CEPH cohort. Each transposon family was analyzed separately. For each pedigree, we used GQT (Layer et al. 2016) to extract loci that were present in at least one grandparent and at least two grandchildren to identify all of the inherited loci. Then we extracted loci that were present in at least one grandparent, at least two grandchildren, and missing in both parents (generation 2). These were deemed false-negative calls. The false-negative rate was calculated by dividing the total false-negative calls by the total inherited loci. We also calculated the false-negative/sensitivity rates for filtered loci by extracting only loci with MELT's “PASS” filter before identifying sites in GQT. These results are in Supplemental Figure S7.
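
    The false-negative calculation can be illustrated with carrier counts per generation. This is a sketch of the counting rule only, not the GQT queries used in the study; the dictionary keys and the toy loci are hypothetical.

```python
def false_negative_rate(loci):
    """Fraction of inherited loci that were missed in both parents.

    Each locus is a dict of carrier counts in one pedigree with the keys
    'grandparents', 'parents', and 'grandchildren'.
    """
    inherited = [locus for locus in loci
                 if locus["grandparents"] >= 1 and locus["grandchildren"] >= 2]
    if not inherited:
        raise ValueError("no inherited loci to evaluate")
    # Seen in generations 1 and 3 but in neither generation 2 parent.
    missed = [locus for locus in inherited if locus["parents"] == 0]
    return len(missed) / len(inherited)


# Hypothetical toy loci from one pedigree.
toy_loci = [
    {"grandparents": 1, "parents": 1, "grandchildren": 3},
    {"grandparents": 2, "parents": 0, "grandchildren": 2},  # false negative
    {"grandparents": 1, "parents": 2, "grandchildren": 4},
    {"grandparents": 0, "parents": 1, "grandchildren": 2},  # not counted as inherited
]
fnr = false_negative_rate(toy_loci)
print(f"false-negative rate = {fnr:.3f}, sensitivity = {1 - fnr:.3f}")
```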

    Data access

    Whole-genome sequencing data for the human samples from this study have been submitted to the database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/) under accession number phs001872.v1.p1.

    Acknowledgments

    Funding for this project was provided by the Utah Genome Project, the George S. and Dolores Doré Eccles Foundation, and National Institutes of Health grants GM118335 and GM059290 (to L.B.J.) and R00HG005846 (to H.H. and J.X.). We thank Aaron Quinlan, Brent Pedersen, Ryan Layer, and Tom Sasani for useful discussions. Sanger sequencing was performed at the DNA Sequencing Core Facility, University of Utah. We thank William Richards and Matt Velinder for their help with running and troubleshooting RUFUS. Finally, we wish to thank the researchers who organized the original Utah Centre d'Etude du Polymorphisme Humain collection, in particular, Ray White and Mark Leppert, as well as all the families who generously participated in the project.

    Footnotes

    • Received December 26, 2018.
    • Accepted May 14, 2019.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
