United States Patent
6117679
Stemmer
September 12, 2000
Title
Methods for generating polynucleotides having desired characteristics by iterative selection and recombination
Abstract
A method for DNA reassembly after random fragmentation, and its application to mutagenesis of nucleic acid sequences by in vitro or in vivo recombination is described. In particular, a method for the production of nucleic acid fragments or polynucleotides encoding mutant proteins is described. The present invention also relates to a method of repeated cycles of mutagenesis, shuffling and selection which allow for the directed molecular evolution in vitro or in vivo of proteins.
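The fragmentation-and-reassembly idea in the abstract can be illustrated with a toy in silico model. This is a simplified sketch, not the patented laboratory procedure. It assumes the parent variants are aligned, equal-length homologous sequences. Random fragmentation is modeled as independent cut sites, and reassembly is modeled as taking each fragment-sized interval from a randomly chosen parent, so crossovers fall only at fragment boundaries. The names `fragment_positions`, `shuffle_once`, and `mean_fragment` are hypothetical.

```python
import random


def fragment_positions(length, mean_fragment, rng):
    """Choose random cut sites: each internal bond is cut with probability 1/mean_fragment."""
    return [i for i in range(1, length) if rng.random() < 1.0 / mean_fragment]


def shuffle_once(parents, mean_fragment=20, rng=None):
    """Build one chimera: after fragmentation, each interval is taken from a random parent."""
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes aligned, equal-length parents")
    cuts = [0] + fragment_positions(length, mean_fragment, rng) + [length]
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        donor = rng.choice(parents)  # the fragment covering this interval may come from any parent
        pieces.append(donor[start:end])
    return "".join(pieces)


if __name__ == "__main__":
    parents = ["ACGTACGTAC" * 6, "ACGAACGTTC" * 6]
    print(shuffle_once(parents, mean_fragment=10, rng=random.Random(0)))
```

Every position of the chimera comes from one of the parents at that same position, so the model captures recombination of existing variation and introduces no new mutations.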
Inventors:
Stemmer; Willem P. C. (Los Gatos, CA)
Assignee:
Maxygen, Inc. (Redwood City, CA)
Appl. No.:
621859
Filed:
March 25, 1996
Current U.S. Class:
435/440
435/6
536/23.1
536/24.3
Field of Search:
435/6,172.1,440 530/350 536/23.1,24.3 935/76,77,78
U.S. Patent Documents
4683202, July 1987, Mullis
4800159, January 1989, Mullis et al.
4965188, October 1990, Mullis et al.
4994368, February 1991, Goodman et al.
5023171, June 1991, Ho et al.
5043272, August 1991, Hartley
5093257, March 1992, Gray
5176995, January 1993, Sninsky et al.
5187083, February 1993, Mullis
5223408, June 1993, Goeddel et al.
5234824, August 1993, Mullis
5279952, January 1994, Wu
5316935, May 1994, Arnold et al.
5356801, October 1994, Rambosek et al.
5360728, November 1994, Prasher
5418149, May 1995, Gelfand et al.
5422266, June 1995, Cromier et al.
5489523, February 1996, Mathur
5502167, March 1996, Waldmann et al.
5521077, May 1996, Khosla et al.
5541309, July 1996, Prasher
5556750, September 1996, Modrich et al.
5556772, September 1996, Sorge et al.
5605793, February 1997, Stemmer
5629179, May 1997, Mierendorf et al.
5652116, July 1997, Grandi et al.
5679522, October 1997, Modrich et al.
5714316, February 1998, Weiner et al.
5723323, March 1998, Kauffman et al.
5763192, June 1998, Kauffman et al.
5773267, June 1998, Jacobs
5783431, July 1998, Peterson et al.
5811238, September 1998, Stemmer et al.
5814476, September 1998, Kauffman et al.
5817483, October 1998, Kauffman et al.
5824485, October 1998, Thompson et al.
5824514, October 1998, Kauffman et al.
5830721, November 1998, Stemmer et al.
5834252, November 1998, Stemmer et al.
5837458, November 1998, Minshull et al.
5851813, December 1998, Desrosiers
5858725, January 1999, Crowe et al.
5939250, August 1999, Short
5958672, September 1999, Short
5965408, October 1999, Short
Foreign Patent Documents
252666 B1, Jan. 1988, EP
552 266, EP
WO 90/07576, Jul. 1990, WO
WO 90/14430, Nov. 1990, WO
WO 91/01087, Feb. 1991, WO
WO 91/06570, May 1991, WO
WO 91/07506, May 1991, WO
WO 91/15581, Oct. 1991, WO
WO 91/16427, Oct. 1991, WO
WO 92/07075, Apr. 1992, WO
WO 93/02191, Feb. 1993, WO
WO 93/06213, Apr. 1993, WO
WO 93/11237, Jun. 1993, WO
WO 93/15208, Aug. 1993, WO
WO 93/16192, Aug. 1993, WO
WO 93/18141, Sep. 1993, WO
WO 93/25237, Dec. 1993, WO
WO 94/03596, Feb. 1994, WO
WO 94/09817, May 1994, WO
WO 94/13804, Jun. 1994, WO
WO 95/17413, Jun. 1995, WO
WO 95/22625, Aug. 1995, WO
WO 96/33207, Oct. 1996, WO
WO 97/07205, Feb. 1997, WO
WO 97/20078, Jun. 1997, WO
WO 97/25410, Jul. 1997, WO
WO 97/35966, Oct. 1997, WO
WO 98/01581, Jan. 1998, WO
WO 98/28416, Jul. 1998, WO
WO 98/41622, Sep. 1998, WO
WO 98/41623, Sep. 1998, WO
WO 98/41653, Sep. 1998, WO
WO 98/42832, Oct. 1998, WO
Other References
Atreya et al., "Construction of in-frame chimeric plant genes by simplified PCR strategies," Plant Mol. Biol., 19:517-522 (1992).
Bock et al., "Selection of single-stranded DNA molecules that bind and inhibit human thrombin," Nature, 355:564-566 (Feb. 2, 1992).
Clackson et al., "Making antibody fragments using phage display libraries," Nature, 352:624-628 (Aug. 15, 1991).
Crameri et al., "10^20-Fold aptamer library amplification without gel purification," Nuc. Acids Res., 21(18):4410 (1993).
Cull et al., "Screening for receptor ligands using large libraries of peptides linked to the C terminus of the lac repressor," PNAS, 89:1865-1869 (Mar. 1992).
Cwirla et al., "Peptides on phage: A vast library of peptides for identifying ligands," PNAS, 87:6378-6382 (Aug. 1990).
Daugherty et al., "Polymerase chain reaction facilitates the cloning, CDR-grafting, and rapid expression of a murine monoclonal antibody directed against the CD18 component of leukocyte integrins," Nuc. Acids Res., 19(9):2471-2476 (1991).
Delagrave et al., "Searching Sequence Space to Engineer Proteins: Exponential Ensemble Mutagenesis," Biotechnology, 11:1548-1552 (Dec. 1993).
Dube et al., "Artificial Mutants Generated by the Insertion of Random Oligonucleotides into the Putative Nucleoside Binding Site of the HSV-1 Thymidine Kinase Gene," Biochemistry, 30(51):11760-11767 (1991).
Ghosh et al., "Arginine-395 Is Required for Efficient in Vivo and in Vitro Aminoacylation of tRNAs by Escherichia coli Methionyl-tRNA Synthetase," Biochemistry, 30:11767-11774 (1991).
Goldman et al., "An Algorithmically Optimized Combinatorial Library Screened by Digital Imaging Spectroscopy," Biotechnology, 10:1557-1561 (Dec. 1992).
Harlow et al., "Construction of Linker-Scanning Mutations using the Polymerase Chain Reaction," Methods in Mol. Biol., 31:87-96 (1994).
Heda et al., "A simple in vitro site directed mutagenesis of concatamerized cDNA by inverse polymerase chain reaction," Nuc. Acids Res., 20(19):5241-5242 (1992).
Ho et al., "DNA and Protein Engineering Using the Polymerase Chain Reaction: Splicing by Overlap Extension," DNA and Protein Eng. Techniques, 2(2):50-55 (1990).
Hodgson, "The Whys and Wherefores of DNA Amplification," Biotechnology, 11:940-942 (Aug. 1993).
Horton et al., "Gene Splicing by Overlap Extension," Methods in Enzymology, 217:270-279 (1993).
Horton et al., "Gene Splicing by Overlap Extension: Tailor-Made Genes Using the Polymerase Chain Reaction," BioTechniques, 8(5):528-535 (May 1990).
Jayaraman et al., "Polymerase chain reaction-mediated gene synthesis: Synthesis of a gene coding for isozyme c of horseradish peroxidase," PNAS, 88:4084-4088 (May 1991).
Jones et al., "A Rapid Method for Recombination and Site-Specific Mutagenesis by Placing Homologous Ends on DNA Using Polymerase Chain Reaction," BioTechniques, 10(1):62-66 (1991).
Joyce, G. F., "Directed Molecular Evolution," Scientific American (Dec. 1992).
Klug et al., "Creating chimeric molecules by PCR directed homologous DNA recombination," Nuc. Acids Res., 19(10):2793 (1991).
Krishnan et al., "Direct and crossover PCR amplification to facilitate Tn5supF-based sequencing of λ phage clones," Nuc. Acids Res., 19(22):6177-6182 (1991).
Majumder, K., "Ligation-free gene synthesis by PCR: synthesis and mutagenesis at multiple loci of a chimeric gene encoding OmpA signal peptide and hirudin," Gene, 110:89-94 (1992).
Marks et al., "By-passing Immunization, Human Antibodies from V-gene Libraries Displayed on Phage," J. Mol. Biol., 222:581-597 (1991).
McCafferty et al., "Phage antibodies: filamentous phage displaying antibody variable domains," Nature, 348:552-554 (Dec. 6, 1990).
Morl et al., "Group II intron RNA-catalyzed recombination of RNA in vitro," Nuc. Acids Res., 18(22):6545-6551 (1990).
Mullis et al., "Specific Synthesis of DNA in Vitro via a Polymerase-Catalyzed Chain Reaction," Methods in Enzymology, 155:335-351 (1987).
Mullis et al., "Specific Enzymatic Amplification of DNA In Vitro: The Polymerase Chain Reaction," Cold Spring Harbor Symposia on Quantitative Biology, 51:263-273 (1986).
Nissim et al., "Antibody fragments from a 'single pot' display library as immunochemical reagents," EMBO Journal, 13(3):692-698 (1994).
Osuna et al., "Combinatorial mutagenesis of three major groove-contacting residues of Eco RI: single and double amino acid replacements retaining methyltransferase-sensitive activities," Gene, 106:7-12 (1991).
Paabo et al., "DNA Damage Promotes Jumping between Templates during Enzymatic Amplification," J. Biol. Chem., 265(8):4718-4721 (Mar. 15, 1990).
Saiki et al., "Diagnosis of Sickle Cell Anemia and β-Thalassemia with Enzymatically Amplified DNA and Nonradioactive Allele-Specific Oligonucleotide Probes," New England J. of Medicine, 319(9):537-541 (Sep. 1, 1988).
Saiki et al., "Analysis of enzymatically amplified β-globin and HLA-DQα DNA with allele-specific oligonucleotide probes," Nature, 324:163-166 (Nov. 13, 1986).
Saiki et al., "Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia," Science, 230:1350-1354 (Dec. 20, 1985).
Saiki et al., "Primer-Directed Enzymatic Amplification of DNA with a Thermostable DNA Polymerase," Science, 239:487-491 (Jan. 20, 1988).
Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).
Scharf et al., "Direct Cloning and Sequence Analysis of Enzymatically Amplified Genomic Sequences," Science, 233:1076-1078 (Sep. 1986).
Scott et al., "Searching for Peptide Ligands with an Epitope Library," Science, 249:386-390 (Jul. 20, 1990).
Sikorski et al., "In Vitro Mutagenesis and Planned Shuffling: From Cloned Gene to Mutant Yeast," Methods in Enzymology, 194:302-318 (1991).
Smith et al., "Unwanted Mutations in PCR Mutagenesis: Avoiding the Predictable," PCR Methods and Applications, 2(3):253-257 (Feb. 1993).
Villarreal et al., "A General Method of Polymerase-Chain-Reaction-Enabled Protein Domain Mutagenesis: Construction of a Human Protein S-Osteonectin Gene," Analytical Biochem., 197:362-367 (1991).
Weisberg et al., "Simultaneous Mutagenesis of Multiple Sites: Application of the Ligase Chain Reaction Using PCR Products Instead of Oligonucleotides," BioTechniques, 15(1):68-70, 72-74, 76 (Jul. 1993).
Weissenhorn et al., "Chimerization of antibodies by isolation of rearranged genomic variable regions by the polymerase chain reaction," Gene, 106:273-277 (1991).
Yao et al., "Site-directed Mutagenesis of Herpesvirus Glycoprotein Phosphorylation Sites by Recombination Polymerase Chain Reaction," PCR Methods and Applications, 1(3):205-207 (Feb. 1992).
Yolov et al., "Constructing DNA by polymerase recombination," Nuc. Acids Res., 18(13):3983-3986 (1990).
Yon et al., "Precise gene fusion by PCR," Nuc. Acids Res., 17(12):4895 (1989).
Zoller, M.J., "New recombinant DNA methodology for protein engineering," Curr. Opin. Biotech., 3:348-354 (1992).
Opposition Statement in the matter of Australian Patent Application 703264 (Affymax Technologies NV), filed by Diversa Corporation on Sep. 23, 1999.
Arkin et al., "An algorithm for protein engineering: Simulations of recursive ensemble mutagenesis," Proc. Natl. Acad. Sci. USA, 89:7811-7815 (1992).
Beaudry et al., "Directed evolution of an RNA enzyme," Science, 257:635-641 (1992).
Berger et al., "Phoenix mutagenesis: One-step reassembly of multiply cleaved plasmids with mixtures of mutant and wild-type fragments," Anal. Biochem., 214:571-579 (1993).
Berkhout et al., "In vivo selection of randomly mutated retroviral genomes," Nucleic Acids Res., 21:5020-5023 (1993).
Cadwell et al., "Randomization of genes by PCR mutagenesis," PCR Methods and Applications, 2:28-33 (1992).
Calogero et al., "In vivo recombination and the production of hybrid genes," Microbiol. Lett., 97:41-44 (1992).
Caren et al., "Efficient sampling of protein sequence space for multiple mutants," Biotechnology, 12:517-520 (1994).
Delagrave et al., "Recursive ensemble mutagenesis," Protein Engineering, 6:327-331 (1993).
Feinberg and Vogelstein, "A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity," Anal. Biochem., 132:6-13 (1983).
Heim et al., "Wavelength mutations and posttranslational autoxidation of green fluorescent protein," Proc. Natl. Acad. Sci. USA, 91:12501-12504 (1994).
Hermes et al., "Searching sequence space by definably random mutagenesis: Improving the catalytic potency of an enzyme," Proc. Natl. Acad. Sci. USA, 87:696-700 (1990).
Ho et al., "Site-directed mutagenesis by overlap extension using the polymerase chain reaction," Gene, 77:51-59 (1989).
Horton et al., "Engineering hybrid genes without the use of restriction enzymes: Gene splicing by overlap extension," Gene, 77:61-68 (1989).
Jones et al., "Recombinant circle PCR and recombination PCR for site-specific mutagenesis without PCR product purification," BioTechniques, 12:528-530, 532, 534-535 (1992).
Kim et al., "Human immunodeficiency virus reverse transcriptase," J. Biol. Chem., 271:4872-4878 (1996).
Leung et al., "A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction," Techniques, 1:11-15 (1989).
Marks et al., "By-passing immunization: Building high affinity human antibodies by chain shuffling," Bio/Technology, 10:779-782 (1992).
Meyerhans et al., "DNA recombination using PCR," Nucleic Acids Res., 18:1687-1691 (1990).
Oliphant et al., "Cloning of random-sequence oligodeoxynucleotides," Gene, 44:177-183 (1988).
Pharmacia Catalog, pp. 70-71 (1993 Edition).
Pompon et al., "Protein engineering by cDNA recombination in yeasts: Shuffling of mammalian cytochrome P-450 functions," Gene, 83:15-24 (1989).
Reidhaar-Olson et al., "Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences," Science, 241:53-57 (1988).
Roa et al., "Recombination and polymerase error facilitate restoration of infectivity in brome mosaic virus," J. Virol., 67:969-979 (1993).
Stemmer et al., "Selection of an active single chain Fv antibody from a protein linker library prepared by enzymatic inverse PCR," BioTechniques, 14:256-265 (1992).
Stemmer et al., "Rapid evolution of a protein in vitro by DNA shuffling," Nature, 370:389-391 (1994).
Stemmer et al., "DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution," Proc. Natl. Acad. Sci. USA, 91:10747-10751 (1994).
Fisch et al., "A Strategy Of Exon Shuffling For Making Large Peptide Repertoires Displayed On Filamentous Bacteriophage," Proc. Natl. Acad. Sci. USA, 93(15):7761-7766 (1996).
Marton et al., "DNA Nicking Favors PCR Recombination," Nucleic Acids Res., 19(9):2423-2426 (1991).
Winter et al., "Making Antibodies By Phage Display Technology," Ann. Rev. Immunol., 12:433-455 (1994).
Greener et al., "An Efficient Random Mutagenesis Technique Using An E. coli Mutator Strain," Methods in Molecular Biology, 57:375-385 (1995).
Andersson et al., "Muller's ratchet decreases fitness of a DNA-based microbe," PNAS, 93:906-907 (Jan. 1996).
Bailey, "Toward a Science of Metabolic Engineering," Science, 252:1668-1680 (1991).
Barrett et al., "Genotypic analysis of multiple loci in somatic cells by whole genome amplification," Nuc. Acids Res., 23(17):3488-3492 (1995).
Cameron et al., "Cellular and Metabolic Engineering: An Overview," Applied Biochem. Biotech., 38:105-140 (1993).
Chakrabarty, "Microbial Degradation of Toxic Chemicals: Evolutionary Insights and Practical Considerations," ASM News, 62(3):130-137 (1996).
Chater, "The Improving Prospects for Yield Increase by Genetic Engineering in Antibiotic-Producing Streptomycetes," Biotechnology, 8:115-121 (Feb. 1990).
Chen et al., "Tuning the activity of an enzyme for unusual environments: Sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide," PNAS, 90:5618-5622 (Jun. 1993).
Dieffenbach et al., PCR Primer, A Laboratory Manual, Cold Spring Harbor Laboratory Press, pp. 583-589, 591-601, 603-612, and 613-621 (1995).
Evnin et al., "Substrate specificity of trypsin investigated by using a genetic selection," PNAS, 87:6659-6663 (Sep. 1990).
Fang et al., "Human Strand-specific Mismatch Repair Occurs by a Bidirectional Mechanism Similar to That of the Bacterial Reaction," J. Biol. Chem., 268(16):11838-11844 (Jun. 5, 1993).
Ippolito et al., "Structure assisted redesign of a protein-zinc-binding site with femtomolar affinity," PNAS, 92:5017-5021 (May 1995).
Kellogg et al., "Plasmid-Assisted Molecular Breeding: New Technique for Enhanced Biodegradation of Persistent Toxic Chemicals," Science, 214:1133-1135 (Dec. 4, 1981).
Kunkel, "Rapid and efficient site-specific mutagenesis without phenotypic selection," PNAS, 82:488-493 (Jan. 1985).
Levichkin et al., "A New Approach to Construction of Hybrid Genes: Homolog Recombination Method," Mol. Biology, 29(5) part 1:572-577 (1995).
Lewis et al., "Efficient site directed in vitro mutagenesis using ampicillin selection," Nuc. Acids Res., 18(21):3439-3443 (1990).
Moore et al., "Directed evolution of a para-nitrobenzyl esterase for aqueous-organic solvents," Nature Biotech., 14:458-467 (Apr. 1996).
Omura, "Philosophy of New Drug Discovery," Microbiol. Rev., 50(3):259-279 (Sep. 1986).
Piepersberg, "Pathway Engineering in Secondary Metabolite-Producing Actinomycetes," Crit. Rev. Biotech., 14(3):251-285 (1994).
Prasher, "Using GFP to see the light," TIG, 11(8) (Aug. 1995).
Rice et al., "Random PCR mutagenesis screening of secreted proteins by direct expression in mammalian cells," PNAS, 89:5467-5471 (Jun. 1992).
Simpson et al., "Two paradigms of metabolic engineering applied to amino acid biosynthesis," Biochem. Soc. Transactions, vol. 23 (1995).
Steele et al., "Techniques for Selection of Industrially Important Microorganisms," Ann. Rev. Microbiol., 45:89-106 (1991).
Stephanopoulos et al., "Metabolic engineering: methodologies and future prospects," Trends Biotech., 11:392-396 (1993).
Stephanopoulos, "Metabolic engineering," Curr. Opin. Biotech., 5:196-200 (1994).
Wehmeier, "New multifunctional Escherichia coli-Streptomyces shuttle vectors allowing blue-white screening on XGal plates," Gene, 165:149-150 (1995).
Rapley, R., Molecular Biotechnology, 2:295-298 (1994).
Balint et al., "Antibody Engineering By Parsimonious Mutagenesis," Gene, 137(1):109-118 (1993).
Bartel et al., "Isolation of New Ribozymes From a Large Pool of Random Sequences," Science, 261:1411-1418 (1993).
Crameri et al., "Combinatorial Multiple Cassette Mutagenesis Creates All The Permutations Of Mutant And Wild-Type Sequences," BioTechniques, 18(2):194-196 (1995).
Crameri et al., "Improved Green Fluorescent Protein By Molecular Evolution Using DNA Shuffling," Nat. Biotechnol., 14(3):315-319 (1996).
Crameri et al., "Construction And Evolution Of Antibody-Phage Libraries By DNA Shuffling," Nat. Med., 2(1):100-102 (1996).
Crameri et al., "Molecular Evolution Of An Arsenate Detoxification Pathway By DNA Shuffling," Nat. Biotechnol., 15(5):436-438 (1997).
Crameri et al., "DNA Shuffling Of A Family Of Genes From Diverse Species Accelerates Directed Evolution," Nature, 391(6664):288-291 (1998).
Gates et al., "Affinity Selective Isolation Of Ligands From Peptide Libraries Through Display On A Lac Repressor 'Headpiece Dimer'," J. Mol. Biol., 255(3):373-386 (1996).
Gram et al., "In Vitro Selection and Affinity Maturation of Antibodies From a Naive Combinatorial Immunoglobulin Library," Proc. Natl. Acad. Sci. USA, 89:3576-3580 (1992).
Near, "Gene Conversion Of Immunoglobulin Variable Regions In Mutagenesis Cassettes By Replacement PCR Mutagenesis," BioTechniques, 12(1):88-97 (1992).
Perlak, "Single Step Large Scale Site-Directed In Vitro Mutagenesis Using Multiple Oligonucleotides," Nucleic Acids Res., 18(24):7457-7458 (1990).
Stemmer, "Searching Sequence Space," Biotechnology, 13:549-553 (1995).
Stemmer et al., "Single-Step Assembly Of A Gene And Entire Plasmid From Large Numbers Of Oligodeoxyribonucleotides," Gene, 164(1):49-53 (1995).
Stemmer, "The Evolution of Molecular Computation," Science, 270(5241):1510 (1995).
Stemmer, "Sexual PCR and Assembly PCR," Encyclopedia Mol. Biol., VCH Publishers, New York, pp. 447-457 (1996).
Weber et al., "Formation of Genes Coding for Hybrid Proteins by Recombination Between Related, Cloned Genes in E. coli," Nucleic Acids Research, 11(16):5661-5669 (1983).
Weisberg et al., "Simultaneous Mutagenesis Of Multiple Sites: Application Of The Ligase Chain Reaction Using PCR Products Instead Of Oligonucleotides," BioTechniques, 15(1):68-76 (1993).
Zhang et al., "Directed Evolution Of A Fucosidase From A Galactosidase By DNA Shuffling And Screening," Proc. Natl. Acad. Sci. USA, 94(9):4504-4509 (1997).
Zhao et al., "Molecular Evolution by Staggered Extension Process (StEP) In Vitro Recombination," Nature Biotech., 16:258-261 (1998).
Primary Examiner:
Jones; W. Gary
Assistant Examiner:
Whisenant; Ethan
Attorney, Agent or Firm:
Kruse; Norman J., Liebeschuetz; Joe
Parent Case Text
This application is a CIP of Ser. No. 08/564,955 filed Nov. 30, 1995 now U.S. Pat. No. 5,811,238 which is a CIP of Ser. No. 08/537,874 filed May 4, 1996 now U.S. Pat. No. 5,830,721 which is a 371 of PCT/US95/02126 filed Feb. 17, 1995 which is a CIP of Ser. No. 08/198,431 filed Feb. 17, 1994 and now U.S. Pat. No. 5,605,793.
Claims
What is claimed is:
1. A method of shuffling polynucleotides, comprising:
conducting a polynucleotide amplification process on overlapping segments of a population of variant polynucleotides under conditions whereby one segment serves as a template for extension of another segment, to generate a population of recombinant polynucleotides; and
selecting or screening a recombinant polynucleotide for a desired property, wherein the amplification process is performed in the presence of an agent that promotes annealing of the overlapping segments.
2. The method of claim 1, wherein the agent is selected from the group consisting of a cationic detergent, an exonuclease and a recombinogenic protein.
3. The method of claim 1, wherein the agent is recA.
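In claim 1, one segment serves as the template for extension of another. A hypothetical sketch of that template-switching step, assuming aligned, equal-length parents with no insertions or deletions: a growing strand is extended only when its 3' end (the last `k` bases) matches the chosen template at the same positions, and each extension stops after a random number of bases to mimic repeated denaturation. The threshold `k` is a crude stand-in for annealing stringency; the sketch does not model the annealing-promoting agents of claims 2 and 3. All function and parameter names are illustrative.

```python
import random


def anneals(strand, template, k):
    """True if the last k bases of the strand match the template at the same positions."""
    end = len(strand)
    if end < k or end > len(template):
        return False
    return strand[end - k:end] == template[end - k:end]


def extend_by_template_switching(parents, seed_len=15, k=8, max_step=25,
                                 rng=None, max_tries=10_000):
    """Grow one strand by repeated anneal/extend cycles on randomly chosen templates.

    Returns a partial strand if it is not completed within max_tries cycles.
    """
    rng = rng or random.Random()
    length = len(parents[0])
    strand = rng.choice(parents)[:seed_len]   # initial segment from one parent
    for _ in range(max_tries):
        if len(strand) >= length:
            break
        template = rng.choice(parents)        # any parent may serve as template
        if not anneals(strand, template, k):
            continue                          # no match at the 3' end: no extension this cycle
        step = rng.randint(1, max_step)       # partial extension before the next denaturation
        strand += template[len(strand):len(strand) + step]
    return strand
```

Because extension requires a local match, parents that share no sequence never exchange segments in this model; template switches occur only in regions the parents have in common.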
4. A method of shuffling polynucleotides, comprising:
conducting a polynucleotide amplification process on overlapping segments of a population of variant polynucleotides under conditions whereby one segment serves as a template for extension of another segment, to generate a population of recombinant polynucleotides; and
selecting or screening a recombinant polynucleotide for a desired property, wherein the population of variant polynucleotides is converted into overlapping segments of a desired size by replication of the polynucleotide in the presence of UTP, cleavage of the replicated polynucleotide with UDG glycosylase and denaturation.
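Claim 4 controls segment size through uracil incorporation during replication followed by UDG cleavage and denaturation. The sketch below uses a simplified, assumed model: each T position is replaced by U with probability `u_fraction`, and the strand is broken at every U site, so the incorporation rate is the knob that sets the average fragment size. The function names and the geometric-spacing estimate in `expected_mean_length` are illustrative assumptions, not chemistry taken from the patent.

```python
import random


def uracil_fragmentation(seq, u_fraction, rng=None):
    """Assumed model: each T becomes U with probability u_fraction, and the strand
    is broken at every U site (the U position itself does not survive)."""
    rng = rng or random.Random()
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base == "T" and rng.random() < u_fraction:
            if i > start:
                fragments.append(seq[start:i])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def expected_mean_length(seq, u_fraction):
    """Approximate mean fragment length, treating cuts as independent per-base events
    with probability u_fraction * (fraction of T in seq)."""
    p_cut = u_fraction * seq.count("T") / len(seq)
    return float("inf") if p_cut == 0 else 1.0 / p_cut
```

Halving `u_fraction` roughly doubles the expected fragment length, which is how the "desired size" of the overlapping segments would be tuned under this model.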
5. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is between different forms of the polynucleotide sequence in separate plasmid vectors.
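Claim 5 recites an iterative loop: recombine, screen, recombine the selected form with a further form, screen again, and repeat until the desired property is acquired. The sketch below mirrors steps (1) through (5) under stated assumptions: the forms are aligned, equal-length strings; recombination is an in silico multi-point crossover; and `fitness`, which counts matches to a target sequence, is a hypothetical stand-in for a real screen. It does not model the plasmid-vector context the claim recites.

```python
import random


def fitness(seq, target):
    """Hypothetical screen: number of positions that match a target sequence."""
    return sum(a == b for a, b in zip(seq, target))


def recombine(a, b, rng, n_cuts=3):
    """Multi-point crossover between two aligned, equal-length forms."""
    cuts = sorted(rng.sample(range(1, len(a)), n_cuts))
    forms = (a, b)
    src = rng.randrange(2)                  # which form supplies the first segment
    pieces, prev = [], 0
    for cut in cuts + [len(a)]:
        pieces.append(forms[src][prev:cut])
        src, prev = 1 - src, cut
    return "".join(pieces)


def evolve(forms, target, library_size=50, max_rounds=20, rng=None):
    """Iterate recombination and screening until the target score is reached."""
    rng = rng or random.Random()

    def score(s):
        return fitness(s, target)

    goal = len(target)
    # Steps (1)-(2): recombine the first and second forms, then screen the library.
    library = [recombine(forms[0], forms[1], rng) for _ in range(library_size)]
    best = max(library + list(forms[:2]), key=score)
    history = [score(best)]
    # Step (5): repeat steps (3) and (4) as necessary.
    for _ in range(max_rounds):
        if history[-1] == goal:
            break                           # desired property acquired
        # Step (3): recombine the selected form with further forms (same or different).
        library = [recombine(best, rng.choice(forms), rng) for _ in range(library_size)]
        # Step (4): screen; retaining `best` keeps the score from decreasing.
        best = max(library + [best], key=score)
        history.append(score(best))
    return best, history
```

Keeping the current best in every screened pool makes the score non-decreasing across rounds; a real screen is noisy, so this elitism is an idealization rather than a property of the claimed method.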
6. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is between a first form of the polynucleotide sequence in a viral vector and a second form of the polynucleotide sequence in a plasmid vector.
7. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is between different forms of the polynucleotide sequences in separate viral vectors.
8. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is between a first form of the polynucleotide and a second form of the polynucleotide that is a component of a chromosome in a host cell.
9. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein the first form of the polynucleotide is in a plasmid vector.
10. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein the first form of the polynucleotide is in a viral vector.
11. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein different forms of the polynucleotide are contained in a population of cells and the cells are exposed to an electric field promoting exchange of the different forms between the cells.
12. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein different forms of the polynucleotide are contained in a collection of cells and the different forms are exchanged between cells by conjugation.
13. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is effected by nonhomologous recombination of different forms of the polynucleotide.
14. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein the polynucleotide has introns and exons and at least one recombining step is effected by homologous recombination between introns of different forms of the polynucleotide.
15. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is performed in vivo and at least one recombining step is performed in vitro.
16. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least two recombining steps are performed in vitro.
17. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property and (6) selecting for recombinant or further recombinant polynucleotides relative to unrecombined forms of the polynucleotide sequence after at least one recombining step.
18. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein at least one recombining step is performed in a mutator host cell or a host cell exposed to a mutagen.
19. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein the polynucleotide sequence is a gene.
20. The method of claim 19, wherein at least one form of the polynucleotide is in purified form.
21. The method of claim 19, wherein at least one recombining step is performed in vivo.
22. The method of claim 19, wherein at least two recombining steps are performed in vivo.
23. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein the further recombinant polynucleotide evolves at a frequency of at least 1 mutation per 10^6 positions per cycle.
24. A method of evolving a polynucleotide sequence toward a desired property comprising:
(1) recombining at least first and second forms of the polynucleotide sequence to produce a library of recombinant forms of the sequence;
(2) screening at least a first recombinant form of the polynucleotide from the library for evolution toward the desired property;
(3) recombining the first recombinant form of the polynucleotide with a further form of the polynucleotide sequence, the same or different from the first and second forms, to produce a further library of recombinant polynucleotides;
(4) screening at least one further recombinant polynucleotide from the further library of recombinant polynucleotides that has further evolved toward the desired property;
(5) repeating (3) and (4), as necessary, until the further recombinant polynucleotide has acquired the desired property, wherein the second form of the polynucleotide is produced by mutagenesis of the first form of the polynucleotide.
25. The method of claim 16, wherein the in vitro recombining steps comprise:
conducting a polynucleotide amplification process on overlapping single-stranded segments of a population of variant polynucleotides under conditions whereby one segment serves as a template for extension of another segment to generate a population of recombinant polynucleotides.
26. The method of claim 19, wherein at least one recombining step comprises:
conducting a polynucleotide amplification process on overlapping segments of a population of variant polynucleotides under conditions whereby one segment serves as a template for extension of another segment, to generate a population of recombinant polynucleotides.
27. A method of shuffling polynucleotides, comprising:
initiating a polynucleotide amplification process on overlapping segments of a population of variant polynucleotides under conditions whereby one segment serves as a template for extension of another segment, to generate a population of recombinant polynucleotides; and
selecting or screening a recombinant polynucleotide for a desired property.
28. The method of any one of claims 25, 26 and 27, wherein the overlapping segments are produced by cleavage of the population of variant polynucleotides.
29. The method of any one of claims 25, 26, and 27, wherein the cleavage is by DNaseI digestion.
30. The method of any one of claims 25, 26 and 27, wherein the overlapping segments are produced by chemical synthesis.
31. The method of any one of claims 25, 26 and 27, wherein the overlapping segments are produced by amplification of the population of polynucleotides.
32. The method of any one of claims 25, 26 and 27, wherein the population of variant polynucleotides are allelic variants.
33. The method of any one of claims 25, 26 and 27, wherein the population of variant polynucleotides are species variants.
34. The method of claim 25, wherein the overlapping single-stranded segments are produced by:
cleaving the population of variant polynucleotides to produce overlapping double-stranded polynucleotide fragments; and
denaturing the double-stranded polynucleotide fragments to produce the overlapping single-stranded polynucleotide segments.
35. The method of claim 34, wherein the variant polynucleotides are DNA and the cleaving is performed by DNase digestion.
Description
FIELD OF THE INVENTION
The present invention relates to a method for the production of polynucleotides conferring a desired phenotype and/or encoding a protein having an advantageous predetermined property which is selectable or can be screened for. In an aspect, the method is used for generating and selecting or screening for desired nucleic acid fragments encoding mutant proteins.
BACKGROUND AND DESCRIPTION OF RELATED ART
The complexity of an active sequence of a biological macromolecule (e.g., a protein or DNA) has been called its information content ("IC"; 5-9). The information content of a protein has been defined as the resistance of the active protein to amino acid sequence variation, calculated from the minimum number of invariable amino acids (bits) required to describe a family of related sequences with the same function (9, 10). Proteins that are sensitive to random mutagenesis have a high information content. In 1974, when this definition was coined, protein diversity existed only as taxonomic diversity.
Developments in molecular biology, such as molecular libraries, have allowed the identification of a much larger number of variable bases, and even the selection of functional sequences from random libraries. Most residues can be varied, although typically not all at the same time, depending on compensating changes in the context. Thus a 100-amino-acid protein can contain only 2,000 different single mutations, but 20^100 possible combinations of mutations.
Information density is the Information Content/unit length of a sequence. Active sites of enzymes tend to have a high information density. By contrast, flexible linkers in enzymes have a low information density (8).
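The two definitions above can be made concrete with a simplified sketch. It assumes that information content can be approximated by counting alignment columns that are fully conserved across a family of functional sequences, and that information density is that count divided by the window length. The function names and the four-sequence family are hypothetical and are not taken from references (8)-(10).

```python
def information_content(alignment: list[str]) -> int:
    """Count columns that are identical in every aligned sequence.

    Simplifying assumption: each invariant residue contributes one unit of
    information content and variable columns contribute nothing.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*alignment) if len(set(column)) == 1)


def information_density(alignment: list[str], start: int, end: int) -> float:
    """Information content per residue inside the window [start, end)."""
    window = [seq[start:end] for seq in alignment]
    return information_content(window) / (end - start)


# Hypothetical family of four related 12-residue sequences.
family = [
    "GSHGDKLCGGAS",
    "GSHGDKLAGNTS",
    "GSHGDRLCGSAS",
    "GSHGDKLVGGPS",
]

print(information_content(family))                    # 8
print(information_density(family, 0, 5))              # 1.0  (conserved, active-site-like)
print(round(information_density(family, 5, 12), 2))   # 0.43 (variable, linker-like)
```

In this toy model a fully conserved window behaves like the high-density active sites described above, and a mostly variable window behaves like a low-density flexible linker.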
Current methods in widespread use for creating mutant proteins in a library format are error-prone polymerase chain reaction (11, 12, 19) and cassette mutagenesis (8, 20, 21, 22, 40, 41, 42), in which the specific region to be optimized is replaced with a synthetically mutagenized oligonucleotide. Alternatively, mutator strains of host cells have been employed to increase the mutation frequency (Greener and Callahan (1995) Strategies in Mol. Biol. 7: 32). In each case, a "mutant cloud" (4) is generated around certain sites in the original sequence.
Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a long sequence. Error-prone PCR can be used to mutagenize a mixture of fragments of unknown sequence. However, computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the block changes that are required for continued sequence evolution. The published error-prone PCR protocols are generally unsuited for reliable amplification of DNA fragments greater than 0.5 to 1.0 kb, limiting their practical application. Further, repeated cycles of error-prone PCR lead to an accumulation of neutral mutations, which, for example, may make a protein immunogenic.
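The way error-prone PCR scatters a low level of point mutations over a long sequence, and how they accumulate over repeated rounds, can be illustrated with a minimal simulation. It assumes independent, uniformly distributed base substitutions at a hypothetical rate of 0.7% per base per round; real low-fidelity polymerases have sequence and substitution biases that this sketch ignores, and the helper names are hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA sequence, substituting each base with probability error_rate.

    Assumption: errors are independent point substitutions to one of the three
    other bases, chosen uniformly.
    """
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
template = "".join(rng.choice(BASES) for _ in range(1000))

# Five successive rounds; mutations from earlier rounds are carried forward.
variant = template
for _ in range(5):
    variant = error_prone_copy(variant, 0.007, rng)

print(count_differences(template, variant))
```

Each round only adds a handful of scattered changes, which is why the text describes point mutagenesis as gradual; nothing in the process moves a block of sequence from one variant into another.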
In oligonucleotide-directed mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. This approach does not generate combinations of distant mutations and is thus not significantly combinatorial. The limited library size relative to the vast sequence length means that many rounds of selection are unavoidable for protein optimization. Mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round followed by grouping into families, arbitrarily choosing a single family, and reducing it to a consensus motif, which is resynthesized and reinserted into a single gene followed by additional selection. This process constitutes a statistical bottleneck, is labor-intensive, and is not practical for many rounds of mutagenesis.
Error-prone PCR and oligonucleotide-directed mutagenesis are thus useful for single cycles of sequence fine tuning but rapidly become limiting when applied for multiple cycles.
Error-prone PCR can be used to mutagenize a mixture of fragments of unknown sequence (11, 12). However, the published error-prone PCR protocols (11, 12) suffer from a low processivity of the polymerase. Therefore, the protocol is very difficult to employ for the random mutagenesis of an average-sized gene. This inability limits the practical application of error-prone PCR.
Another serious limitation of error-prone PCR is that the rate of down-mutations grows with the information content of the sequence. At a certain information content, library size, and mutagenesis rate, the balance of down-mutations to up-mutations will statistically prevent the selection of further improvements (statistical ceiling).
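The link between information content and down-mutations can be shown with a toy probability model. It assumes a hypothetical 100-residue protein in which any change at one of `invariant` positions is a down-mutation and every other position is neutral, and that each variant carries three point mutations at distinct random positions. The function name is hypothetical.

```python
from math import comb


def fraction_without_down_mutation(length: int, invariant: int, mutations: int) -> float:
    """Probability that `mutations` distinct, uniformly random positions all
    miss the `invariant` positions of a sequence of `length` residues.

    Simplifying assumption: any change at an invariant position is a
    down-mutation, and every other position is neutral.
    """
    return comb(length - invariant, mutations) / comb(length, mutations)


# Hypothetical 100-residue protein, three point mutations per variant.
for ic in (10, 50, 90):
    print(ic, round(fraction_without_down_mutation(100, ic, 3), 3))
# prints:
# 10 0.727
# 50 0.121
# 90 0.001
```

In this model, multiplying the surviving fraction by the library size and by the (small) chance that a surviving variant is an actual improvement gives the expected number of up-mutants per round; as information content rises, that number drops toward zero, which corresponds to the statistical ceiling described above.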
Finally, repeated cycles of error-prone PCR will also lead to the accumulation of neutral mutations, which can affect, for example, immunogenicity but not binding affinity.
Thus error-prone PCR was found to be too gradual to allow the block changes that are required for continued sequence evolution (1, 2).
In cassette mutagenesis, a sequence block of a single template is typically replaced by a (partially) randomized sequence. Therefore, the maximum information content that can be obtained is statistically limited by the number of random sequences (i.e., library size). This constitutes a statistical bottleneck, eliminating other sequence families which are not currently best, but which may have greater long term potential.
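The library-size limit on cassette mutagenesis can be estimated with a back-of-envelope sketch. It assumes each randomized position draws uniformly from the 20 amino acids (ignoring codon degeneracy and bias), uses a hypothetical library of 10^9 clones, and estimates coverage with the Poisson approximation 1 - exp(-N/V), where N is the library size and V the number of possible variants. Function names are hypothetical.

```python
import math


def max_randomized_positions(library_size: int, alphabet: int = 20) -> int:
    """Largest n with alphabet**n <= library_size: the most fully randomized
    residue positions a library could hold even if no variant were repeated."""
    n = 0
    while alphabet ** (n + 1) <= library_size:
        n += 1
    return n


def expected_coverage(library_size: int, positions: int, alphabet: int = 20) -> float:
    """Expected fraction of the alphabet**positions variants present at least
    once when library_size clones are drawn uniformly at random
    (Poisson approximation, 1 - exp(-N / V))."""
    variants = alphabet ** positions
    return 1 - math.exp(-library_size / variants)


for n in (5, 6, 7, 8):
    print(n, round(expected_coverage(10**9, n), 3))
print(max_randomized_positions(10**9))  # 6
```

Under these assumptions, a 10^9-member library can sample essentially all variants of only about six fully randomized residues, and coverage collapses beyond that, which is the statistical bottleneck the paragraph describes.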
Further, mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round (20). Therefore, this approach is tedious and is not practical for many rounds of mutagenesis.
Error-prone PCR and cassette mutagenesis are thus best suited and have been widely used for fine-tuning areas of comparatively low information content. An example is the selection of an RNA ligase ribozyme from a random library using many rounds of amplification by error-prone PCR and selection (13).
It is becoming increasingly clear that our scientific tools for the design of recombinant linear biological sequences such as protein, RNA and DNA are not suitable for generating the sequence diversity needed to optimize many desired properties of a macromolecule or organism. Finding better and better mutants depends on searching more and more sequences within larger and larger libraries, and increasing numbers of cycles of mutagenic amplification and selection are necessary. However, as discussed above, the existing mutagenesis methods that are in widespread use have distinct limitations when used for repeated cycles.
Evolution of most organisms occurs by natural selection and sexual reproduction. Sexual reproduction ensures the mixing and combining of the genes of the selected individuals in their offspring. During meiosis, homologous chromosomes from the parents line up with one another and cross over partway along their length, thus swapping genetic material. Such swapping or shuffling of the DNA allows organisms to evolve more rapidly (1, 2). In sexual recombination, because the inserted sequences were of proven utility in a homologous environment, the inserted sequences are likely to still have substantial information content once they are inserted into the new sequence.
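The cross-over described above can be reduced to a minimal single-point model. It assumes two aligned, equal-length homologous parents and one cross-over at a random interior point; real meiosis can involve several cross-overs, and the function name and parent sequences are hypothetical.

```python
import random


def single_crossover(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """One cross-over between aligned homologous parents: the child carries
    parent_a up to a random point and parent_b after it."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    point = rng.randint(1, len(parent_a) - 1)   # never at the very ends
    return parent_a[:point] + parent_b[point:]


rng = random.Random(42)
mother = "AAAAAAAAAA"
father = "TTTTTTTTTT"
child = single_crossover(mother, father, rng)
print(child)
```

Using maximally different parents makes the swapped block visible in the output; with real homologues the child differs from each parent only at the positions where the parents themselves differ.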
Marton et al. (27) describe the use of PCR in vitro to monitor recombination in a plasmid having directly repeated sequences. Marton et al. disclose that recombination will occur during PCR as a result of breaking or nicking of the DNA, giving rise to recombinant molecules. Meyerhans et al. (23) also disclose the existence of DNA recombination during in vitro PCR.
The term Applied Molecular Evolution ("AME") means the application of an evolutionary design algorithm to a specific, useful goal. While many different library formats for AME have been reported for polynucleotides (3, 11-14), peptides and proteins (phage (15-17), lacI (18) and polysomes), in none of these formats has recombination by random cross-overs been used to deliberately create a combinatorial library.
Theoretically, there are 2,000 different single mutants of a 100-amino-acid protein. A protein of 100 amino acids has 20^100 possible combinations of mutations, a number too large to explore exhaustively by conventional methods. It would be advantageous to develop a system that would allow the generation and screening of all of these possible combinations of mutations.
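The two counts for a 100-amino-acid protein can be checked directly. This short sketch only reproduces the arithmetic, using Python's arbitrary-precision integers; the constant names are illustrative.

```python
LENGTH = 100    # residues in the hypothetical protein
ALPHABET = 20   # standard amino acids

single_mutants = LENGTH * ALPHABET      # the counting convention behind "2,000"
all_combinations = ALPHABET ** LENGTH   # every full-length sequence of that size

print(single_mutants)                   # 2000
print(len(str(all_combinations)))       # 131 decimal digits, i.e. about 1.3 x 10^130
```

Strictly, replacing each position with one of the 19 non-wild-type residues gives 100 × 19 = 1,900 single substitutions; the figure of 2,000 counts all 20 residues per position. Either way, the single-mutant space is vanishingly small next to the combinatorial space.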
Winter and coworkers (43,44) have utilized an in vivo site specific recombination system to combine light chain antibody genes with heavy chain antibody genes for expression in a phage system. However, their system relies on specific sites of recombination and thus is limited. Hayashi et al. (48) report simultaneous mutagenesis of antibody CDR regions in single chain antibodies (scFv) by overlap extension and PCR.
Caren et al. (45) describe a method for generating a large population of multiple mutants using random in vivo recombination. However, their method requires the recombination of two different libraries of plasmids, each library having a different selectable marker. Thus the method is limited to a finite number of recombinations equal to the number of existing selectable markers, and produces a concomitant linear increase in the number of marker genes linked to the selected sequence(s). Caren et al. do not describe the use of multiple selection cycles; recombination is used solely to construct larger libraries.
Calogero et al. (46) and Galizzi et al. (47) report that in vivo recombination between two homologous but truncated insect-toxin genes on a plasmid can produce a hybrid gene. Radman et al. (49) report in vivo recombination of substantially mismatched DNA sequences in a host cell having defective mismatch repair enzymes, resulting in hybrid molecule formation.
It would be advantageous to develop a method for the production of mutant proteins that allows the generation of large, easily searched libraries of mutant nucleic acid sequences. The invention described herein is directed to the use of repeated cycles of point mutagenesis, nucleic acid shuffling and selection, which allow the directed molecular evolution in vitro of highly complex linear sequences, such as proteins, through random recombination.
Accordingly, it would be advantageous to develop a method which allows for the production of large libraries of mutant DNA, RNA or proteins and the selection of particular mutants for a desired goal. The invention described herein is directed to the use of repeated cycles of mutagenesis, in vivo recombination and selection which allow for the directed molecular evolution in vivo and in vitro of highly complex linear sequences, such as DNA, RNA or proteins through recombination.
Further advantages of the present invention will become apparent from the following description of the invention with reference to the attached drawings.
SUMMARY OF THE INVENTION
The present invention is directed to a method for generating a selected polynucleotide sequence or population of selected polynucleotide sequences, typically in the form of amplified and/or cloned polynucleotides, whereby the selected polynucleotide sequence(s) possess a desired phenotypic characteristic (e.g., encode a polypeptide, promote transcription of linked polynucleotides, bind a protein, and the like) which can be selected for. One method of identifying polypeptides that possess a desired structure or functional property, such as binding to a predetermined biological macromolecule (e.g., a receptor), involves the screening of a large library of polypeptides for individual library members which possess the desired structure or functional property conferred by the amino acid sequence of the polypeptide.
In a general aspect, the invention provides a method, termed "sequence shuffling", for generating libraries of recombinant polynucleotides having a desired characteristic which can be selected or screened for. Libraries of recombinant polynucleotides are generated from a population of related-sequence polynucleotides which comprise sequence regions which have substantial sequence identity and can be homologously recombined in vitro or in vivo. In the method, at least two species of the related-sequence polynucleotides are combined in a recombination system suitable for generating sequence-recombined polynucleotides, wherein said sequence-recombined polynucleotides comprise a portion of at least one first species of a related-sequence polynucleotide with at least one adjacent portion of at least one second species of a related-sequence polynucleotide. Recombination systems suitable for generating sequence-recombined polynucleotides can be either: (1) in vitro systems for homologous recombination or sequence shuffling via amplification or other formats described herein, or (2) in vivo systems for homologous recombination or site-specific recombination as described herein. The population of sequence-recombined polynucleotides comprises a subpopulation of polynucleotides which possess desired or advantageous characteristics and which can be selected by a suitable selection or screening method. 
The selected sequence-recombined polynucleotides, which are typically related-sequence polynucleotides, can then be subjected to at least one recursive cycle wherein at least one selected sequence-recombined polynucleotide is combined with at least one distinct species of related-sequence polynucleotide (which may itself be a selected sequence-recombined polynucleotide) in a recombination system suitable for generating sequence-recombined polynucleotides, such that additional generations of sequence-recombined polynucleotide sequences are generated from the selected sequence-recombined polynucleotides obtained by the selection or screening method employed. In this manner, recursive sequence recombination generates library members which are sequence-recombined polynucleotides possessing desired characteristics. Such characteristics can be any property or attribute capable of being selected for or detected in a screening system, and may include properties of: an encoded protein, a transcriptional element, a sequence controlling transcription, RNA processing, RNA stability, chromatin conformation, translation, or other expression property of a gene or transgene, a replicative element, a protein-binding element, or the like, such as any feature which confers a selectable or detectable property.
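The recursive cycle described above (recombine, screen, and recombine the selected members again) can be sketched as a toy in silico loop. It rests on several stated assumptions: the "desired property" is hypothetical similarity to an arbitrary target sequence, recombination is modelled as a two-point cross-over between aligned parents rather than true fragment-based shuffling, and a small mutation rate is added each round. All names, sizes and rates are hypothetical.

```python
import random

BASES = "ACGT"
TARGET = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # hypothetical sequence with the "desired property"


def fitness(seq: str) -> float:
    """Toy screen: fraction of positions that match the hypothetical target."""
    return sum(x == y for x, y in zip(seq, TARGET)) / len(TARGET)


def recombine(a: str, b: str, rng: random.Random) -> str:
    """Two-point cross-over between aligned parents (a stand-in for shuffling)."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Resample each base with probability `rate` (it may redraw the same base)."""
    return "".join(rng.choice(BASES) if rng.random() < rate else s for s in seq)


def evolve(parents, rng, library_size=200, keep=10, max_cycles=50, rate=0.01):
    """Recombine the pool, screen the library, keep the best, and repeat."""
    pool = list(parents)
    for cycle in range(1, max_cycles + 1):
        library = [
            mutate(recombine(*rng.sample(pool, 2), rng), rate, rng)
            for _ in range(library_size)
        ]
        # dict.fromkeys removes duplicates while keeping a deterministic order
        candidates = list(dict.fromkeys(pool + library))
        pool = sorted(candidates, key=fitness, reverse=True)[:keep]
        if fitness(pool[0]) == 1.0:
            return pool[0], cycle
    return pool[0], max_cycles


rng = random.Random(1)
parents = [mutate(TARGET, 0.5, rng) for _ in range(8)]  # hypothetical starting forms
best, cycles = evolve(parents, rng)
print(fitness(best), cycles)
```

Because the selected members are carried into the next round and recombined again, beneficial positions contributed by different starting forms can be combined in one sequence, which a single round of point mutagenesis applied to one parent cannot do.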
The present invention provides a method for generating libraries of displayed polypeptides or displayed antibodies suitable for affinity interaction screening or phenotypic screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed polypeptide or displayed antibody and an associated polynucleotide encoding said displayed polypeptide or displayed antibody, and obtaining said associated polynucleotides or copies thereof wherein said associated polynucleotides comprise a region of substantially identical sequence, optionally introducing mutations into said polynucleotides or copies, and (2) pooling and fragmenting, by nuclease digestion, partial extension PCR amplification, PCR stuttering, or other suitable fragmenting means, typically producing random fragments or fragment equivalents, said associated polynucleotides or copies to form fragments thereof under conditions suitable for PCR amplification, performing PCR amplification and optionally mutagenesis, and thereby homologously recombining said fragments to form a shuffled pool of recombined polynucleotides, whereby a substantial fraction (e.g., greater than 10 percent) of the recombined polynucleotides of said shuffled pool are not present in the first plurality of selected library members, said shuffled pool composing a library of displayed polypeptides or displayed antibodies suitable for affinity interaction screening. Optionally, the method comprises the additional step of screening the library members of the shuffled pool to identify individual shuffled library members having the ability to bind or otherwise interact (e.g., catalytic antibodies) with a predetermined macromolecule, such as for example a proteinaceous receptor, peptide, oligosaccharide, virion, or other predetermined compound or structure.
The displayed polypeptides, antibodies, peptidomimetic antibodies, and variable region sequences that are identified from such libraries can be used for therapeutic, diagnostic, research, and related purposes (e.g., catalysts, solutes for increasing osmolarity of an aqueous solution, and the like), and/or can be subjected to one or more additional cycles of shuffling and/or affinity selection. The method can be modified such that the step of selecting is for a phenotypic characteristic other than binding affinity for a predetermined molecule (e.g., for catalytic activity, stability, oxidation resistance, drug resistance, or detectable phenotype conferred on a host cell).
In one embodiment, the first plurality of selected library members is fragmented and homologously recombined by PCR in vitro. Fragment generation is by nuclease digestion, partial extension PCR amplification, PCR stuttering, or other suitable fragmenting means, such as described herein. Stuttering is fragmentation by incomplete polymerase extension of templates. A recombination format based on very short PCR extension times was employed to create partial PCR products, which continue to extend off a different template in the next (and subsequent) cycle(s).
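The stuttering format, in which short extension times leave partial products that continue extending off a different template in later cycles, can be sketched as follows. The sketch assumes aligned, equal-length templates, so a partial product can prime at the same position on any of them, and a fixed hypothetical extension of 4 bases per cycle; it ignores the sequence identity that real annealing requires. The maximally distinct templates are chosen only so that the origin of each block is visible.

```python
import random


def staggered_extension(templates: list[str], step: int, rng: random.Random):
    """Grow one product strand `step` bases per cycle; before each cycle the
    partial product re-anneals to a randomly chosen template.

    Assumption: the templates are aligned homologues of equal length, so a
    partial product can prime at the same position on any of them.
    """
    length = len(templates[0])
    product, sources = "", []
    while len(product) < length:
        pick = rng.randrange(len(templates))
        start = len(product)
        product += templates[pick][start:start + step]
        sources.append(pick)
    return product, sources


rng = random.Random(5)
templates = ["A" * 30, "C" * 30, "G" * 30]   # hypothetical, maximally distinct parents
product, sources = staggered_extension(templates, 4, rng)
print(product)
print(sources)
```

Shorter extension steps produce more template switches per product and therefore finer-grained recombination, at the cost of more cycles to reach full length.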
In one embodiment, the first plurality of selected library members is fragmented in vitro, the resultant fragments transferred into a host cell or organism and homologously recombined to form shuffled library members in vivo.
In one embodiment, the first plurality of selected library members is cloned or amplified on episomally replicable vectors, a multiplicity of said vectors is transferred into a cell and homologously recombined to form shuffled library members in vivo.
In one embodiment, the first plurality of selected library members is not fragmented, but is cloned or amplified on an episomally replicable vector as a direct repeat or indirect (or inverted) repeat, with each repeat comprising a distinct species of selected library member sequence; said vector is transferred into a cell and homologously recombined by intra-vector or inter-vector recombination to form shuffled library members in vivo.
In an embodiment, combinations of in vitro and in vivo shuffling are provided to enhance combinatorial diversity.
The present invention provides a method for generating libraries of displayed antibodies suitable for affinity interaction screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed antibody and an associated polynucleotide encoding said displayed antibody, and obtaining said associated polynucleotides or copies thereof, wherein said associated polynucleotides comprise a region of substantially identical variable region framework sequence, and (2) pooling and fragmenting said associated polynucleotides or copies to form fragments thereof under conditions suitable for PCR amplification and thereby homologously recombining said fragments to form a shuffled pool of recombined polynucleotides comprising novel combinations of CDRs, whereby a substantial fraction (e.g., greater than 10 percent) of the recombined polynucleotides of said shuffled pool comprise CDR combinations which are not present in the first plurality of selected library members, said shuffled pool composing a library of displayed antibodies comprising CDR permutations and suitable for affinity interaction screening. Optionally, the shuffled pool is subjected to affinity screening to select shuffled library members which bind to a predetermined epitope (antigen) and thereby selecting a plurality of selected shuffled library members. Optionally, the plurality of selected shuffled library members can be shuffled and screened iteratively, from 1 to about 1000 cycles or as desired until library members having a desired binding affinity are obtained.
Accordingly, one aspect of the present invention provides a method for introducing one or more mutations into a template double-stranded polynucleotide, wherein the template double-stranded polynucleotide has been cleaved or PCR amplified (via partial extension or stuttering) into random fragments of a desired size, by adding to the resultant population of double-stranded fragments one or more single or double-stranded oligonucleotides, wherein said oligonucleotides comprise an area of identity and an area of heterology to the template polynucleotide; denaturing the resultant mixture of double-stranded random fragments and oligonucleotides into single-stranded fragments; incubating the resultant population of single-stranded fragments with a polymerase under conditions which result in the annealing of said single-stranded fragments at regions of identity between the single-stranded fragments and formation of a mutagenized double-stranded polynucleotide; and repeating the above steps as desired.
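The preparation steps of this aspect, random cleavage to fragments of a desired size plus added oligonucleotides carrying an area of identity and an area of heterology, can be sketched in silico. The sketch assumes cleavage at each internucleotide bond with a fixed hypothetical probability followed by size selection, and a spiked oligo with 15-base identity arms flanking a single substituted base; the function names, sizes and copy number are hypothetical.

```python
import random


def random_fragments(template: str, cut_prob: float, min_size: int, max_size: int,
                     rng: random.Random) -> list[str]:
    """Cut between adjacent bases with probability `cut_prob` (a crude stand-in
    for random cleavage), then keep fragments within the desired size range."""
    pieces, start = [], 0
    for i in range(1, len(template)):
        if rng.random() < cut_prob:
            pieces.append(template[start:i])
            start = i
    pieces.append(template[start:])
    return [p for p in pieces if min_size <= len(p) <= max_size]


def spike_oligo(template: str, position: int, new_base: str, arm: int) -> str:
    """Oligo identical to the template for `arm` bases on each side of
    `position` (areas of identity) but carrying `new_base` there (area of heterology)."""
    if template[position] == new_base:
        raise ValueError("new_base must differ from the template base")
    left = template[max(0, position - arm):position]
    right = template[position + 1:position + 1 + arm]
    return left + new_base + right


rng = random.Random(3)
template = "".join(rng.choice("ACGT") for _ in range(200))
fragments = random_fragments(template, 0.05, 10, 50, rng)
new_base = "ACGT".replace(template[100], "")[0]    # any base other than the wild type
oligo = spike_oligo(template, 100, new_base, 15)
pool = fragments + [oligo] * 5   # hypothetical 5 copies of the spiked oligo
print(len(fragments), oligo)
```

In the method described above, such a pool would then be denatured and incubated with a polymerase, so that the oligo anneals through its identity arms and its heterologous base can be carried into the reassembled polynucleotide.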
In another aspect the present invention is directed to a method of producing recombinant proteins having biological activity by treating a sample comprising double-stranded template polynucleotides encoding a wild-type protein under conditions which provide for the cleavage of said template polynucleotides into random double-stranded fragments having a desired size; adding to the resultant population of random fragments one or more single or double-stranded oligonucleotides, wherein said oligonucleotides comprise areas of identity and areas of heterology to the template polynucleotide; denaturing the resultant mixture of double-stranded fragments and oligonucleotides into single-stranded fragments; incubating the resultant population of single-stranded fragments with a polymerase under conditions which result in the annealing of said single-stranded fragments at the areas of identity and formation of a mutagenized double-stranded polynucleotide; repeating the above steps as desired; and then expressing the recombinant protein from the mutagenized double-stranded polynucleotide.
A third aspect of the present invention is directed to a method for obtaining a chimeric polynucleotide by treating a sample comprising different double-stranded template polynucleotides wherein said different template polynucleotides contain areas of identity and areas of heterology under conditions which provide for the cleavage of said template polynucleotides into random double-stranded fragments of a desired size; denaturing the resultant random double-stranded fragments contained in the treated sample into single-stranded fragments; incubating the resultant single-stranded fragments with a polymerase under conditions which provide for the annealing of the single-stranded fragments at the areas of identity and the formation of a chimeric double-stranded polynucleotide sequence comprising template polynucleotide sequences; and repeating the above steps as desired.
A fourth aspect of the present invention is directed to a method of replicating a template polynucleotide by combining in vitro single-stranded template polynucleotides with small random single-stranded fragments resulting from the cleavage and denaturation of the template polynucleotide, and incubating said mixture of nucleic acid fragments in the presence of a nucleic acid polymerase under conditions wherein a population of double-stranded template polynucleotides is formed.
The invention also provides the use of polynucleotide shuffling, in vitro and/or in vivo, to shuffle polynucleotides encoding polypeptides and/or polynucleotides comprising transcriptional regulatory sequences.
The invention also provides the use of polynucleotide shuffling to shuffle a population of viral genes (e.g., capsid proteins, spike glycoproteins, polymerases, proteases, etc.) or viral genomes (e.g., paramyxoviridae, orthomyxoviridae, herpesviruses, retroviruses, reoviruses, rhinoviruses, etc.). In an embodiment, the invention provides a method for shuffling sequences encoding all or portions of immunogenic viral proteins to generate novel combinations of epitopes as well as novel epitopes created by recombination; such shuffled viral proteins may comprise epitopes or combinations of epitopes which are likely to arise in the natural environment as a consequence of viral evolution (e.g., such as recombination of influenza virus strains).
The invention also provides the use of polynucleotide shuffling to shuffle a population of protein variants, such as taxonomically-related, structurally-related, and/or functionally-related enzymes and/or mutated variants thereof to create and identify advantageous novel polypeptides, such as enzymes having altered properties of catalysis, temperature profile, stability, oxidation resistance, or other desired feature which can be selected for. Methods suitable for molecular evolution and directed molecular evolution are provided. Methods to focus selection pressure(s) upon specific portions of polynucleotides (such as a segment of a coding region) are provided.
The invention also provides a method suitable for shuffling polynucleotide sequences for generating gene therapy vectors and replication-defective gene therapy constructs, such as may be used for human gene therapy, including but not limited to vaccination vectors for DNA-based vaccination, as well as anti-neoplastic gene therapy and other gene therapy formats.
The invention provides a method for generating an enhanced green fluorescent protein (GFP) and polynucleotides encoding same, comprising performing DNA shuffling on a GFP-encoding expression vector and selecting or screening for variants having an enhanced desired property, such as enhanced fluorescence. In a variation, an embodiment comprises a step of error-prone or mutagenic amplification, propagation in a mutator strain (e.g., a host cell having a hypermutational phenotype, such as mutL, etc.; yeast strains such as those described in Klein (1995) Progr. Nucl. Acid Res. Mol. Biol. 51: 217, incorporated herein by reference), chemical mutagenesis, or site-directed mutagenesis. In an embodiment, the enhanced GFP protein comprises a point mutation outside the chromophore region (amino acids 64-69), preferably in the region from amino acid 100 to amino acid 173, with specific preferred embodiments at residues 100, 154, and 164; typically, the mutation is a substitution mutation, such as F100S, M154T or V164A. In an embodiment, the mutation substitutes a hydrophilic residue for a hydrophobic residue. In an embodiment, multiple mutations are present in the enhanced GFP protein and its encoding polynucleotide. The invention also provides the use of such an enhanced GFP protein, such as a diagnostic reporter for assays, high-throughput screening assays, and the like.
The invention also provides for improved embodiments for performing in vitro sequence shuffling. In one aspect, the improved shuffling method includes the addition of at least one additive which enhances the rate or extent of reannealing or recombination of related-sequence polynucleotides. In an embodiment, the additive is polyethylene glycol (PEG), typically added to a shuffling reaction to a final concentration of 0.1 to 25 percent, often to a final concentration of 2.5 to 15 percent, and frequently to a final concentration of about 10 percent. In an embodiment, the additive is dextran sulfate, typically added to a shuffling reaction to a final concentration of 0.1 to 25 percent, often at about 10 percent. In an embodiment, the additive is an agent which reduces sequence specificity of reannealing and promotes promiscuous hybridization and/or recombination in vitro. In an alternative embodiment, the additive is an agent which increases sequence specificity of reannealing and promotes high-fidelity hybridization and/or recombination in vitro. Other long-chain polymers which do not interfere with the reaction may also be used (e.g., polyvinylpyrrolidone, etc.).
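The final-concentration ranges above translate into pipetting volumes by ordinary dilution arithmetic (C1 * V1 = C2 * V2). The sketch below is illustrative only: the 50 percent (w/v) PEG stock and the 100 uL total reaction volume (including the additive) are assumptions, not values given in the text.

```python
def additive_stock_volume(final_percent: float, stock_percent: float, reaction_volume_ul: float) -> float:
    """Volume of additive stock (in uL) that brings a reaction of total volume
    reaction_volume_ul to final_percent, from C1 * V1 = C2 * V2."""
    if not 0 < final_percent < stock_percent:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return final_percent * reaction_volume_ul / stock_percent


# Hypothetical example: 50% (w/v) PEG stock, 100 uL reaction, about 10% final.
print(additive_stock_volume(10, 50, 100))  # 20.0
```

The same function covers the dextran sulfate embodiment; only the stock concentration changes.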
In one aspect, the improved shuffling method includes the addition of at least one additive which is a cationic detergent. Examples of suitable cationic detergents include but are not limited to: cetyltrimethylammonium bromide (CTAB), dodecyltrimethylammonium bromide (DTAB), and tetramethylammonium chloride (TMAC), and the like.
In one aspect, the improved shuffling method includes the addition of at least one additive which is a recombinogenic protein that catalyzes or non-catalytically enhances homologous pairing and/or strand exchange in vitro. Examples of suitable recombinogenic proteins include but are not limited to: E. coli recA protein, the T4 uvsX protein, the rec1 protein from Ustilago maydis, other recA family recombinases from other species, single strand binding protein (SSB), ribonucleoprotein A1, and the like. Shuffling can be used to improve one or more properties of a recombinogenic protein; for example, mutant sequences encoding recA can be shuffled and improved heat-stable variants selected by recursive sequence recombination.
Non-specific (general recombination) recombinases such as Topoisomerase I, Topoisomerase II (Tse et al. (1980) J. Biol. Chem. 255: 5560; Trask et al. (1984) EMBO J. 3: 671, incorporated herein by reference) and the like can be used to catalyze in vitro recombination reactions to shuffle a plurality of related-sequence polynucleotide species by the recursive methods of the invention.
In one aspect, the improved shuffling method includes the addition of at least one additive which is an enzyme having an exonuclease activity which is active at removing non-templated nucleotides introduced at 3' ends of product polynucleotides in shuffling amplification reactions catalyzed by a non-proofreading polymerase. An example of a suitable enzyme having an exonuclease activity includes but is not limited to Pfu polymerase. Other suitable polymerases include, but are not limited to: Thermus flavus DNA polymerase (Tfl); Thermus thermophilus DNA polymerase (Tth); Thermococcus litoralis DNA polymerase (Tli, Vent); Pyrococcus woesei DNA polymerase (Pwo); Thermotoga maritima DNA polymerase (UltMa); Thermus brockianus DNA polymerase (Thermozyme); Pyrococcus furiosus DNA polymerase (Pfu); Thermococcus sp. DNA polymerase (9°Nm); Pyrococcus sp. DNA polymerase (Deep Vent); bacteriophage T4 DNA polymerase; bacteriophage T7 DNA polymerase; E. coli DNA polymerase I (native and Klenow); and E. coli DNA polymerase III.
In an aspect, the improved shuffling method comprises the modification wherein at least one cycle of amplification (i.e., extension with a polymerase) of reannealed fragmented library member polynucleotides is conducted under conditions which produce a substantial fraction, typically at least 20 percent or more, of incompletely extended amplification products. The amplification products, including the incompletely extended amplification products, are denatured and subjected to at least one additional cycle of reannealing and amplification. This variation, wherein at least one cycle of reannealing and amplification provides a substantial fraction of incompletely extended products, is termed "stuttering"; in the subsequent amplification round, the incompletely extended products reanneal to and prime extension on different sequence-related template species.
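The stuttering variation can be pictured with a toy string model. The sketch below is an illustration, not a description of the chemistry: it assumes the parental sequences are pre-aligned and of equal length, that each cycle extends every incomplete product by a random number of bases (capped by a hypothetical `max_extension`), and that each reannealing step picks its template at random. Segments copied in different cycles can therefore come from different parents.

```python
import random


def stutter_shuffle(parents, n_products=5, max_extension=40, seed=0):
    """Toy model of shuffling by incomplete ('stuttered') extension.

    Each cycle, every incompletely extended product reanneals to a randomly
    chosen parent and is extended by at most max_extension bases, so successive
    segments of one product can be copied from different templates.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model assumes pre-aligned parents of equal length")
    rng = random.Random(seed)
    products = [""] * n_products
    while any(len(p) < length for p in products):
        for i, product in enumerate(products):
            if len(product) == length:
                continue  # already full length
            template = rng.choice(parents)  # possible template switch
            k = rng.randint(1, max_extension)  # incomplete extension this cycle
            products[i] = product + template[len(product):len(product) + k]
    return products


# Two hypothetical 100-nt parents written as A's and B's so crossovers are visible.
for chimera in stutter_shuffle(["A" * 100, "B" * 100]):
    print(chimera)
```

Lowering `max_extension` forces more cycles per product and hence, on average, more crossovers, which mirrors the intent of keeping extension incomplete.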
In an aspect, the improved shuffling method comprises the modification wherein at least one cycle of amplification is conducted using a collection of overlapping single-stranded DNA fragments of varying lengths corresponding to a first polynucleotide species or set of related-sequence polynucleotide species, wherein each overlapping fragment can hybridize to and prime polynucleotide chain extension from a second polynucleotide species serving as a template, thus forming sequence-recombined polynucleotides, wherein said sequence-recombined polynucleotides comprise a portion of at least one first polynucleotide species with an adjacent portion of the second polynucleotide species which serves as a template. In a variation, the second polynucleotide species serving as a template contains uracil (i.e., a Kunkel-type template) and is substantially non-replicable in cells. This aspect of the invention can also comprise at least two recursive cycles of this variation.
In an embodiment, PCR can be conducted wherein the nucleotide mix comprises a nucleotide species having uracil as the base. The PCR product(s) can then be fragmented by digestion with uracil-DNA glycosylase (UDG), which produces strand breaks. The fragment size can be controlled by the fraction of uracil-containing nucleotide (dUTP) in the PCR mix.
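How the uracil fraction sets fragment size can be estimated with a simple model. The sketch assumes that each T position of the product independently carries a uracil with probability equal to the dUTP fraction, and that every uracil eventually becomes a strand break (in practice UDG leaves abasic sites that must still be cleaved, for example by heat or alkali). Under these assumptions the mean fragment length is roughly 1 / (T content * dUTP fraction). The function name and the random 10 kb template are hypothetical.

```python
import random


def uracil_fragment_lengths(seq, dutp_fraction, seed=0):
    """Fragment lengths after UDG treatment of a uracil-containing PCR product,
    assuming each T position independently carries U with probability
    dutp_fraction and that every U position becomes a strand break."""
    rng = random.Random(seed)
    lengths, current = [], 0
    for base in seq:
        current += 1
        if base == "T" and rng.random() < dutp_fraction:
            lengths.append(current)  # fragment ends at the cleaved U position
            current = 0
    if current:
        lengths.append(current)  # 3' remainder after the last break
    return lengths


rng = random.Random(1)
template = "".join(rng.choice("ACGT") for _ in range(10_000))  # hypothetical, ~25% T
for fraction in (0.01, 0.05, 0.2):
    frags = uracil_fragment_lengths(template, fraction)
    # Model estimate for comparison: about 1 / (0.25 * fraction) nucleotides.
    print(fraction, round(sum(frags) / len(frags)))
```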
In an aspect, the improved shuffling method comprises the modification wherein at least one cycle of amplification is conducted with an additive or polymerase in suitable conditions which promote template switching. In an embodiment where Taq polymerase is employed for amplification, the addition of recA, or of other polymerases (e.g., viral polymerases, reverse transcriptase), enhances template switching. Template switching can also be increased by increasing the DNA template concentration, among other means known to those skilled in the art.
In an embodiment of the general method, libraries of sequence-recombined polynucleotides are generated from sequence-related polynucleotides which are naturally-occurring genes or alleles of a gene. In this aspect, at least two naturally-occurring genes and/or alleles which comprise regions of at least 50 consecutive nucleotides which have at least 70 percent sequence identity, preferably at least 90 percent sequence identity, are selected from a pool of gene sequences, such as by hybrid selection or via computerized sequence analysis using sequence data from a database. In an aspect, at least three naturally-occurring genes and/or alleles which comprise regions of at least 50 consecutive nucleotides which have at least 70 percent sequence identity, preferably at least 90 percent sequence identity, are selected from a pool of gene sequences, such as by hybrid selection or via computerized sequence analysis using sequence data from a database. The selected sequences are obtained as polynucleotides, either by cloning or via DNA synthesis, and shuffled by any of the various embodiments of the invention.
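A computerized pre-screen of the kind mentioned above might test candidate pairs for a qualifying region. The sketch below assumes the two sequences are already aligned to equal length (gaps are not handled) and slides a 50-position window along them, reporting whether any window reaches the identity threshold. The function name and defaults are illustrative only.

```python
def has_identity_window(a: str, b: str, window: int = 50, min_percent: int = 70) -> bool:
    """True if two pre-aligned, equal-length sequences contain at least one run
    of `window` consecutive positions with >= min_percent identity."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(a) < window:
        return False
    matches = sum(x == y for x, y in zip(a[:window], b[:window]))
    if matches * 100 >= min_percent * window:  # integer test avoids float rounding
        return True
    for i in range(window, len(a)):
        # Slide by one position: add the entering column, drop the leaving one.
        matches += (a[i] == b[i]) - (a[i - window] == b[i - window])
        if matches * 100 >= min_percent * window:
            return True
    return False


print(has_identity_window("A" * 50, "A" * 35 + "C" * 15))  # True (35/50 = 70%)
print(has_identity_window("A" * 50, "A" * 34 + "C" * 16))  # False (68%)
```

Passing `min_percent=90` applies the preferred threshold instead.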
In an embodiment of the invention, multi-pool shuffling is performed. Shuffling of multiple pools of polynucleotide sequences allows each separate pool to generate a different combinatorial solution to produce the desired property. In this variation, the pool of parental polynucleotide sequences (or any subsequent shuffled library or selected pool of library members) is subdivided (or segregated) into two or more discrete pools of sequences, each of which is separately subjected to one or more rounds of recursive sequence recombination and selection (or screening). Optionally, selected library members from each separate pool may be recombined (integrated) in later rounds of shuffling. Alternatively, multiple separate parental pools may be used. Inbreeding, wherein selected (or screened) library members within a pool are crossed with each other by the recursive sequence recombination methods of the invention, can be performed alone or in combination with outbreeding, wherein library members of different pools are crossed with each other by the recursive sequence recombination methods of the invention.
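Multi-pool shuffling resembles an island model in evolutionary computation. The toy sketch below stands in for the laboratory steps with hypothetical components: library members are tuples of bits, a single random crossover stands in for shuffling, and fitness is simply the number of 1 bits. Sub-pools are first evolved separately (inbreeding); their selected members are then merged and recombined (outbreeding).

```python
import random


def recombine(p1, p2, rng):
    """Single-crossover recombination of two equal-length tuples (toy stand-in for shuffling)."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def evolve(pool, fitness, rounds, keep, rng):
    """Recursive recombination plus selection within one pool."""
    for _ in range(rounds):
        offspring = [recombine(rng.choice(pool), rng.choice(pool), rng)
                     for _ in range(4 * len(pool))]
        pool = sorted(pool + offspring, key=fitness, reverse=True)[:keep]
    return pool


def multi_pool_shuffle(parents, fitness, n_pools=3, rounds=5, keep=8, seed=0):
    if len(parents) < n_pools:
        raise ValueError("need at least one parent per pool")
    rng = random.Random(seed)
    pools = [parents[i::n_pools] for i in range(n_pools)]  # segregate into discrete pools
    pools = [evolve(p, fitness, rounds, keep, rng) for p in pools]  # inbreeding
    merged = [member for pool in pools for member in pool]
    return evolve(merged, fitness, rounds, keep, rng)  # outbreeding across pools


rng = random.Random(42)
parents = [tuple(rng.randint(0, 1) for _ in range(32)) for _ in range(12)]
best = multi_pool_shuffle(parents, fitness=sum)[0]
print(sum(best), max(sum(p) for p in parents))
```

Because parents are carried into each selection step, the best fitness in this toy never decreases from round to round; real screening offers no such guarantee.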
In an embodiment of the invention, the method comprises the further step of removing non-shuffled products (e.g., parental sequences) from sequence-recombined polynucleotides produced by any of the disclosed shuffling methods. Non-shuffled products can be removed or avoided by performing amplification with: (1) a first PCR primer which hybridizes to a first parental polynucleotide species but does not substantially hybridize to a second parental polynucleotide species, and (2) a second PCR primer which hybridizes to a second parental polynucleotide species but does not substantially hybridize to the first parental polynucleotide species, such that amplification occurs from templates comprising the portion of the first parental sequence which hybridizes to the first PCR primer and also comprising the portion of the second parental sequence which hybridizes to the second PCR primer, thus only sequence-recombined polynucleotides are amplified.
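The primer-based removal of parental sequences can be expressed as a simple filter. The sketch below treats hybridization as an exact match to a hypothetical 20-nt primer site at each end, and assumes the forward primer site is unique to the 5' end of the first parent and the reverse primer site is unique to the 3' end of the second parent. Only chimeras carrying both sites pass.

```python
def is_amplified(product: str, parent_a: str, parent_b: str, site_len: int = 20) -> bool:
    """True if `product` carries parent A's 5'-terminal primer site and parent B's
    3'-terminal primer site, so a forward primer specific to A and a reverse primer
    specific to B would amplify it. Exact matching stands in for hybridization."""
    return (product[:site_len] == parent_a[:site_len]
            and product[-site_len:] == parent_b[-site_len:])


parent_a = "GATTACA" * 10  # hypothetical 70-nt parents
parent_b = "CCGGTTA" * 10
pool = [parent_a, parent_b, parent_a[:35] + parent_b[35:], parent_b[:35] + parent_a[35:]]
print([is_amplified(p, parent_a, parent_b) for p in pool])  # [False, False, True, False]
```

Note that the reciprocal chimera (B then A) is also excluded; recovering it would need the opposite primer pair.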
The invention also provides for alternative embodiments for performing in vivo sequence shuffling. In one aspect, the alternative shuffling method includes the use of inter-plasmidic recombination, wherein libraries of sequence-recombined polynucleotide sequences are obtained by genetic recombination in vivo of compatible or non-compatible multicopy plasmids inside suitable host cells. When non-compatible plasmids are used, typically each plasmid type has a distinct selectable marker and selection for retention of each desired plasmid type is applied. The related-sequence polynucleotide sequences to be recombined are separately incorporated into separately replicable multicopy vectors, typically bacterial plasmids each having a distinct and separately selectable marker gene (e.g., a drug resistance gene). Suitable host cells are transformed with both species of plasmid and cells expressing both selectable marker genes are selected and sequence-recombined sequences are recovered and can be subjected to additional rounds of shuffling by any of the means described herein.
In one aspect, the alternative shuffling method includes the use of intra-plasmidic recombination, wherein libraries of sequence-recombined polynucleotide sequences are obtained by genetic recombination in vivo of direct or inverted sequence repeats located on the same plasmid. In a variation, the sequences to be recombined are flanked by site-specific recombination sequences and the polynucleotides are present in a site-specific recombination system, such as an integron (Hall and Collins (1995) Mol. Microbiol. 15: 593, incorporated herein by reference) and can include insertion sequences, transposons (e.g., IS1), and the like. Introns have a low rate of natural mobility and can be used as mobile genetic elements both in prokaryotes and eukaryotes. Shuffling can be used to improve the performance of mobile genetic elements. These high frequency recombination vehicles can be used for the rapid optimization of large sequences via transfer of large sequence blocks. Recombination between repeated, interspersed, and diverged DNA sequences, also called "homeologous" sequences, is typically suppressed in normal cells. However, in MutL and MutS cells, this suppression is relieved and the rate of intrachromosomal recombination is increased (Petit et al. (1996) Genetics 129: 327, incorporated herein by reference).
In an aspect of the invention, mutator strains of host cells are used to enhance recombination of more highly mismatched sequence-related polynucleotides. Bacterial strains such as MutL, MutS, MutT, or MutH or other cells expressing the Mut proteins (XL1-Red; Stratagene, San Diego, Calif.) can be used as host cells for shuffling of sequence-related polynucleotides by in vivo recombination. Other mutation-prone host cell types can also be used, such as those having a proofreading-defective polymerase (Foster et al. (1995) Proc. Natl. Acad. Sci. (U.S.A.) 92: 7951, incorporated herein by reference). Mutator strains of yeast can be used, as can hypermutational mammalian cells, including ataxia telangiectasia cells, such as described in Luo et al. (1996) J. Biol. Chem. 271: 4497, incorporated herein by reference.
Other in vivo and in vitro mutagenic formats can be employed, including administering chemical or radiological mutagens to host cells. Examples of such mutagens include but are not limited to: MNU, ENU, MNNG, nitrosourea, BuDR, and the like. Ultraviolet light can also be used to generate mutations and/or to enhance the rate of recombination, such as by irradiation of cells used to enhance in vivo recombination. Ionizing radiation and clastogenic agents can also be used to enhance mutational frequency and/or to enhance recombination and/or to effect polynucleotide fragmentation.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram comparing mutagenic shuffling with error-prone PCR; (a) the initial library; (b) pool of selected sequences in first round of affinity selection; (d) in vitro recombination of the selected sequences ('shuffling'); (f) pool of selected sequences in second round of affinity selection after shuffling; (c) error-prone PCR; (e) pool of selected sequences in second round of affinity selection after error-prone PCR.
FIG. 2 illustrates the reassembly of a 1.0 kb LacZ alpha gene fragment from 10-50 bp random fragments. (a) Photograph of a gel of PCR amplified DNA fragment having the LacZ alpha gene. (b) Photograph of a gel of DNA fragments after digestion with DNAseI. (c) Photograph of a gel of DNA fragments of 10-50 bp purified from the digested LacZ alpha gene DNA fragment; (d) Photograph of a gel of the 10-50 bp DNA fragments after the indicated number of cycles of DNA reassembly; (e) Photograph of a gel of the recombination mixture after amplification by PCR with primers.
FIGS. 3a-3b (SEQ ID NOS:58-61) show a schematic illustration of the LacZ alpha gene stop codon mutants and their DNA sequences. The boxed regions are heterologous areas, serving as markers. The stop codons are located in smaller boxes or underlined. "+" indicates a wild-type gene and "-" indicates a mutated area in the gene.
FIG. 4 is a schematic illustration of the introduction or spiking of a synthetic oligonucleotide into the reassembly process of the LacZ alpha gene.
FIGS. 5a and 5b illustrate the regions of homology between a murine IL1-B gene (M; SEQ ID NO:62) and a human IL1-B gene (H; SEQ ID NO:63) with E. coli codon usage. Regions of heterology are boxed. The "_|¯" symbols indicate crossovers obtained upon the shuffling of the two genes.
FIG. 6 is a schematic diagram of the antibody CDR shuffling model system using the scFv of anti-rabbit IgG antibody (A10B).
FIG. 7 (panels A-D) illustrates the observed frequency of occurrence of certain combinations of CDRs in the shuffled DNA of the scFv. Panel (A) shows the length and mutagenesis rate of all six synthetic CDRs. Panel (B) shows library construction by shuffling scFv with all six CDRs. Panel (C) shows CDR insertion determined by PCR with primers for native CDRs. Panel (D) shows insertion rates and distributions of synthetic CDR insertions.
FIG. 8 illustrates the improved avidity of the scFv anti-rabbit antibody after DNA shuffling and each cycle of selection.
FIG. 9 schematically portrays pBR322-Sfi-BL-LA-Sfi and in vivo intraplasmidic recombination via direct repeats, as well as the rate of generation of ampicillin-resistant colonies by intraplasmidic recombination reconstituting a functional beta-lactamase gene.
FIG. 10 schematically portrays pBR322-Sfi-2Bla-Sfi and in vivo intraplasmidic recombination via direct repeats, as well as the rate of generation of ampicillin-resistant colonies by intraplasmidic recombination reconstituting a functional beta-lactamase gene.
FIG. 11 illustrates the method for testing the efficiency of multiple rounds of homologous recombination after the introduction of polynucleotide fragments into cells for the generation of recombinant proteins.
FIG. 12 schematically portrays generation of a library of vectors by shuffling cassettes at the following loci: promoter, leader peptide, terminator, selectable drug resistance gene, and origin of replication. The multiple parallel lines at each locus represent the multiplicity of cassettes for that locus.
FIG. 13 schematically shows some examples of cassettes suitable at various loci for constructing prokaryotic vector libraries by shuffling.
FIG. 14 shows the prokaryotic GFP expression vector pBAD-GFP (5,371 bp), which was derived from pBAD18 (Guzman et al. (1995) J. Bacteriol. 177: 4121), and the eukaryotic GFP expression vector Alpha+GFP (7,591 bp), which was derived from the vector Alpha+ (Whitehorn et al. (1995) Bio/Technology 13: 1215).
FIGS. 15A and 15B show a comparison of the fluorescence of different GFP constructs in whole E. coli cells. Compared are the 'Clontech' construct, which contains a 24 amino acid N-terminal extension, the Affymax wildtype construct ('wt', with improved codon usage), and the mutants obtained after 2 and after 3 cycles of sexual PCR and selection ('cycle 2', 'cycle 3'). The 'Clontech' construct was induced with IPTG, whereas the other constructs were induced with arabinose. All samples were assayed at equal OD600. FIG. 15A shows fluorescence spectra indicating that the whole-cell fluorescence signal from the 'wt' construct is 2.8-fold greater than from the 'Clontech' construct. The signal of the 'cycle 3' mutant is 16-fold increased over the Affymax 'wt', and 45-fold over the 'Clontech' wt construct. FIG. 15B is a comparison of excitation spectra of GFP constructs in E. coli. The peak excitation wavelengths are unaltered by the mutations that were selected.
FIG. 16 shows SDS-PAGE analysis of relative GFP protein expression levels. Panel (a): 12% Tris-Glycine SDS-PAGE analysis (Novex, Encinitas, Calif.) of equal amounts (OD600) of whole E. coli cells expressing the wildtype, the cycle 2 mutant or the cycle 3 mutant of GFP, stained with Coomassie Blue. GFP (27 kD) represents about 75% of total protein, and the selection did not increase the expression level. Panel (b): 12% Tris-Glycine SDS-PAGE analysis (Novex, Encinitas, Calif.) of equal amounts (OD600) of E. coli fractions. Lane 1: pellet of lysed cells expressing wt GFP; lane 2: supernatant of lysed cells expressing wt GFP. Most of the wt GFP is in inclusion bodies. Lane 3: pellet of lysed cells expressing cycle 3 mutant GFP; lane 4: supernatant of lysed cells expressing cycle 3 mutant GFP. Most of the cycle 3 mutant GFP is soluble. The GFP that ends up in inclusion bodies does not contain the chromophore, since there is no detectable fluorescence in this fraction.
FIG. 17 shows mutation analysis of the cycle 2 and cycle 3 mutants versus wildtype GFP. Panel (A) shows that the mutations are spread out rather than clustered near the tripeptide chromophore. Mutations F100S, M154T, and V164A involve the replacement of hydrophobic residues with more hydrophilic residues. The increased hydrophilicity may help guide the protein into a native folding pathway rather than toward aggregation and inclusion body formation. Panel (B) shows a restriction map indicating the chromophore region and positions of introduced mutations.
FIGS. 18A and 18B show a comparison of CHO cells expressing different GFP proteins. FIG. 18A is a FACS analysis of clones of CHO cells expressing different GFP mutants. FIG. 18B shows fluorescence spectroscopy of clones of CHO cells expressing different GFP mutants.
FIG. 19 shows enhancement of resistance to arsenate toxicity as a result of shuffling the pGJ103 plasmid containing the arsenate detoxification pathway operon.
FIG. 20 schematically shows the generation of combinatorial libraries using synthetic or naturally-occurring intron sequences as the basis for recombining a plurality of exon species which can lack sequence identity (as exemplified by random sequence exons), wherein homologous and/or site-specific recombination occurs between intron sequences of distinct library members.
FIG. 21 schematically shows variations of the method for shuffling exons. The numbers refer to reading frames, as demonstrated in panel (A). Panel (B) shows the various classes of intron and exon relative to their individual splice frames. Panel (C) provides an example of a naturally-occurring gene (immunoglobulin V genes) suitable for shuffling. Panels D through F show how multiple exons can be concatemerized via PCR using primers which span intron segments, so that proper splicing frames are retained, if desired. Panel (G) exemplifies the exon shuffling process (IG: immunoglobulin exon; IFN: interferon exon).
FIG. 22 schematically shows an exon splicing frame diagram for several human genes, showing that preferred units for shuffling exons begin and end in the same splicing frame, such that a splicing module (or shuffling exon) can comprise multiple naturally-occurring exons but typically has the same splicing frame at each end.
FIG. 23 schematically shows how partial PCR extension (stuttering) can be used to provide recursive sequence recombination (shuffling) resulting in a library of chimeras representing multiple crossovers.
FIG. 24 shows how stuttering can be used to shuffle a wild-type sequence with a multiply mutated sequence to generate an optimal set of mutations via shuffling.
FIG. 25 schematically shows plasmid-plasmid recombination by electroporation of a cell population representing multiple plasmid species, present initially as a single plasmid species per cell prior to electroporation and multiple plasmid species per cell suitable for in vivo recombination subsequent to electroporation of the cell population.
FIG. 26 shows plasmid-plasmid recombination.
FIG. 27 shows plasmid-virus recombination.
FIG. 28 shows virus-virus recombination.
FIG. 29 shows plasmid-chromosome recombination.
FIG. 30 shows conjugation-mediated recombination.
FIG. 31 shows Ab-phage recovery rate versus selection cycle. Shuffling was applied after selection rounds two to eight. Total increase is 440-fold.
FIG. 32 shows binding specificity after ten selection rounds, including two rounds of backcrossing. ELISA signal of different Ab-phage clones for eight human protein targets.
FIG. 33 shows Ab-phage recovery versus mutagenesis method.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention relates to a method for nucleic acid molecule reassembly after random fragmentation and its application to mutagenesis of DNA sequences. Also described is a method for the production of nucleic acid fragments encoding mutant proteins having enhanced biological activity. In particular, the present invention also relates to a method of repeated cycles of mutagenesis, nucleic acid shuffling and selection which allow for the creation of mutant proteins having enhanced biological activity.
The present invention is directed to a method for generating a very large library of DNA, RNA or protein mutants; in embodiments where a metabolic enzyme or multicomponent pathway is subjected to shuffling, a library can comprise the resultant metabolites in addition to a library of the shuffled enzyme(s). This method has particular advantages in the generation of related DNA fragments from which the desired nucleic acid fragment(s) may be selected. In particular, the present invention also relates to a method of repeated cycles of mutagenesis, homologous recombination and selection which allow for the creation of mutant proteins having enhanced biological activity.
However, prior to discussing this invention in further detail, the following terms will first be defined.
Definitions
As used herein, the following terms have the following meanings:
The term "DNA reassembly" is used when recombination occurs between identical sequences.
By contrast, the term "DNA shuffling" is used herein to indicate recombination between substantially homologous but nonidentical sequences. In some embodiments, DNA shuffling may involve crossover via nonhomologous recombination, such as via cre/lox and/or flp/frt systems and the like, such that recombination need not require substantially homologous polynucleotide sequences. Homologous and non-homologous recombination formats can be used, and, in some embodiments, can generate molecular chimeras and/or molecular hybrids of substantially dissimilar sequences.
The term "amplification" means that the number of copies of a nucleic acid fragment is increased.
The term "identical" or "identity" means that two nucleic acid sequences have the same sequence or a complementary sequence. Thus, "areas of identity" means that regions or areas of a nucleic acid fragment or polynucleotide are identical or complementary to another polynucleotide or nucleic acid fragment.
The term "corresponds to" is used herein to mean that a polynucleotide sequence is homologous (i.e., is identical, not strictly evolutionarily related) to all or a portion of a reference polynucleotide sequence, or that a polypeptide sequence is identical to a reference polypeptide sequence. In contradistinction, the term "complementary to" is used herein to mean that the complementary sequence is homologous to all or a portion of a reference polynucleotide sequence. For illustration, the nucleotide sequence "TATAC" corresponds to a reference sequence "TATAC" and is complementary to a reference sequence "GTATA".
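The "corresponds to" and "complementary to" relations can be checked mechanically. The sketch assumes DNA letters only and reads "complementary" as reverse-complementary in the conventional 5'-to-3' orientation, which reproduces the TATAC / GTATA example above; matching "all or a portion" of the reference is modeled as a substring test. The function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def corresponds_to(seq: str, reference: str) -> bool:
    """seq is identical to all or a portion of the reference."""
    return seq.upper() in reference.upper()


def complementary_to(seq: str, reference: str) -> bool:
    """The complement of seq is identical to all or a portion of the reference."""
    return reverse_complement(seq) in reference.upper()


print(corresponds_to("TATAC", "TATAC"))    # True
print(complementary_to("TATAC", "GTATA"))  # True
print(reverse_complement("TATAC"))         # GTATA
```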
The following terms are used to describe the sequence relationships between two or more polynucleotides: "reference sequence", "comparison window", "sequence identity", "percentage of sequence identity", and "substantial identity". A "reference sequence" is a defined sequence used as a basis for a sequence comparison; a reference sequence may be a subset of a larger sequence, for example, as a segment of a full-length cDNA or gene sequence given in a sequence listing, such as a polynucleotide sequence of FIG. 1 or FIG. 2(b), or may comprise a complete cDNA or gene sequence. Generally, a reference sequence is at least 20 nucleotides in length, frequently at least 25 nucleotides in length, and often at least 50 nucleotides in length. Since two polynucleotides may each (1) comprise a sequence (i.e., a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) may further comprise a sequence that is divergent between the two polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a "comparison window" to identify and compare local regions of sequence similarity.
A "comparison window", as used herein, refers to a conceptual segment of at least 20 contiguous nucleotide positions wherein a polynucleotide sequence may be compared to a reference sequence of at least 20 contiguous nucleotides and wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2: 482, by the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443, by the search for similarity method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by inspection, and the best alignment (i.e., resulting in the highest percentage of homology over the comparison window) generated by the various methods is selected.
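For orientation, a minimal global alignment in the style of Needleman and Wunsch (1970) is sketched below. It uses a linear gap penalty and illustrative match, mismatch and gap scores; these are assumptions, not the parameters used by GAP or any other program named above, and the function returns one optimal alignment when several tie.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (aligned_a, aligned_b) with '-' marking gaps."""
    n, m = len(a), len(b)

    def s(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + s(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path from the bottom-right cell.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("GATTACA", "GATACA"))
```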
The term "sequence identity" means that two polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison. The term "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. The term "substantial identity" as used herein denotes a characteristic of a polynucleotide sequence, wherein the polynucleotide comprises a sequence that has at least 80 percent sequence identity, preferably at least 85 percent identity and often 90 to 95 percent sequence identity, more usually at least 99 percent sequence identity as compared to a reference sequence over a comparison window of at least 20 nucleotide positions, frequently over a window of at least 25-50 nucleotides, wherein the percentage of sequence identity is calculated by comparing the reference sequence to the polynucleotide sequence, which may include deletions or additions which total 20 percent or less of the reference sequence over the window of comparison.
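The percentage-of-identity arithmetic can be expressed as a short Python sketch. It assumes the two sequences have already been optimally aligned into equal-length rows with '-' marking gaps, so the row length is the window size. The function names are hypothetical, and the substantial-identity check encodes one reading of the definition: at least 80 percent identity, a window of at least 20 positions, and gap columns totaling no more than 20 percent of the reference length.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percentage of sequence identity over an alignment window.

    Both strings are rows of one optimal alignment ('-' = gap), so their
    common length is the window size. Gap columns count toward the window
    but never count as matched positions.
    """
    if len(aligned_ref) != len(aligned_query) or not aligned_ref:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(
        1
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if r == q and r != "-"
    )
    return 100.0 * matched / len(aligned_ref)


def is_substantially_identical(aligned_ref: str, aligned_query: str,
                               min_identity: float = 80.0, min_window: int = 20) -> bool:
    """Assumed reading of 'substantial identity' (see the lead-in)."""
    window = len(aligned_ref)
    ref_length = len(aligned_ref.replace("-", ""))
    # Deletions or additions: any alignment column where either row has a gap.
    gap_columns = sum(1 for r, q in zip(aligned_ref, aligned_query) if "-" in (r, q))
    return (
        window >= min_window
        and gap_columns <= 0.20 * ref_length
        and percent_identity(aligned_ref, aligned_query) >= min_identity
    )
```

With one mismatch in a 10-position window, `percent_identity("ACGTACGTAC", "ACGTTCGTAC")` gives 9/10 × 100 = 90.0.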
Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
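The substitution groups above map naturally onto a lookup table. This Python sketch uses standard one-letter amino acid codes; the names `SIDE_CHAIN_GROUPS`, `PREFERRED_GROUPS`, and `is_conservative` are hypothetical, and treating a substitution as conservative whenever both residues share any listed group is an assumption about how the groups would be applied.

```python
# Side-chain classes listed above, as one-letter amino acid codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}

# The preferred conservative substitution groups named above.
PREFERRED_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def is_conservative(original: str, replacement: str, groups=PREFERRED_GROUPS) -> bool:
    """True if both residues fall in a common group.

    `groups` may be a list of sets or a dict of named sets.
    Identical residues are not treated as a substitution.
    """
    if original == replacement:
        return False
    group_sets = groups.values() if isinstance(groups, dict) else groups
    return any(original in g and replacement in g for g in group_sets)
```

Under the preferred groups, alanine to leucine is not conservative, although both residues share the broader aliphatic class; passing `SIDE_CHAIN_GROUPS` applies the broader classes instead.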
The term "homologous" or "homeologous" means that one single-stranded nucleic acid sequence may hybridize to a complementary single-stranded nucleic acid sequence. The degree of hybridization may depend on a number of factors including the amount of identity between the sequences and the hybridization conditions such as temperature and salt concentration as discussed later. Preferably the region of identity is greater than about 5 bp, more preferably the region of identity is greater than 10 bp.
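One factor named above, the length of the region of identity, can be measured directly. The Python sketch below finds the longest contiguous identical stretch shared by two sequences by dynamic programming and compares its length with the roughly 5 bp threshold. It does not model hybridization conditions such as temperature or salt concentration, and the function names are hypothetical.

```python
def longest_identical_region(a: str, b: str) -> str:
    """Longest contiguous stretch shared by a and b (O(len(a) * len(b)) time)."""
    best_len, best_end = 0, 0
    # prev[j]: length of the common run ending at the previous character of a and at b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def meets_identity_threshold(a: str, b: str, min_bp: int = 5) -> bool:
    """True if the shared region is longer than min_bp (cf. 'greater than about 5 bp')."""
    return len(longest_identical_region(a, b)) > min_bp
```

For `"AAACGTACGTTT"` and `"GGCGTACGTCC"` the shared region is `CGTACGT` (7 bp), which exceeds 5 bp but not 10 bp.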
The term "heterologous" means that one single-stranded nucleic acid sequence is unable to hybridize to another single-stranded nucleic acid sequence or its complement. Thus, "areas of heterology" means that nucleic acid fragments or polynucleotides have areas or regions in their sequence which are unable to hybridize to another nucleic acid or polynucleotide. Such regions or areas are, for example, areas of mutations.
The term "cognate" as used herein refers to a gene sequence that is evolutionarily and functionally related between species. For example, but not by way of limitation, in the human genome, the human CD4 gene is the cognate gene to the mouse CD4 gene, since the sequences and structures of these two genes indicate that they are highly homologous and both genes encode a protein which functions in signaling T cell activation through MHC class II-restricted antigen recognition.
The term "wild-type" means that the nucleic acid fragment does not comprise any mutations. A "wild-type" protein means that the protein will be active at a level of activity found in nature and typically will comprise the amino acid sequence found in nature. In an aspect, the term "wild type" or "parental sequence" can indicate a starting or reference sequence prior to a manipulation of the invention.
The term "related polynucleotides" means that regions or areas of the polynucleotides are identical and regions or areas of the polynucleotides are heterologous.
The term "chimeric polynucleotide" means that the polynucleotide comprises regions which are wild-type and regions which are mutated. It may also mean that the polynucleotide comprises wild-type regions from one polynucleotide and wild-type regions from another related polynucleotide.
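A chimeric polynucleotide built from two related parents can be illustrated by splicing aligned sequences at chosen crossover positions. In this Python sketch the crossover positions are supplied explicitly purely for illustration, and the function name and its arguments are hypothetical; it is not a model of how recombination points arise in the methods described in this document.

```python
def make_chimera(parent_a: str, parent_b: str, crossovers: list[int]) -> str:
    """Join alternating segments of two aligned, equal-length parents.

    The first segment comes from parent_a; the source parent switches at
    each crossover position (0-based, strictly increasing).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    bounds = [0, *crossovers, len(parent_a)]
    if any(x >= y for x, y in zip(bounds, bounds[1:])):
        raise ValueError("crossovers must be strictly increasing and inside the sequence")
    parents = (parent_a, parent_b)
    return "".join(
        parents[k % 2][start:end]
        for k, (start, end) in enumerate(zip(bounds, bounds[1:]))
    )
```

If `parent_b` is a mutated copy of a wild-type `parent_a`, the result contains wild-type regions and mutated regions; if both parents are related wild-type sequences, it contains wild-type regions from each.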
The term "cleaving" means digesting the polynucleotide with enzymes or breaking the polynucleotide, or generating partial length copies of a parent sequence(s) via partial PCR extension, PCR stuttering, differential fragment amplification, or other means of producing partial length copies of one or more parental sequences.
The term "population" as used herein means a collection of components such as polynucleotides, nucleic acid fragments or proteins. A "mixed population" means a collection of components which belong to the same family of nucleic acids or proteins (i.e. are related) but which differ in their sequence (i.e. are not identical) and hence in their biological activity.
The term "specific nucleic acid fragment" means a nucleic acid fragment having certain end points and having a certain nucleic acid sequence. Two nucleic acid fragments, wherein one has a sequence identical to a portion of the second but the two have different ends, constitute two different specific nucleic acid fragments.
The term "mutations" means changes in the sequence of a wild-type nucleic acid sequence or changes in the sequence of a peptide. Such mutations may be point mutations such as transitions or transversions. The mutations may be deletions, insertions or duplications.
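Point mutations can be classified mechanically: a transition exchanges a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T/U), and a transversion exchanges a purine for a pyrimidine or vice versa. The Python sketch below assumes the wild-type and mutant sequences are already aligned and of equal length, so it reports substitutions only; deletions, insertions, and duplications would require an alignment step that is not shown. The function names are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")  # U replaces T in RNA


def classify_point_mutation(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("not a mutation: bases are identical")
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unrecognized base: {base!r}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def list_point_mutations(wild_type: str, mutant: str):
    """(0-based position, wild-type base, mutant base, kind) for each substitution."""
    if len(wild_type) != len(mutant):
        raise ValueError("only equal-length (substitution-only) sequences are handled")
    return [
        (i, w, m, classify_point_mutation(w, m))
        for i, (w, m) in enumerate(zip(wild_type.upper(), mutant.upper()))
        if w != m
    ]
```

For example, `list_point_mutations("ACGT", "GCGA")` reports an A-to-G transition at position 0 and a T-to-A transversion at position 3.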
In the polypeptide notation used herein, the lefthand direction is the amino terminal direction and the righthand direction is the carboxy-terminal direction, in accordance with standard usage and convention. Similarly, unless specified otherwise, the lefthand end of single-stranded polynucleotide sequences is the 5' end; the lefthand direction of double-stranded polynucleotide sequences is referred to as the 5' direction. The direction of 5' to 3' addition of nascent RNA transcripts is referred to as the transcription direction; sequence regions on the DNA strand having the same sequence as the RNA and which are 5' to the 5' end of the RNA transcript are referred to as "upstream sequences"; sequence regions on the DNA strand having the same sequence as the RNA and which are 3' to the 3' end of the coding RNA transcript are referred to as "downstream sequences".
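The orientation conventions can be made concrete with two small helpers. This Python sketch assumes DNA strings written left to right in the 5' to 3' direction: `reverse_complement` gives the opposite strand in the same convention, and `split_around_transcript` divides the strand that shares the RNA sequence into upstream, transcribed, and downstream regions given hypothetical transcript coordinates. Both names are illustrative.

```python
# Watson-Crick complements for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Opposite DNA strand, also written 5' (left) to 3' (right)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_around_transcript(coding_strand: str, start: int, end: int):
    """Split a 5'->3' strand having the RNA's sequence into
    (upstream, transcribed, downstream) regions.

    start and end are hypothetical 0-based, end-exclusive transcript
    coordinates on that strand.
    """
    if not 0 <= start < end <= len(coding_strand):
        raise ValueError("invalid transcript coordinates")
    return coding_strand[:start], coding_strand[start:end], coding_strand[end:]
```

Because both strands are written 5' to 3', the complement of a strand must also be reversed; upstream sequence is everything to the left of the transcript start on the strand that shares the RNA sequence.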
The term "naturally-occurring" as used herein as applied to an object refers to the fact that an object can be found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally-occurring. Generally, the term naturally-occurring refers to an object as present in a non-pathological (undiseased) individual, such as would be typical for the species.
The term "agent" is used herein to denote a chemical compound, a mixture of chemical compounds, an array of spatially localized compounds (e.g., a VLSIPS peptide array, polynucleotide array, and/or combinatorial small molecule array), a biological macromolecule, a bacteriophage peptide display library, a bacteriophage antibody (e.g., scFv) display library, a polysome peptide display library, or an extract made from biological materials such as bacteria, plants, fungi, or animal (particularly mammalian) cells or tissues. Agents are evaluated for potential activity as antineoplastics, anti-inflammatories, or apoptosis modulators by inclusion in screening assays described hereinbelow. Agents are evaluated for potential activity as specific protein interaction inhibitors (i.e., an agent which selectively inhibits a binding interaction between two predetermined polypeptides but which does not substantially interfere with cell viability) by inclusion in screening assays described hereinbelow.
As used herein, "substantially pure" means an object species is the predominant species present (i.e., on a molar basis it is more abundant than any other individual macromolecular species in the composition), and preferably a substantially purified fraction is a composition wherein the object species comprises at least about 50 percent (on a molar basis) of all macromolecular species present. Generally, a substantially pure composition will comprise more than about 80 to 90 percent of all macromolecular species present in the composition. Most preferably, the object species is purified to essential homogeneity (contaminant species cannot be detected in the composition by conventional detection methods) wherein the composition consists essentially of a single macromolecular species. Solvent species, small molecules (<500 Daltons), and elemental ion species are not considered macromolecular species.
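The molar-basis criteria for substantial purity reduce to simple arithmetic. The Python sketch below assumes a composition given as moles and molecular weight per species, and identifies non-macromolecular species (solvent, small molecules, ions) solely by the 500 dalton cutoff. The function and species names are hypothetical.

```python
MACROMOLECULE_CUTOFF_DA = 500  # species below this are not macromolecular


def purity_report(composition: dict, target: str):
    """Molar percentage of `target` among macromolecular species,
    and whether it is more abundant than every other individual species.

    composition maps species name -> (amount in moles, molecular weight in Da).
    Assumption: solvent, small molecules and ions are excluded only by the cutoff.
    """
    macro = {
        name: moles
        for name, (moles, mw) in composition.items()
        if mw >= MACROMOLECULE_CUTOFF_DA
    }
    total = sum(macro.values())
    target_moles = macro[target]
    molar_percent = 100.0 * target_moles / total
    predominant = all(target_moles > m for name, m in macro.items() if name != target)
    return molar_percent, predominant
```

For a hypothetical sample with relative molar amounts 8:1:1 of a target protein and two protein contaminants, plus water and NaCl, the report is `(80.0, True)`: water and NaCl fall below the cutoff and are ignored, and the target meets the 50 percent and 80 to 90 percent figures given above.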
As used herein the term "physiological conditions" refers to temperature, pH, ionic strength, viscosity, and like biochemical parameters which are compatible with a viable organism, and/or which typically exist intracellularly in a viable cultured yeast cell or mammalian cell. For example, the intracellular conditions in a yeast cell grown under typical laboratory culture conditions are physiological conditions. Suitable in vitro reaction conditions for in vitro transcription cocktails are generally physiological conditions. In general, in vitro physiological conditions comprise 50-200 mM NaCl or KCl, pH 6.5-8.5, 20-45 °C and 0.001-10 mM divalent cation (e.g., Mg²⁺, Ca²⁺); preferably about 150 mM NaCl or KCl, pH 7.2-7.6, 5 mM divalent cation, and often include 0.01-1.0 percent nonspecific protein (e.g., BSA). A non-ionic detergent (Tween, NP-40, Triton X-100) can often be present, usually at about 0.001 to 2%, typically 0.05-0.2% (v/v). Particular aqueous conditions may be selected by the practitioner according to conventional methods. For general guidance, the following buffered aqueous conditions may be applicable: 10-250 mM NaCl, 5-50 mM Tris HCl, pH 5-8, with optional addition of divalent cation(s) and/or metal chelators and/or nonionic detergents and/or membrane fractions and/or antifoam agents and/or scintillants.
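The general in vitro ranges above can be encoded as a simple range check. This Python sketch uses only the first set of ranges (50-200 mM NaCl or KCl, pH 6.5-8.5, 20-45 °C, 0.001-10 mM divalent cation), treats the bounds as inclusive, and ignores optional components such as BSA or detergent; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionConditions:
    monovalent_salt_mM: float   # NaCl or KCl
    pH: float
    temperature_C: float
    divalent_cation_mM: float   # e.g. Mg2+ or Ca2+


# Inclusive (low, high) bounds from the general in vitro ranges above.
PHYSIOLOGICAL_RANGES = {
    "monovalent_salt_mM": (50.0, 200.0),
    "pH": (6.5, 8.5),
    "temperature_C": (20.0, 45.0),
    "divalent_cation_mM": (0.001, 10.0),
}


def out_of_range(c: ReactionConditions) -> list[str]:
    """Names of parameters that fall outside the general ranges."""
    return [
        name
        for name, (low, high) in PHYSIOLOGICAL_RANGES.items()
        if not low <= getattr(c, name) <= high
    ]


def is_physiological(c: ReactionConditions) -> bool:
    return not out_of_range(c)
```

The preferred conditions (about 150 mM salt, pH 7.2-7.6, 5 mM divalent cation) pass this check at a typical 37 °C, whereas a pH 5.0 buffer, although within the broader "general guidance" range, is flagged.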
Specific hybridization is defined herein as the formation of hybrids between a first polynucleotide and a second polynucleotide (e.g., a polynucleotide having a distinct but substantially identical sequence to the first polynucleotide), wherein the first polynucleotide preferentially hybridizes to the second polynucleotide under stringent hybridization conditions wherein substantially unrelated polynucleotide sequences do not form hybrids in the mixture.
As used herein, the term "single-chain antibody" refers to a polypeptide comprising a V.sub