Bioinformatics. Group of authors
Michael F. Sloma, PhD is a data scientist at Xometry, Gaithersburg, MD, USA. He received his BA degree in Chemistry from Wells College. He earned his doctoral degree in Biochemistry in the laboratory of David Mathews at the University of Rochester, where his research focused on computational methods to predict RNA structure from sequence.
W. Scott Watkins, MS is a researcher and laboratory manager in the Department of Human Genetics at the University of Utah, Salt Lake City, UT, USA. He has a long-standing interest in human population genetics and evolution. His current interests include the development and application of high-throughput computational methods to mobile element biology, congenital heart disease, and personalized medicine.
David S. Wishart, PhD is a Distinguished University Professor in the Departments of Biological Sciences and Computing Science at the University of Alberta, Edmonton, Alberta, Canada. Dr. Wishart has been developing bioinformatics programs and databases since the early 1980s and has made bioinformatics an integral part of his research program for nearly four decades. His interest in bioinformatics led to the development of a number of widely used bioinformatics tools for structural biology, bacterial genomics, pharmaceutical research, and metabolomics. Some of Dr. Wishart's most widely known bioinformatics contributions include the Chemical Shift Index (CSI) for protein secondary structure identification by nuclear magnetic resonance spectroscopy, PHAST for bacterial genome annotation, the DrugBank database for drug research, and MetaboAnalyst for metabolomic data analysis. Over the course of his academic career, Dr. Wishart has published more than 400 research papers, with many being in the field of bioinformatics. In addition to his long-standing interest in bioinformatics research, Dr. Wishart has been a passionate advocate for bioinformatics education and outreach. He is one of the founding members of the Canadian Bioinformatics Workshops (CBW) – a national bioinformatics training program that has taught more than 3000 students over the past two decades. In 2002 he established Canada's first undergraduate bioinformatics degree program at the University of Alberta and has personally mentored nearly 130 undergraduate and graduate students, many of whom have gone on to establish successful careers in bioinformatics.
Tyra G. Wolfsberg, PhD is the Associate Director of the Bioinformatics and Scientific Programming Core at the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Bethesda, MD, USA. Her research program focuses on developing methodologies to integrate sequence, annotation, and experimentally generated data so that bench biologists can quickly and easily obtain results for their large-scale experiments. She maintains a long-standing commitment to bioinformatics education and outreach. She has authored a chapter on genomic databases for previous editions of this textbook, as well as a chapter on the NCBI MapViewer for Current Protocols in Bioinformatics and Current Protocols in Human Genetics. She serves as the co-chair of the NIH lecture series Current Topics in Genome Analysis; these lectures are archived online and have been viewed over 1 million times to date. In addition to teaching bioinformatics courses at NHGRI, she served for 13 years as a faculty member in bioinformatics at the annual AACR Workshop on Molecular Biology in Clinical Oncology.
Michael Zuker, PhD retired as a Professor of Mathematical Sciences at Rensselaer Polytechnic Institute, Troy, NY, USA, in 2016. He was an Adjunct Professor in the RNA Institute at the University at Albany and remains affiliated with the RNA Institute. He works on the development of algorithms to predict folding, hybridization, and melting profiles in nucleic acids. His nucleic acid folding and hybridization web servers have been running at the University at Albany since 2010. His educational activities include developing and teaching his own bioinformatics course at Rensselaer and participating in both a Chautauqua short course in bioinformatics for college teachers and an intensive bioinformatics course at the University of Michigan. He currently serves on the Scientific Advisory Board of Expansion Therapeutics, Inc. at the Scripps Research Institute in Jupiter, Florida.
About the Companion Website
This book is accompanied by a companion website:
www.wiley.com/go/baxevanis/Bioinformatics_4e
The website includes:
Test Samples
Word Samples
1 Biological Sequence Databases
Andreas D. Baxevanis
Introduction
Over the past several decades, there has been a feverish push to understand, at the most elementary of levels, what constitutes the basic “book of life.” Biologists (and scientists in general) are driven to understand how the millions or billions of bases in an organism's genome contain all of the information needed for the cell to conduct the myriad metabolic processes necessary for the organism's survival – information that is propagated from generation to generation. To have a basic understanding of how the collection of individual nucleotide bases drives the engine of life, large amounts of sequence data must be collected and stored in such a way that they can be searched and analyzed easily. To this end, much effort has gone into the design and maintenance of biological sequence databases. These databases have significantly advanced our understanding of biology, not just from a computational standpoint but also through their integrated use alongside studies being performed at the bench.
The history of sequence databases began in the early 1960s, when Margaret Dayhoff and colleagues (1965) at the National Biomedical Research Foundation (NBRF) collected all of the protein sequences known at that time – all 65 of them – and published them in a book called the Atlas of Protein Sequence and Structure. It is important to remember that, at this point in the history of biology, the focus was on sequencing proteins through traditional techniques such as the Edman degradation rather than on sequencing DNA, hence the overall small number of available sequences. By the late 1970s, when a significant number of nucleotide sequences became available, those were also included in later editions of the Atlas. As this collection evolved, it included text-based descriptions to accompany the protein sequences, as well as information regarding the evolution of many protein families. This work, in essence, was the first annotated sequence database, even though it was in printed form. Over time, the amount of data contained in the Atlas became unwieldy and the need for it to be available in electronic form became obvious. From the early 1970s to the late 1980s, the contents of the Atlas were distributed electronically by NBRF (and later by the Protein Information Resource, or PIR) on magnetic tape, and the distribution included some basic programs that could be used to search and evaluate distant evolutionary relationships.
The next phase in the history of sequence databases was precipitated by the veritable explosion in the amount of nucleotide sequence data available to researchers by the end of the 1970s. To address the need for more robust public sequence databases, the Los Alamos National Laboratory (LANL) created the Los Alamos DNA Sequence Database in 1979, which became known as GenBank in 1982 (Benson et al. 2018). Meanwhile, the European Molecular Biology Laboratory (EMBL) created the EMBL Nucleotide Sequence Data Library in 1980. Throughout the