Genomics, Transcriptomics and Proteomics: Glossary of Terms

A glossary for the terms used in this seminar.

Allele: alternative form of a gene, e.g. dominant (always expressed if present) or recessive (only expressed if no dominant allele is present).

Amplification: an increase in the number of copies of a specific DNA fragment.

Base pair (bp): two complementary nucleotide bases joined together by chemical bonds. The two strands of the DNA molecule are held together in the shape of a double helix by the bonds between base pairs. The base adenine pairs with thymine, and guanine pairs with cytosine.
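
The pairing rule above can be illustrated with a minimal Python sketch. The names `PAIRS` and `complement` are chosen here for illustration only; the sketch assumes a strand written with the four letters A, T, G and C.

```python
# Pairing rules for DNA as stated in the entry above:
# adenine (A) pairs with thymine (T), guanine (G) with cytosine (C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return, position by position, the base that pairs with each base of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("ATGC"))  # TACG
```

Applying `complement` twice returns the original strand, which reflects the fact that each strand fully determines its partner.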

Bioinformatics: the science of informatics as applied to biological research. Informatics is the management and analysis of data using advanced computing techniques. Bioinformatics is particularly important as an adjunct to genomics research, because of the large amount of complex data this research generates.

Biomarker: observable change (not necessarily pathological) in the function of an organism, related to a specific exposure or event.

Candidate Gene: A gene that has been implicated in causing or contributing to the development of a particular disease.

C. elegans: Caenorhabditis elegans, a nematode or roundworm, the first animal to have its genome completely sequenced and all the genes fully characterised.

Chromosome: The DNA in a cell is divided into structures called chromosomes. Chromosomes are large enough to be seen under a microscope. In humans, all cells other than germ cells usually contain 46 chromosomes: 22 pairs of autosomes and either a pair of X chromosomes (in females) or an X chromosome and a Y chromosome (in males). In each pair of chromosomes, one chromosome is inherited from an individual's father and one from his or her mother.

Clone: A term which is applied to genes, cells, or entire organisms which are derived from - and are genetically identical to - a single common ancestor gene, cell, or organism, respectively. Cloning of genes and cells to create many copies in the laboratory is a common procedure essential for biomedical research. Note that several processes which are commonly described as cell 'cloning' give rise to cells which are almost but not completely genetically identical to the ancestor cell. 'Cloning' of organisms from embryonic cells occurs in nature (e.g. with the occurrence of identical twins). The laboratory cloning of a sheep ('Dolly') using the genetic material from a cell of an adult animal was reported in 1997.

Cloning: the process of producing a genetically identical copy (clone).

Cloning vector: DNA molecule originating from a virus, a plasmid, or the cell of a higher organism into which another DNA fragment of appropriate size can be integrated without loss of the vector's capacity for self-replication; vectors introduce foreign DNA into host cells, where it can be reproduced in large quantities. Examples are plasmids, cosmids, and yeast artificial chromosomes; vectors are often recombinant molecules containing DNA sequences from several sources.

Coding regions: those parts of the DNA that contain the information needed to form proteins. Other parts of the DNA may have non-coding functions (e.g. start-stop, pointing or timer functions) or as yet unresolved functions or maybe even 'noise'.

Codon: a set of three nucleotide bases in a DNA or RNA sequence, which together code for a specific amino acid. For example, the set AUG (adenine, uracil, guanine) codes for the amino acid methionine.

Combinatorial Chemistry: A technique for rapidly and systematically assembling a variety of molecular entities, or building blocks, in many different combinations, to create tens of thousands of diverse compounds that can be tested in drug discovery screening assays to identify potential useful candidates.

Complementary DNA (cDNA): cDNA is DNA that is synthesised in the lab from mRNA by reverse transcription. A cDNA is so-called because its sequence is the complement of the original mRNA sequence.
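
As a rough illustration of why a cDNA is the complement of its mRNA, the following Python sketch derives the first cDNA strand from an mRNA sequence. The function name `first_strand_cdna` is hypothetical, and the sketch assumes both sequences are written in the usual 5' to 3' direction, so the complement is also reversed.

```python
# Base pairing between an RNA template and the DNA strand copied from it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA for `mrna`, both written 5' to 3'."""
    # Pair each RNA base with its DNA partner, then reverse the result,
    # because the two strands of a duplex run in opposite directions.
    paired = [RNA_TO_DNA[base] for base in mrna.upper()]
    return "".join(reversed(paired))


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```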

Deletion: in the process of DNA replication, a deletion occurs if a nucleotide or series of nucleotides is not copied. Such deletions may be harmless, may result in disease, or may in rare cases be beneficial.

Deoxyribose: A type of sugar which is a component of DNA (Deoxyribonucleic Acid). DNA is a molecule formed of two strands, each of which includes deoxyribose.

DNA (Deoxyribonucleic Acid): the molecule that encodes genetic information. DNA is a double-stranded helix held together by bonds between pairs of nucleotides. See base, base pair, and double helix.

DNA probe: a piece of single-stranded DNA, typically labelled so that it can be detected (for example, a radioactive or fluorescent label can be used), which can single out and bind with (and only with) another specific piece of DNA. DNA probes can be used to determine which sequences are present in a given length of DNA or which genes are present in a sample of DNA.

DNA repair genes: genes which code for proteins which correct 'mistakes' in DNA sequences. When these genes are altered, mutations may be able to accumulate in the genome, ultimately resulting in disease. See genetic mutation, p53 and suppressor gene.

DNA replication: the process of making copies of strands of DNA. Existing DNA is used as a template for synthesising the new strands.

Electrophoresis: A method of separating large molecules (such as DNA fragments or proteins) from a mixture of similar molecules. An electric current is passed through a medium containing the mixture, and each kind of molecule travels through the medium at a different rate, depending on its electrical charge and size. Separation is based on these differences. Agarose and acrylamide gels are the media commonly used for electrophoresis of proteins and nucleic acids.

Endonuclease: An enzyme that cleaves its nucleic acid substrate at internal sites in the nucleotide sequence.

Exogenous DNA: DNA which has been introduced into an organism but which originated outside that organism (e.g. material inserted into a cell by a virus).

Exon: exons are those portions of a gene which code for proteins.

Expressed sequence tag (EST): a short strand of DNA (approximately 200 base pairs long) which is part of a cDNA. Because an EST is usually unique to a particular cDNA, and because cDNAs correspond to a particular gene in the genome, ESTs can be used to help identify unknown genes and to map their position in the genome.

Full gene sequence: the complete order of bases in a gene. This order determines which protein a gene will produce.

Gene: a length of DNA which codes for a particular protein, or in certain cases a functional or structural RNA molecule.

Gene Expression: the process by which the information in a gene is used to create proteins.

Gene Families: Groups of closely related genes that make similar products.

Gene Library: A collection of cloned DNA fragments which, taken together, represent the entire genome of a specific organism. Such libraries or 'gene banks' are assembled so as to allow the isolation and study of individual genes. Gene libraries are produced by first breaking up or 'fractionating' an entire genome. This fractionation can be accomplished either by physical methods or by use of restriction enzymes. The genome fragments are then cloned (multiplied in number) and stored for later use.

Gene Product: the protein produced by a gene.

Genetic Code: the set of codons in DNA or mRNA. Each codon is made up of three nucleotides which call for a specific amino acid. For example, the set AUG (adenine, uracil, guanine) calls for the amino acid methionine. The sequence of codons along an mRNA molecule specifies the sequence of amino acids in a particular protein.

Genetic Engineering: altering the genetic material of cells or organisms in order to make them capable of making new substances or performing new functions.

Genetic Map: a map of a genome which shows the relative positions of the genes and/or markers on the chromosomes.

Genetic Mutation: a change in the nucleotide sequence of a DNA molecule. Genetic mutations are a kind of genetic polymorphism. The term 'mutation', as opposed to 'polymorphism', is generally used to refer to changes in DNA sequence which are not present in most individuals of a species and either have been associated with disease (or risk of disease) or have resulted from damage inflicted by external agents (such as viruses or radiation).

Genetic Polymorphism: a difference in DNA sequence among individuals, groups, or populations (e.g. a genetic polymorphism might give rise to blue eyes versus brown eyes, or straight hair versus curly hair). Genetic polymorphisms may be the result of chance processes, or may have been induced by external agents (such as viruses or radiation). If a difference in DNA sequence among individuals has been shown to be associated with disease, it will usually be called a genetic mutation. Changes in DNA sequence which have been confirmed to be caused by external agents are also generally called 'mutations' rather than 'polymorphisms'.

Genetic Predisposition: susceptibility to a disease which is related to a genetic mutation, which may or may not result in actual development of the disease.

Genomic DNA: The basic chromosome set consisting of a species-specific number of linkage groups and the genes contained therein.

Genome: all the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs.

Genomic Library: A collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism.

Genomics: the study of genes and their function. Recent advances in genomics are bringing about a revolution in our understanding of the molecular mechanisms of disease, including the complex interplay of genetic and environmental factors. Genomics is also stimulating the discovery of breakthrough healthcare products by revealing thousands of new biological targets for the development of drugs, and by giving scientists innovative ways to design new drugs, vaccines and DNA diagnostics. Genomics-based therapeutics include 'traditional' small chemical drugs, protein drugs, and potentially gene therapy.

Genotype: the particular genetic pattern seen in the DNA of an individual. 'Genotype' is usually used to refer to the particular pair of alleles that an individual possesses at a certain location in the genome. Compare this with phenotype.

Hepatocytes: liver cells.

Hepatotoxicity: toxicity to the liver.

Heterologous Expression Systems: systems that allow expression of a gene in a different organism.

Human Genome Project: an international research effort aimed at discovering the full sequence of bases in the human genome. Led in the United States by the National Institutes of Health and the Department of Energy.

Human Genome Initiative: Collective name for several projects begun in 1986 by DOE to (1) create an ordered set of DNA segments from known chromosomal locations, (2) develop new computational methods for analyzing genetic map and DNA sequence data, and (3) develop new techniques and instruments for detecting and analyzing DNA. This DOE initiative is now known as the Human Genome Program. The national effort, led by DOE and NIH, is known as the Human Genome Project.

Hybridization: The process of joining two complementary strands of DNA or one each of DNA and RNA to form a double-stranded molecule.

Idiosyncrasy: specific (and usually unexplained) reaction of an individual to e.g. a chemical exposure to which most other individuals do not react at all. Examples: some people react to their first aspirin with a potentially fatal shock. General allergic reactions do not fall into this category.

In Situ Hybridization (ISH): Use of a DNA or RNA probe to detect the presence of the complementary DNA sequence in cloned bacterial or cultured eukaryotic cells.

Intron: a length of DNA which is interspersed among the protein-coding sequences (exons) in a gene. Introns are transcribed (see transcription) into mRNA but are then cut out of the mRNA sequence before protein synthesis occurs.

Kilobase (kb): a length of DNA equal to 1000 nucleotides.

Knockout Animals: genetically engineered animals in which one or more genes, usually present and active in the normal animal, are absent or inactive.

Library: a set of clones of DNA sequences from an organism's genome. A particular library might include, for example, clones of all of the DNA sequences expressed in a certain kind of cell, or in a certain organ of the body.

Marker: a sequence of bases at a unique physical location in the genome, which varies sufficiently between individuals that its pattern of inheritance can be tracked through families and/or it can be used to distinguish among cell types. A marker may or may not be part of a gene. Markers are essential for use in linkage studies and genetic maps to help scientists to narrow down the possible location of new genes, and to discover the associations between genetic mutations and disease.

Messenger RNA (mRNA): the DNA of a gene is transcribed (see transcription) into mRNA molecules, which then serve as a template for the synthesis of proteins.

Metabonome: constituent metabolites in a biological sample.

Metabonomics: techniques available to identify the presence and concentrations of metabolites in a biological sample.

Murine: of the mouse.

Mutation: A change, deletion, or rearrangement in the DNA sequence that may lead to the synthesis of an altered or inactive protein, or to the loss of the ability to produce the protein. If a mutation occurs in a germ cell, then it is a heritable change in that it can be transmitted from generation to generation. Mutations may also occur in somatic cells; these are not heritable in the traditional sense of the word, but are transmitted to all daughter cells.

Nephrotoxicity: toxicity to the kidney.

NMR: Nuclear Magnetic Resonance, a technique to identify atoms in a sample by measuring the signal given off by the relaxation of e.g. protons previously aligned in a strong magnetic field.

Non-genotoxic Carcinogen: a substance that causes cancer, not by primarily damaging the genetic material, but by mechanisms that stimulate cell proliferation, thus increasing the chances for natural mutations to be reproduced, and/or selection of specific cell populations that may derange in a later stage.

Nucleic Acid: one of the family of molecules which includes the DNA and RNA molecules. Nucleic acids were so named because they were originally discovered within the nucleus of cells, but they have since been found to exist outside the nucleus as well.

Nucleotide: the 'building block' of nucleic acids, such as the DNA molecule. A nucleotide consists of one of four bases - adenine, guanine, cytosine, or thymine - attached to a phosphate-sugar group. In DNA the sugar group is deoxyribose, while in RNA (a DNA-related molecule which helps to translate genetic information into proteins), the sugar group is ribose, and the base uracil substitutes for thymine. Each group of three nucleotides in a gene is known as a codon. A nucleic acid is a long chain of nucleotides joined together, and therefore is sometimes referred to as a 'polynucleotide'.

Nucleus: the membrane-bound structure, found within all eukaryotic cells, that contains the cell's central DNA.

Null Allele: inactive form of a gene.

Oligonucleotide: A molecule made up of a small number of nucleotides, typically fewer than 25. These are frequently used as DNA synthesis primers.

Oncogene: a gene which is associated with the development of cancer.

Pharmacogenomics: The science of understanding the correlation between an individual patient's genetic make-up (genotype) and their response to drug treatment. Some drugs work well in some patient populations and not as well in others. Studying the genetic basis of patient response to therapeutics allows drug developers to more effectively design therapeutic treatments.

Phenotype: a set of observable physical characteristics of an individual organism. A single characteristic can be referred to as a 'trait', although a single trait is sometimes also called a phenotype. For example, blond hair could be called a trait or a phenotype, as could obesity. A phenotype can be the result of many factors, including an individual's genotype, environment, and lifestyle, and the interactions among these factors. The phenotype is the observed manifestation of a genotype and may be expressed physically, biochemically, or physiologically.

Plasmid: A structure composed of DNA that is separate from the cell's genome. In bacteria, plasmids confer a variety of traits and can be exchanged between individuals - even those of different species. Plasmids can be manipulated in the laboratory to deliver specific genetic sequences into a cell.

Polymerase Chain Reaction (PCR): a method for creating millions of copies of a particular segment of DNA. If a scientist needs to detect the presence of a very small amount of a particular DNA sequence, PCR can be used to amplify the amount of that sequence until there are enough copies available to be detected.
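
The 'millions of copies' figure follows from repeated doubling. The Python sketch below is an idealised model, not a description of any particular protocol: it assumes that every cycle multiplies the number of copies by a fixed factor, and the `efficiency` parameter (1.0 meaning perfect doubling) is an assumption introduced here for illustration.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised copy number after `cycles` rounds of PCR.

    With efficiency 1.0 every template is copied once per cycle,
    so the number of copies doubles each cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles


# Under this model, a single starting molecule passes one million
# copies after 20 cycles of perfect doubling.
print(pcr_copies(1, 20) > 1_000_000)  # True
```

Real reactions fall short of perfect doubling and eventually plateau, so the model is only a guide to the order of magnitude.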

Polymorphism: in this context, the existence of inter-individual differences in DNA sequences coding for one specific gene. The effects of such differences may vary dramatically, ranging from no effect at all to the building of inactive proteins.

Primer: Short pre-existing polynucleotide chain to which new deoxyribonucleotides can be added by DNA polymerase.

Probe: Single-stranded DNA or RNA molecules of specific base sequence, labelled either radioactively or immunologically, that are used to detect the complementary base sequence by hybridisation.

Promoter: a segment of DNA located at the 'front' end of a gene, which provides a site where the enzymes involved in the transcription process can bind to a DNA molecule and initiate transcription. Promoters are critically involved in the regulation of gene expression.

Proteome: total protein complement expressed by a cell, tissue or organism.

Proteomics: study of protein properties on a large scale to obtain a global, integrated view of cellular processes including expression levels, post translational modifications, interactions and location.

Recombinant DNA: DNA molecules that have been created by combining DNA from more than one source.

Regulatory Gene: a gene which controls the protein-synthesising activity of other genes.

Reverse Transcriptase: An enzyme used by retroviruses to form a complementary DNA sequence (cDNA) from an RNA template - usually the genome of the retrovirus. The enzyme then synthesises a complementary copy of the cDNA strand, so that a double-stranded DNA molecule is formed. This double-stranded DNA molecule is then inserted into the chromosome of the host cell which has been infected by the retrovirus. Reverse transcriptase is one of the key components that HIV uses to mount its attack.

RNA (ribonucleic acid): a molecule similar to DNA, which helps in the process of decoding the genetic information carried by DNA.

Serum-responsiveness: cell proliferative reaction to the addition of serum to tissue culture medium after prior deprivation.

Sequencing: determining the order of nucleotides in a DNA or RNA molecule, or determining the order of amino acids in a protein.

Signature Sequencing: sequencing of a short stretch of cDNA close to the end of the complementary mRNA. Sequence stretches of some 20 nucleotides are sufficiently discriminative to identify the transcript of an individual gene in a mammalian tissue.

Single Nucleotide Polymorphism (SNP): Inter-individual variations in the genetic code at the level of one nucleotide.
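
A single-nucleotide difference can be located by comparing two sequences position by position. The Python sketch below assumes the two sequences are already aligned and of equal length; the function name `single_base_differences` is hypothetical. Note that it only compares two sequences, whereas calling a difference a SNP would require evidence that the variation occurs among individuals in a population.

```python
def single_base_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, base_in_a, base_in_b) wherever two aligned sequences differ.

    Positions are 0-based. The sequences must have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


print(single_base_differences("GATTACA", "GACTACA"))  # [(2, 'T', 'C')]
```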

Southern Blotting: Transfer by absorption of DNA fragments separated in electrophoretic gels to membrane filters for detection of specific base sequences by radiolabeled complementary probes.

Splicing: the removal of introns from the sequence of mRNA. When an mRNA molecule is synthesized from a DNA template, introns are transcribed (see transcription) along with exons. In the splicing process, this material is cut out and the exons are joined together to form a continuous coding sequence.
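
The cut-and-join step can be sketched in Python as follows. This is an illustration only: it assumes the intron positions are already known and are given as 0-based, half-open (start, end) index pairs that do not overlap, and the function name `splice` and the example sequence are made up for the purpose.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given intron intervals and join the remaining exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Two exons (6 bases each) separated by a 12-base intron.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCUUUUAA
```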

Suppressor Gene: a gene which helps to reverse the effects of damage to an individual's genetic material, typically effects which might lead to uncontrolled cell growth (as would occur in cancer). A suppressor gene may, for example, code for a protein which checks genes for misspellings, and/or which triggers a cell's self-destruction if too many genetic mutations have accumulated.

Toxicogenomics: a new scientific subdiscipline that combines the emerging technologies of genomics and bioinformatics to identify and characterize mechanisms of action of known and suspected toxicants. Currently, the premier toxicogenomic tools are the DNA microarray and the DNA chip, which are used for the simultaneous monitoring of expression levels of hundreds to thousands of genes.

Transcription: the process during which the information in a length of DNA is used to construct an mRNA molecule.
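
At the level of sequence, the result of transcription can be sketched in a few lines of Python. The sketch assumes the input is the coding (non-template) strand of the DNA, in which case the mRNA has the same sequence with uracil (U) in place of thymine (T); the function name `transcribe` is chosen here for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence corresponding to a DNA coding strand.

    Assumes `coding_strand` is the non-template strand written 5' to 3',
    so the mRNA differs only in having U wherever the DNA has T.
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCC"))  # AUGGCC
```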

Transcriptomics: techniques available to identify mRNA from actively transcribed genes.

Transcriptome: mRNA from actively transcribed genes.

Transcript Profiling: see transcriptomics.

Transfer RNA (tRNA): RNA molecules which bond with amino acids and transfer them to ribosomes, where protein synthesis is completed.

Transformation: A process by which the genetic material carried by an individual cell is altered by incorporation of exogenous DNA into its genome.

Transgenic: An organism whose genome has been altered by the inclusion of foreign genetic material. This foreign genetic material may be derived from other individuals of the same species or from wholly different species. Genetic material may also be of an artificial nature. Foreign genetic information can be added to the organism during its early development and incorporated in cells of the entire organism. As an example, mouse embryos have been given the gene for rat growth hormone, allowing the mice to grow into large adults. Genetic information can also be added later in development to selected portions of the organism. As an example, experimental genetic therapy to treat cystic fibrosis involves selective addition of genes responsible for lung function and is administered directly to the lung tissue of children and adults. Transgenic organisms have been produced that provide enhanced agricultural and pharmaceutical products. Insect-resistant crops and cows that produce human hormones in their milk are just two examples.

Transgenic Organism: an organism whose genome has been altered by the incorporation of foreign, or exogenous DNA.

Translation: the process during which the information in mRNA molecules is used to construct proteins.
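
Reading an mRNA three bases at a time can be sketched in Python as below. The codon table here is deliberately partial (a handful of standard codons plus the three stop codons), and the names `CODON_TABLE` and `translate` are chosen for illustration; a complete table has 64 entries.

```python
# A partial codon table; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe",
    "GGC": "Gly", "AAA": "Lys", "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Translate `mrna` codon by codon until a stop codon or the end is reached."""
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon: protein synthesis ends here
            break
        amino_acids.append(amino_acid)
    return amino_acids


print(translate("AUGGCCUUUUAA"))  # ['Met', 'Ala', 'Phe']
```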

Vector: [1] An organism which serves to transfer a disease-causing organism (pathogen) from one organism to another. [2] A mechanism whereby foreign gene(s) are moved into an organism and inserted into that organism's genome. Retroviruses such as HIV serve as vectors by inserting genetic information (DNA) into the genome of human cells. Bacteria can serve as vectors in plant populations.

Xenobiotic(s): substances not normally present in the reference organism.