Why do two different E. coli reference genomes have different lengths?


I've downloaded two different reference genomes of E. coli (E. coli K-12 MG1655: U00096.1 and E. coli K-12 MG1655: U00096.2), and they have different lengths. I searched in detail for the meaning of reference genome numbers, but I couldn't find anything that helps with my question: why do two different E. coli reference genomes have different lengths?

U00096.2 is an updated version of U00096.1, so you should preferably use U00096.2 for your analysis. In fact, even U00096.2 has since been updated: the latest version is U00096.3. In general, the number after the dot (period) in an NCBI accession number denotes the version.

From NCBI:

VERSION is made of the accession number of the database record followed by a dot and a version number (and is therefore sometimes referred to as the "accession.version")
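The accession.version convention is easy to handle programmatically. The Python sketch below splits an identifier such as U00096.2 into its accession and version and keeps the highest version of each accession. The regular expression is an assumption: it covers identifiers like U00096.3 and underscore-containing RefSeq-style identifiers, but not every NCBI format.

```python
import re

# Assumed pattern: letters, an optional underscore, digits, then ".<version>".
# This matches identifiers such as "U00096.3" but not every NCBI format.
ACCESSION_VERSION = re.compile(r"^([A-Z]+_?[0-9]+)\.([0-9]+)$")


def split_accession(accession_version: str) -> tuple[str, int]:
    """Split 'ACCESSION.VERSION' into the accession and an integer version."""
    match = ACCESSION_VERSION.match(accession_version)
    if match is None:
        raise ValueError(f"not an accession.version string: {accession_version!r}")
    return match.group(1), int(match.group(2))


def latest_versions(ids: list[str]) -> dict[str, str]:
    """For each accession, return the identifier with the highest version number."""
    best: dict[str, tuple[int, str]] = {}
    for ident in ids:
        accession, version = split_accession(ident)
        # Compare versions numerically, so that ".10" beats ".9".
        if accession not in best or version > best[accession][0]:
            best[accession] = (version, ident)
    return {acc: ident for acc, (_, ident) in best.items()}


print(latest_versions(["U00096.1", "U00096.2", "U00096.3"]))
# {'U00096': 'U00096.3'}
```

Versions are compared as integers rather than strings, so a hypothetical ".10" would correctly rank above ".9".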

As WYSIWYG answered, they are updated versions.

The difference in length arises mainly because later versions were sequenced and aligned with better equipment and techniques, producing results that better reflect reality.

Some sections of a genome can be hard to sequence and align efficiently, but more advanced techniques make this possible.

Sequencing 100% of a genome with no errors and perfect alignment is not yet possible, but we are slowly getting there.
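To confirm the length difference yourself, you can count the bases in each downloaded file. The following Python sketch assumes the sequences were saved in FASTA format and that the first word of each header line is the identifier. The toy records below stand in for real downloads; for real files you would pass in their contents instead, for example `open(path).read()`.

```python
def fasta_lengths(fasta_text: str) -> dict[str, int]:
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths: dict[str, int] = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited word of the header as the ID.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence lines: add their length to the current record.
            lengths[current] += len(line)
    return lengths


toy = ">seqA demo record\nACGT\nAC\n>seqB\nACGTACGT\n"
print(fasta_lengths(toy))  # {'seqA': 6, 'seqB': 8}
```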

Visually comparing bacteria and mammal genomes

     Bacteria and mammals are obviously very different organisms, outside and inside, but both use DNA in very similar ways. For example, both have sections of DNA that are transcribed into mRNA, which is then translated into proteins. Nevertheless, when comparing the locations of coding regions on DNA, an obvious difference emerges.

Part 1 - Find and observe a mammalian genome

     1.  Open OrganismView and search for one of the following mammals: dog, human, pig, horse, or any other mammal you can find in the results field.

           - Note: many of the results will be for viruses associated with the animal you searched for. To be sure you selected a mammal and not a virus, look for "Mammalia" in the "Organism Information" box.

     2.  Make sure the mammal you searched for is highlighted, then click Launch genome viewer near the bottom left of the page.

           - Note: This opens GenomeView in a new window that displays the mammal's genome. You can navigate the genome by using your mouse to zoom out and by sliding to the right. The green and grey boxes with arrowheads represent genes; many other graphics represent other genomic features.

     3.  Quickly observe the gene locations: note the space they take up in the genome, their proximity to other genes, and their structure (e.g. introns and exons).

Part 2 - Find and observe a bacterial genome

     1.  Open OrganismView, search for Escherichia coli (commonly known as E. coli), and select the first result.

     2.  Click Launch genome viewer and observe the organism's genes.

     3.  Quickly observe the gene locations: note the space they take up in the genome, their proximity to other genes, and their structure (e.g. introns and exons).

Part 3 - Identify differences between mammals and bacteria

     1.  List two obvious differences between the two genomes.

     2.  Propose some hypotheses that explain these differences.
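To make the comparison in Part 3 quantitative, one option is to measure what fraction of each genome is covered by genes. The Python sketch below is a hypothetical helper, not part of OrganismView or GenomeView; it assumes you have noted gene start and end coordinates (1-based, inclusive) from the viewer, and the toy coordinates are invented.

```python
def gene_coverage_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of the genome covered by at least one gene.

    genes: (start, end) pairs, 1-based and inclusive. Overlapping or adjacent
    genes are merged so that shared bases are counted only once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(genes):
        if current_end is None or start > current_end + 1:
            # Gap before this gene: close the previous merged interval.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent gene: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / genome_length


# Invented toy genomes: one gene-dense, one gene-sparse.
dense = gene_coverage_fraction(1000, [(1, 400), (380, 700), (801, 900)])
sparse = gene_coverage_fraction(1000, [(100, 199)])
print(dense, sparse)  # 0.8 0.1
```

Merging overlapping intervals first keeps bases shared by overlapping genes from being double-counted.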

Part 4 - Additional concepts

  • What is an operon?
  • Do mammal and/or bacterial genomes have operons?
  • What is the function of operons?


The comparison of the small molecule metabolism pathways in Escherichia coli and Saccharomyces cerevisiae (yeast) shows that 271 enzymes are common to both organisms. These common enzymes involve 384 gene products in E. coli and 390 in yeast, which are between one half and two thirds of the gene products of small molecule metabolism in E. coli and yeast, respectively. The arrangement and family membership of the domains that form all or part of 374 E. coli sequences and 343 yeast sequences were determined. Of these, 70% consist entirely of homologous domains, and 20% have homologous domains linked to other domains that are unique to E. coli, yeast, or both. Over two thirds of the enzymes common to the two organisms have sequence identities between 30% and 50%. The remaining groups include 13 clear cases of nonorthologous displacement. Our calculations show that at most one half to two thirds of the gene products involved in small molecule metabolism are common to E. coli and yeast. We have shown that the common core of 271 enzymes has been largely conserved since the separation of prokaryotes and eukaryotes, including modifications for regulatory purposes, such as gene fusion and changes in the number of isozymes in one of the two organisms. Only one fifth of the common enzymes have nonhomologous domains between the two organisms. Around the common core very different extensions have been made to small molecule metabolism in the two organisms.

Here we compare the enzymes of small molecule metabolism found in the prokaryote Escherichia coli and the unicellular eukaryote Saccharomyces cerevisiae (yeast). There is evidence for the existence of prokaryotes 3.8 billion years ago (bya) and of eukaryotes 2.7 bya (Mojzsis et al. 1996; Brocks et al. 1999). Endosymbiosis of an α-proteobacterium is widely accepted as the origin of mitochondria, and of mitochondrial genes, in the eukaryotes (Margulis 1970). This endosymbiosis event must have occurred before the divergence of plants, 1.6 bya (Lang et al. 1999; Wang et al. 1999), and arguments have been made for it being much earlier (Martin and Müller 1998). Thus, according to these estimates, most of the enzymes of small molecule metabolism in E. coli and yeast have had between 1.6 and 2.7 billion years of separate evolution, depending on whether the yeast enzymes originate from the eukaryotic ancestor or the protomitochondrial genome (Brown and Doolittle 1997).

Regardless of the origin of the enzymes, during this time there have been countless chances for orthologous genes in the two organisms to diverge by mutation, to undergo recombinations resulting in domain loss or accretion, and to change gene structure by gene fusion or fission. New genes for an existing function could be acquired by horizontal transfer or functional displacement of one gene by another within a genome. In addition, many new genes have arisen by duplication and divergence to produce new enzymatic functions and pathways.

Until now, investigations of these evolutionary processes have been limited to studying one aspect, such as gene fusion (Enright et al. 1999) or nonorthologous displacement (Koonin et al. 1996; Makarova et al. 1999), or have focused on differences in pathway topologies rather than the evolution of common enzymes (Huynen et al. 1999). Here we investigate, and to some extent quantify, the frequency of all these evolutionary processes in a large set of enzymes common to the two very distantly related organisms. The extensive information available on the enzymes and pathways of small molecule metabolism in E. coli and yeast allows us to determine the extent to which different evolutionary processes have taken place since they separated from their last common ancestor. At present such a comparison would be much less successful in any other pair of organisms due to the lack of knowledge of their enzymes and pathways. E. coli and yeast have long been model organisms and have been the subjects of very extensive experimental characterization of their genes and proteins, including the determination of their complete genome sequences.

We show that over half of the gene products involved in small molecule metabolism of E. coli and yeast carry out common reactions in the two organisms. Our approach is to use sequence and structural information to characterise the domain structure and the evolutionary relationships of these shared enzymes. The use of structural information together with powerful multiple sequence comparison methods, as well as assignment to sequence families, provides us with an almost complete picture of the protein families that the enzymes belong to, including very distant evolutionary relationships.

Knowledge of the domain architecture of common enzymes allows us to assess the extent of conservation between enzymes, but also provides insight into aspects of the regulation of enzymes, such as differing numbers of isozymes in E. coli and yeast and instances of gene fusion. As well as affecting regulation of otherwise separate genes, gene fusion serves to co-localize gene products. Protein–protein interactions have the same effect, and we survey and compare protein-protein interactions as well as gene fusions in yeast.

2.2. The Anatomy of the Eukaryotic Genome

We have already learnt that the human genome is split into two components: the nuclear genome and the mitochondrial genome (see Figure 1.1). This is the typical pattern for most eukaryotes, the bulk of the genome being contained in the chromosomes in the cell nucleus and a much smaller part located in the mitochondria and, in the case of photosynthetic organisms, in the chloroplasts. We will look first at the nuclear genome.

2.2.1. Eukaryotic nuclear genomes

The nuclear genome is split into a set of linear DNA molecules, each contained in a chromosome. No exceptions to this pattern are known: all eukaryotes that have been studied have at least two chromosomes and the DNA molecules are always linear. The only variability at this level of eukaryotic genome structure lies with chromosome number, which appears to be unrelated to the biological features of the organism. For example, yeast has 16 chromosomes, four times as many as the fruit fly. Nor is chromosome number linked to genome size: some salamanders have genomes 30 times bigger than the human version but split into half the number of chromosomes. These comparisons are interesting but at present do not tell us anything useful about the genomes themselves; they are more a reflection of the non-uniformity of the evolutionary events that have shaped genome architecture in different organisms.

Packaging of DNA into chromosomes

Chromosomes are much shorter than the DNA molecules that they contain: the average human chromosome has just under 5 cm of DNA. A highly organized packaging system is therefore needed to fit a DNA molecule into its chromosome. We must understand this packaging system before we start to think about how genomes function because the nature of the packaging has an influence on the processes involved in expression of individual genes (Section 8.2).

The important breakthroughs in understanding DNA packaging were made in the early 1970s by a combination of biochemical analysis and electron microscopy. It was already known that nuclear DNA is associated with DNA-binding proteins called histones but the exact nature of the association had not been delineated. In 1973-74 several groups carried out nuclease protection experiments on chromatin (DNA-histone complexes) that had been gently extracted from nuclei by methods designed to retain as much of the chromatin structure as possible. In a nuclease protection experiment the complex is treated with an enzyme that cuts the DNA at positions that are not ‘protected’ by attachment to a protein. The sizes of the resulting DNA fragments indicate the positioning of the protein complexes on the original DNA molecule (Figure 2.4). After limited nuclease treatment of purified chromatin, the bulk of the DNA fragments have lengths of approximately 200 bp and multiples thereof, suggesting a regular spacing of histone proteins along the DNA.
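The logic of the nuclease protection experiment can be illustrated with a small simulation. This Python sketch makes strong simplifying assumptions, stated in its docstring: perfectly regular nucleosome spacing, cuts only in the linkers, and independent cuts. Under those assumptions every fragment is a whole multiple of the repeat length, which is the ladder pattern described above.

```python
import random
from collections import Counter


def partial_digest(n_nucleosomes: int, repeat_bp: int, cut_prob: float,
                   seed: int = 0) -> Counter:
    """Simulate limited nuclease digestion of a regularly spaced nucleosome array.

    Simplifying assumptions (for illustration only): nucleosomes sit every
    `repeat_bp` base pairs, the nuclease cuts only in the linker between
    neighbouring nucleosomes, and each linker is cut independently with
    probability `cut_prob`. Returns fragment length (bp) -> fragment count.
    """
    rng = random.Random(seed)
    fragments: Counter = Counter()
    run = 1  # nucleosomes in the fragment currently being built
    for _ in range(n_nucleosomes - 1):  # one linker between each neighbouring pair
        if rng.random() < cut_prob:
            fragments[run * repeat_bp] += 1
            run = 1
        else:
            run += 1
    fragments[run * repeat_bp] += 1  # the final fragment
    return fragments


ladder = partial_digest(n_nucleosomes=10_000, repeat_bp=200, cut_prob=0.5)
for length in sorted(ladder)[:5]:
    print(length, ladder[length])  # lengths are 200, 400, 600, ...
```

Lowering `cut_prob` shifts the distribution toward longer multiples, loosely mimicking a more limited digestion.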

Figure 2.4

Nuclease protection analysis of chromatin from human nuclei. Chromatin is gently purified from nuclei and treated with a nuclease enzyme. On the left, the nuclease treatment is carried out under limiting conditions so that the DNA is cut, on average, …

Technical Note 2.1

Agarose gel electrophoresis: separation of DNA and RNA molecules of different lengths. Gel electrophoresis is the standard method for separating DNA molecules of different lengths. It has many applications in size analysis of DNA fragments and can also …

In 1974 these biochemical results were supplemented by electron micrographs of purified chromatin, which enabled the regular spacing inferred by the protection experiments to be visualized as beads of protein on the string of DNA (Figure 2.5A). Further biochemical analysis indicated that each bead, or nucleosome, contains eight histone protein molecules, these being two each of histones H2A, H2B, H3 and H4. Structural studies have shown that these eight proteins form a barrel-shaped core octamer with the DNA wound twice around the outside (Figure 2.5B). Between 140 and 150 bp of DNA (depending on the species) are associated with the nucleosome particle, and each nucleosome is separated by 50–70 bp of linker DNA, giving the repeat length of 190–220 bp previously shown by the nuclease protection experiments.
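As a rough worked example, the figures in this section can be combined to estimate how many nucleosomes one chromosome carries. The Python sketch assumes an axial rise of 0.34 nm per base pair for B-form DNA, rounds the "just under 5 cm" of DNA in an average human chromosome up to 5 cm, and uses a repeat length of about 200 bp; the result is an order-of-magnitude estimate only.

```python
BP_RISE_NM = 0.34  # assumed axial rise per base pair of B-form DNA, in nm


def dna_length_to_bp(length_cm: float) -> float:
    """Convert a DNA contour length in centimetres to base pairs."""
    length_nm = length_cm * 1e7  # 1 cm = 10^7 nm
    return length_nm / BP_RISE_NM


def nucleosome_count(n_bp: float, repeat_bp: float) -> float:
    """Approximate number of nucleosomes: one per nucleosome repeat length."""
    return n_bp / repeat_bp


bp = dna_length_to_bp(5.0)  # roughly the DNA in an average human chromosome
print(f"~{bp:,.0f} bp, ~{nucleosome_count(bp, 200):,.0f} nucleosomes")
# ~147,058,824 bp, ~735,294 nucleosomes
```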

Figure 2.5

Nucleosomes. (A) Electron micrograph of a purified chromatin strand showing the ‘beads-on-a-string’ structure. (Courtesy of Dr Barbara Hamkalo, University of California, Irvine.) (B) The model for the ‘beads-on-a-string’ …

As well as the proteins of the core octamer, there is a group of additional histones, all closely related to one another and collectively called linker histones. In vertebrates these include histones H1a-e, H1°, H1t and H5. A single linker histone is attached to each nucleosome, to form the chromatosome, but the precise positioning of this linker histone is not known. Structural studies support the traditional model in which the linker histone acts as a clamp, preventing the coiled DNA from detaching from the nucleosome (Figure 2.5C; Zhou et al., 1998; Travers, 1999). However, other results suggest that, at least in some organisms, the linker histone is not located on the extreme surface of the nucleosome-DNA assembly, as would be expected if it really were a clamp, but instead is inserted between the core octamer and the DNA (Pruss et al., 1995; Pennisi, 1996).

The ‘beads-on-a-string’ structure shown in Figure 2.5A is thought to represent an unpacked form of chromatin that occurs only infrequently in living nuclei. Very gentle cell breakage techniques developed in the mid-1970s resulted in a more condensed version of the complex, called the 30 nm fiber (it is approximately 30 nm in width). The exact way in which nucleosomes associate to form the 30 nm fiber is not known, but several models have been proposed, the most popular of which is the solenoid structure shown in Figure 2.6. The individual nucleosomes within the 30 nm fiber may be held together by interactions between the linker histones, or the attachments may involve the core histones, whose protein ‘tails’ extend outside the nucleosome (see Figure 8.9). The latter hypothesis is attractive because chemical modification of these tails results in the 30 nm fiber opening up, enabling genes contained within it to be activated (Section 8.2.1).

Figure 2.6

The solenoid model for the 30 nm chromatin fiber. In this model, the ‘beads-on-a-string’ structure of chromatin is condensed by winding the nucleosomes into a helix with six nucleosomes per turn. Higher levels of chromatin packaging are …

The special features of metaphase chromosomes

The 30 nm fiber is probably the major type of chromatin in the nucleus during interphase, the period between nuclear divisions. When the nucleus divides, the DNA adopts a more compact form of packaging, resulting in the highly condensed metaphase chromosomes that can be seen with the light microscope and which have the appearance generally associated with the word ‘chromosome’ (Figure 2.7). The metaphase chromosomes form at a stage in the cell cycle after DNA replication has taken place and so each one contains two copies of its chromosomal DNA molecule. The two copies are held together at the centromere, which has a specific position within each chromosome. Individual chromosomes can therefore be recognized because of their size and the location of the centromere relative to the two ends. Further distinguishing features are revealed when chromosomes are stained. There are a number of different staining techniques (Table 2.4), each resulting in a banding pattern that is characteristic for a particular chromosome. This means that the set of chromosomes possessed by an organism can be represented as a karyogram, in which the banded appearance of each one is depicted. The human karyogram is shown in Figure 2.8.

Figure 2.7

The typical appearance of a metaphase chromosome. Metaphase chromosomes are formed after DNA replication has taken place, so each one is, in effect, two chromosomes linked together at the centromere. The arms are called the chromatids. A telomere is the …

Table 2.4

Staining techniques used to produce chromosome banding patterns.

Figure 2.8

The human karyogram. The chromosomes are shown with the G-banding pattern obtained after Giemsa staining. Chromosome numbers are given below each structure and the band numbers to the left. ‘rDNA’ is a region containing a cluster of repeat …

Both the DNA in the centromere regions and the proteins attached to it have special features. The nucleotide sequence of centromeric DNA is best understood in the plant Arabidopsis thaliana, whose amenability to genetic analysis has enabled the positions of the centromeres on the DNA sequence to be located with some precision. Also, a special effort was made to sequence these centromeric regions, which are frequently excluded from draft genome sequences because of problems in obtaining an accurate reading through the highly repetitive structure that characterizes these regions. Arabidopsis centromeres span 0.9–1.2 Mb of DNA and each one is made up largely of 180-bp repeat sequences. In humans the equivalent sequences are 171 bp and are called alphoid DNA. Before the Arabidopsis sequences were obtained it was thought that these repeat sequences were by far the principal component of centromeric DNA. However, Arabidopsis centromeres also contain multiple copies of genome-wide repeats, along with a few genes, the latter at a density of 7–9 per 100 kb compared with 25 genes per 100 kb for the non-centromeric regions of Arabidopsis chromosomes (Copenhaver et al., 1999). The discovery that centromere DNA contains genes was a big surprise because it was thought that these regions were genetically inactive.

The special centromeric proteins in humans include at least seven that are not found elsewhere in the chromosome (Warburton, 2001). One of these proteins, CENP-A, is very similar to histone H3 and is thought to replace this histone in the centromeric nucleosomes. It is assumed that the small distinctions between CENP-A and H3 confer special properties on centromeric nucleosomes, but exactly what these properties might be and how they relate to the function of the centromere is not yet known. Part of the function of the centromere itself is revealed by the electron microscope, which shows that in a dividing cell a pair of plate-like kinetochores are present on the surface of the chromosome in the centromeric region. These structures act as the attachment points for the microtubules that radiate from the spindle pole bodies located at the nuclear surface and which draw the divided chromosomes into the daughter nuclei (Figure 2.9). Part of the kinetochore is made up of alphoid DNA plus CENP-A and other proteins, but its structure has not been described in detail (Vafa and Sullivan, 1997).

Figure 2.9

The role of the kinetochores during nuclear division. During the anaphase period of nuclear division (see Figures 5.14 and 5.15), individual chromosomes are drawn apart by the contraction of microtubules attached to the kinetochores.

A second important part of the chromosome is the terminal region or telomere. Telomeres are important because they mark the ends of chromosomes and therefore enable the cell to distinguish a real end from an unnatural end caused by chromosome breakage - an essential requirement because the cell must repair the latter but not the former. Telomeric DNA is made up of hundreds of copies of a repeated motif, 5′-TTAGGG-3′ in humans, with a short extension of the 3′ terminus of the double-stranded DNA molecule (Figure 2.10). Two special proteins bind to the repeat sequences in human telomeres. These are called TRF1, which helps to regulate the length of the telomere, and TRF2, which maintains the single-strand extension. If TRF2 is inactivated then this extension is lost and the two polynucleotides fuse together in a covalent linkage (van Steensel et al., 1998). Other telomeric proteins are thought to form a linkage between the telomere and the periphery of the nucleus, the area in which the chromosome ends are localized (Tham and Zakian, 2000). Still others mediate the enzymatic activity that maintains the length of each telomere during DNA replication. We will return to this last activity in Section 13.2.4: it is critical to the survival of the chromosome and may be a key to understanding cell senescence and death.
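Counting telomeric repeats in a sequence is a simple string operation. This Python sketch assumes the input is the G-rich strand written 5′ to 3′, so that the TTAGGG repeats lie at its right-hand (3′) end, and counts only complete tandem copies; the example sequence is made up.

```python
def count_terminal_repeats(seq: str, motif: str = "TTAGGG") -> int:
    """Count complete tandem copies of `motif` at the 3' (right-hand) end of `seq`.

    Assumes `seq` is the G-rich strand written 5'->3'. Counting stops at the
    first position, moving inward, that is not a full copy of the motif.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    count = 0
    end = len(seq)
    # Step leftward one motif length at a time while the copies keep matching.
    while end >= len(motif) and seq[end - len(motif):end] == motif:
        count += 1
        end -= len(motif)
    return count


example = "ACGTACGT" + "TTAGGG" * 5  # made-up sequence with 5 terminal repeats
print(count_terminal_repeats(example))  # 5
```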

Figure 2.10

Telomeres. The sequence at the end of a human telomere. The length of the 3′ extension is different in each telomere. See Section 13.2.4 for more details about telomeric DNA.

Box 2.1

Unusual chromosome types. The karyograms of some organisms display unusual features not displayed by the human version. These include the following: Minichromosomes are relatively short in length but rich in genes. The chicken genome, for example, is …

Where are the genes in a eukaryotic genome?

In the previous section we learnt that Arabidopsis centromeres contain genes but at a lower density than in the rest of the chromosomes. This alerts us to the fact that the genes are not arranged evenly along the length of a chromosome. In most organisms, genes appear to be distributed more or less at random, with substantial variations in gene density at different positions within a chromosome. The average gene density in Arabidopsis is 25 genes per 100 kb, but even outside of the centromeres and telomeres the density varies from 1 to 38 genes per 100 kb, as illustrated in Figure 2.11 for the largest of the plant's five chromosomes. The same is true for human chromosomes, where the density ranges from 0 to 64 genes per 100 kb.
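Gene density figures such as "genes per 100 kb" come from counting genes in fixed windows along a chromosome. A minimal Python sketch, assuming each gene is placed in the window that contains its start coordinate (1-based); the chromosome length and gene coordinates are invented for illustration.

```python
def gene_density(chrom_length: int, gene_starts: list[int],
                 window: int = 100_000) -> list[int]:
    """Count genes in consecutive non-overlapping windows (default 100 kb).

    Each gene is assigned to the window containing its 1-based start
    coordinate. The last window may be shorter than `window`.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for start in gene_starts:
        if not 1 <= start <= chrom_length:
            raise ValueError(f"gene start {start} is outside the chromosome")
        counts[(start - 1) // window] += 1
    return counts


# Invented 350 kb chromosome with seven gene start positions.
starts = [10, 50_000, 99_999, 100_001, 250_000, 300_000, 349_999]
print(gene_density(350_000, starts))  # [3, 1, 2, 1]
```

Because the final window here covers only 50 kb, its raw count understates the density per 100 kb; a fuller analysis would normalize by window length.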

Figure 2.11

Gene density along the largest of the five Arabidopsis thaliana chromosomes. Chromosome 1, which is 29.1 Mb in length, is illustrated with the sequenced portions shown in red and the centromere and telomeres in blue. The gene map below the chromosome …

The uneven gene distribution within human chromosomes was suspected for several years before the draft sequence was completed. There were two lines of evidence, one of which related to the banding patterns that are produced when chromosomes are stained. The dyes used in these procedures (see Table 2.4) bind to DNA molecules, but in most cases with preferences for certain base pairs. Giemsa, for example, has a greater affinity for DNA regions that are rich in A and T nucleotides. The dark G-bands in the human karyogram (see Figure 2.8) are therefore thought to be AT-rich regions of the genome. The base composition of the genome as a whole is 59.7% A + T, so the dark G-bands must have AT contents substantially greater than 60%. Cytogeneticists therefore predicted that there would be fewer genes in dark G-bands because genes generally have AT contents of 45–…%. This prediction was confirmed when the draft genome sequence was compared with the human karyogram (IHGSC, 2001).
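Base composition along a sequence can be profiled by computing AT content in windows. The Python sketch below counts only A, C, G and T and ignores N and other ambiguity codes, which is an assumption about how to treat unknown bases; the window size is left to the caller.

```python
def at_content(seq: str) -> float:
    """Percentage of A + T among the unambiguous bases (A, C, G, T) in `seq`."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * at / (at + gc)


def windowed_at_content(seq: str, window: int) -> list[float]:
    """AT percentage in consecutive non-overlapping windows of `window` bases."""
    return [at_content(seq[i:i + window]) for i in range(0, len(seq), window)]


print(windowed_at_content("AAAATTTTGGGGCCCC", 8))  # [100.0, 0.0]
```

A window made up entirely of N would raise an error in this sketch; real pipelines usually skip or mask such windows.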

The second line of evidence pointing to uneven gene distribution derived from the isochore model of genome organization (Gardiner, 1996). According to this model, the genomes of vertebrates and plants (and possibly of other eukaryotes) are mosaics of segments of DNA, each at least 300 kb in length, with each segment having a uniform base composition that differs from that of the adjacent segments. Support for the isochore model comes from experiments in which genomic DNA is broken into fragments of approximately 100 kb, treated with dyes that bind specifically to AT- or GC-rich regions, and the pieces separated by density gradient centrifugation (Technical Note 2.2). When this experiment is carried out with human DNA, five fractions are seen, each representing a different isochore type with a distinctive base composition: two AT-rich isochores, called L1 and L2, and three GC-rich classes: H1, H2 and H3. The last of these, H3, is the least abundant in the human genome, making up only 3% of the total, but contains over 25% of the genes. This is a clear indication that genes are not distributed evenly through the human genome. The draft genome sequence suggests that the isochore theory over-simplifies what is, in reality, a much more complex pattern of variations in base composition along the length of each human chromosome (IHGSC, 2001). But even if it turns out to be a misconception, the isochore theory has played an important role in helping molecular biologists of the pre-sequence era to understand genome structure.

Technical Note 2.2

Ultracentrifugation techniques: methodology for separation of cell components and large molecules. The development of high-speed centrifuges in the 1920s led to techniques for separating organelles and other fractions from disrupted cells. The first technique …

What genes are present in a eukaryotic genome?

There are various ways to categorize the genes in a eukaryotic genome. One possibility is to classify the genes according to their function, as shown in Figure 1.18 for the human genome. This system has the advantage that the fairly broad functional categories used in Figure 1.18 can be further subdivided to produce a hierarchy of increasingly specific functional descriptions for smaller and smaller sets of genes. The weakness with this approach is that functions have not yet been assigned to many eukaryotic genes, so this type of classification leaves out a proportion of the total gene set. A more powerful method is to base the classification not on the functions of genes but on the structures of the proteins that they specify. A protein molecule is constructed from a series of domains, each of which has a particular biochemical function. Examples are the zinc finger, which is one of several domains that enable a protein to bind to a DNA molecule (Section 9.1.4), and the ‘death domain’, which is present in many proteins involved in apoptosis, the process of programmed cell death. Each domain has a characteristic amino acid sequence, perhaps not exactly the same sequence in every example of that domain, but close enough for the presence of a particular domain to be recognizable by examining the amino acid sequence of the protein. The amino acid sequence of a protein is specified by the nucleotide sequence of its gene, so the domains present in a protein can be determined from the nucleotide sequence of the gene that codes for that protein. The genes in a genome can therefore be categorized according to the protein domains that they specify. This method has the advantage that it can be applied to genes whose functions are not known and hence can encompass a larger proportion of the set of genes in a genome.
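Domain-based classification amounts to indexing genes by the domains their proteins contain. In this Python sketch the gene names and domain assignments are hypothetical; in practice they would come from a domain-annotation tool. A gene with no recognized domain simply does not appear in the index.

```python
from collections import defaultdict


def genes_by_domain(gene_domains: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a gene -> domains mapping into a domain -> genes index."""
    index: dict[str, set[str]] = defaultdict(set)
    for gene, domains in gene_domains.items():
        for domain in domains:
            index[domain].add(gene)
    return dict(index)


# Hypothetical genes and domain assignments, for illustration only.
example = {
    "geneA": {"zinc finger"},
    "geneB": {"zinc finger", "death domain"},
    "geneC": {"death domain"},
    "geneD": set(),  # no recognizable domain
}
index = genes_by_domain(example)
print(sorted(index["zinc finger"]))   # ['geneA', 'geneB']
print(sorted(index["death domain"]))  # ['geneB', 'geneC']
```

A gene carrying several domains is listed under each of them, so the categories overlap rather than partition the gene set.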

Classification schemes based on gene function suggest that all eukaryotes possess the same basic set of genes, but that more complex species have a greater number of genes in each category. For example, humans have the greatest number of genes in all but one of the categories used in Figure 2.12, the exception being ‘metabolism’ where Arabidopsis comes out on top as a result of its photosynthetic capability, which requires a large set of genes not present in the other four genomes included in this comparison. This functional classification reveals other interesting features, notably that C. elegans has a relatively high number of genes whose functions are involved in cell-cell signaling, which is surprising given that this organism has just 959 cells. Humans, who have 10¹³ cells, have only 250 more genes for cell-cell signaling. In general, this type of analysis emphasizes the similarities between genomes, but does not reveal the genetic basis of the vastly different types of biological information contained in the genomes of, for example, fruit flies and humans. The domain approach holds more promise in this respect because it shows that the human genome specifies a number of protein domains that are absent from the genomes of other organisms, these domains including several involved in activities such as cell adhesion, electric couplings, and growth of nerve cells (Table 2.5). These functions are interesting because they are ones that we look on as conferring the distinctive features of vertebrates compared with other types of eukaryote.

Figure 2.12

Comparison of the gene catalogs of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, fruit fly and humans. Genes are categorized according to their function, as deduced from the protein domains specified by each gene. Redrawn from …

Table 2.5

Examples of protein domains specified by different genomes.

Is it possible to identify a set of genes that are present in vertebrates but not in other eukaryotes? This analysis can only be done in an approximate way at present because only a few genome sequences are available. It currently appears that approximately one-fifth to one-quarter of the genes in the human genome are unique to vertebrates, and a further quarter are found only in vertebrates and other animals (Figure 2.13).

Figure 2.13

Relationship between the human gene catalog and the catalogs of other groups of organism. The pie chart categorizes the human gene catalog according to the distribution of individual genes in other organisms. The chart shows, for example, that 22% of …

Families of genes

Since the earliest days of DNA sequencing it has been known that multigene families - groups of genes of identical or similar sequence - are common features of many genomes. For example, every eukaryote that has been studied (as well as all but the simplest bacteria) has multiple copies of the genes for the non-coding ribosomal RNAs (rRNAs; Section 3.2.1). This is illustrated by the human genome, which contains approximately 2000 genes for the 5S rRNA (so called because it has a sedimentation coefficient of 5S; see Technical Note 2.2), all located in a single cluster on chromosome 1. There are also about 280 copies of a repeat unit containing the 28S, 5.8S and 18S rRNA genes, grouped into five clusters of 50–70 repeats, one on each of chromosomes 13, 14, 15, 21 and 22 (see Figure 2.8). Ribosomal RNAs are components of the protein-synthesizing particles called ribosomes, and it is presumed that their genes are present in multiple copies because there is a heavy demand for rRNA synthesis during cell division, when several tens of thousands of new ribosomes must be assembled.

The rRNA genes are examples of ‘simple’ or ‘classical’ multigene families, in which all the members have identical or nearly identical sequences. These families are believed to have arisen by gene duplication, with the sequences of the individual members kept identical by an evolutionary process that, as yet, has not been fully described (Section 15.2.1). Other multigene families, more common in higher eukaryotes than in lower eukaryotes, are called ‘complex’ because the individual members, although similar in sequence, are sufficiently different for the gene products to have distinctive properties. One of the best examples of this type of multigene family is the mammalian globin genes. The globins are the blood proteins that combine to make hemoglobin, each molecule of hemoglobin being made up of two α-type and two β-type globins. In humans the α-type globins are coded by a small multigene family on chromosome 16 and the β-type globins by a second family on chromosome 11 (Figure 2.14). These genes were among the first to be sequenced, back in the late 1970s (Fritsch et al., 1980). The sequence data showed that the genes in each family are similar to one another, but by no means identical. In fact the nucleotide sequences of the two most different genes in the β-type cluster, coding for the β- and ε-globins, display only 79.1% identity. Although this is similar enough for both proteins to be β-type globins, it is sufficiently different for them to have distinctive biochemical properties. Similar variations are seen in the α-cluster.
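Figures such as the 79.1% identity between the β- and ε-globin genes are computed from an alignment of the two sequences. The Python sketch below assumes the sequences are already aligned (equal length, gaps written as "-") and excludes gapped columns from the comparison, which is one of several conventions in use; the two short sequences are toys, not real globin genes.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the comparison
    (one common convention; others count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        identical += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(f"{percent_identity('ATGGTGCAC', 'ATGGTACAT'):.1f}%")  # 77.8%
```

The choice of gap convention can shift the reported identity noticeably for alignments with many indels, so published figures are only comparable when computed the same way.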

Figure 2.14

The human α- and β-globin gene clusters. The α-globin cluster is located on chromosome 16 and the β-cluster on chromosome 11. Both clusters contain genes that are expressed at different developmental stages and each includes …

Why are the members of the globin gene families so different from one another? The answer was revealed when the expression patterns of the individual genes were studied. It was discovered that the genes are expressed at different stages in human development: for example, in the β-type cluster ε is expressed in the early embryo, Gγ and Aγ (whose protein products differ by just one amino acid) in the fetus, and δ and β in the adult (Figure 2.14). The different biochemical properties of the resulting globin proteins are thought to reflect slight changes in the physiological role that hemoglobin plays during the course of human development.

In some multigene families, the individual members are clustered, as with the globin genes, but in others the genes are dispersed around the genome. An example of a dispersed family is the five human genes for aldolase, an enzyme involved in energy generation, which are located on chromosomes 3, 9, 10, 16 and 17. The important point is that, even though dispersed, the members of the multigene family have sequence similarities that point to a common evolutionary origin. When these sequence comparisons are made it is sometimes possible to see relationships not only within a single gene family but also between different families. All of the genes in the α- and β-globin families, for example, have some sequence similarity and are thought to have evolved from a single ancestral globin gene. We therefore refer to these two multigene families as comprising a single globin gene superfamily, and from the similarities between the individual genes we can chart the duplication events that have given rise to the series of genes that we see today (Section 15.2.1).

Box 2.2

Two examples of unusual gene organization. These are occasionally found in small compact genomes such as those of viruses. Usually the amino acid sequences of the proteins coded by a pair of overlapping genes are not similar because the mRNAs are translated …

2.2.2. Eukaryotic organelle genomes

Now we move out of the nucleus to examine the genomes present in the mitochondria and chloroplasts of eukaryotic cells. The possibility that some genes might be located outside of the nucleus - extrachromosomal genes as they were initially called - was first raised in the 1950s as a means of explaining the unusual inheritance patterns of certain genes in the fungus Neurospora crassa, the yeast S. cerevisiae and the photosynthetic alga Chlamydomonas reinhardtii. Electron microscopy and biochemical studies at about the same time provided hints that DNA molecules might be present in mitochondria and chloroplasts. Eventually, in the early 1960s, these various lines of evidence were brought together and the existence of mitochondrial and chloroplast genomes, independent of and distinct from the nuclear genome, was accepted.

Physical features of organelle genomes

Almost all eukaryotes have mitochondrial genomes, and all photosynthetic eukaryotes have chloroplast genomes. Initially, it was thought that virtually all organelle genomes were circular DNA molecules. Electron microscopy studies had shown both circular and linear DNA in some organelles, but it was assumed that the linear molecules were simply fragments of circular genomes that had become broken during preparation for electron microscopy. We still believe that most mitochondrial and chloroplast genomes are circular, but we now recognize that there is a great deal of variability in different organisms. In many eukaryotes the circular genomes coexist in the organelles with linear versions and, in the case of chloroplasts, with smaller circles that contain subcomponents of the genome as a whole. The latter pattern reaches its extreme in the marine algae called dinoflagellates, whose chloroplast genomes are split into many small circles, each containing just a single gene (Zhang et al., 1999). We also now realize that the mitochondrial genomes of some microbial eukaryotes (e.g. Paramecium, Chlamydomonas and several yeasts) are always linear (Nosek et al., 1998).

Copy numbers for organelle genomes are not particularly well understood. Each human mitochondrion contains about 10 identical molecules, which means that there are about 8000 per cell, but in S. cerevisiae the total number is probably smaller (fewer than 6500) even though there may be over 100 genomes per mitochondrion. Photosynthetic microorganisms such as Chlamydomonas have approximately 1000 chloroplast genomes per cell, about one-fifth the number present in a higher plant cell. One mystery, which dates back to the 1950s and has never been satisfactorily solved, is that when organelle genes are studied in genetic crosses the results suggest that there is just one copy of a mitochondrial or chloroplast genome per cell. This is clearly not the case but indicates that our understanding of the transmission of organelle genomes from parent to offspring is less than perfect.
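The copy numbers in this paragraph can be cross-checked by simple division. Treating the figures as the rough averages the text presents them as (real values vary between cell types), the human numbers imply several hundred mitochondria per cell, and the Chlamydomonas comparison implies roughly 5000 chloroplast genomes in a higher plant cell:

$$
n_{\text{mito}} \approx \frac{8000\ \text{genomes per cell}}{10\ \text{genomes per mitochondrion}} = 800\ \text{mitochondria per cell}
$$

$$
N_{\text{plant}} \approx 5 \times N_{\text{Chlamydomonas}} \approx 5 \times 1000 = 5000\ \text{chloroplast genomes per cell}
$$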

Mitochondrial genome sizes are variable (Table 2.6) and are unrelated to the complexity of the organism. Most multicellular animals have small mitochondrial genomes with a compact genetic organization, the genes being close together with little space between them. The human mitochondrial genome (see Figure 1.22), at 16 569 bp, is typical of this type. Most lower eukaryotes such as S. cerevisiae (Figure 2.15), as well as flowering plants, have larger and less compact mitochondrial genomes, with a number of the genes containing introns. Chloroplast genomes have less variable sizes (Table 2.6) and most have a structure similar to that shown in Figure 2.16 for the rice chloroplast genome.

Table 2.6

Sizes of mitochondrial and chloroplast genomes.

Figure 2.15

The Saccharomyces cerevisiae mitochondrial genome. Because of their relatively small sizes, many mitochondrial genomes have been completely sequenced. In the yeast genome, the genes are more spaced out than in the human mitochondrial genome (Figure 1.22) …

Figure 2.16

The rice chloroplast genome. Only those genes with known functions are shown. A number of the genes contain introns which are not indicated on this map. These discontinuous genes include several of those for tRNAs, which is why the tRNA genes are of different …

The genetic content of organelle genomes

Organelle genomes are much smaller than their nuclear counterparts and we therefore anticipate that their gene contents are much more limited, which is indeed the case. Again, mitochondrial genomes display the greater variability, gene contents ranging from five for the malaria parasite P. falciparum to 92 for the protozoan Reclinomonas americana (Table 2.7; Lang et al., 1997; Palmer, 1997a). All mitochondrial genomes contain genes for the non-coding rRNAs and at least some of the protein components of the respiratory chain, the latter being the main biochemical feature of the mitochondrion. The more gene-rich genomes also code for tRNAs, ribosomal proteins, and proteins involved in transcription, translation and transport of other proteins into the mitochondrion from the surrounding cytoplasm (Table 2.7). Most chloroplast genomes appear to possess the same set of 200 or so genes, again coding for rRNAs and tRNAs, as well as ribosomal proteins and proteins involved in photosynthesis (see Figure 2.16).

Table 2.7

Features of mitochondrial genomes.

A general feature of organelle genomes emerges from Table 2.7. These genomes specify some of the proteins found in the organelle, but not all of them. The other proteins are coded by nuclear genes, synthesized in the cytoplasm, and transported into the organelle. If the cell has mechanisms for transporting proteins into mitochondria and chloroplasts, then why not have all the organelle proteins specified by the nuclear genome? We do not yet have a convincing answer to this question, although it has been suggested that at least some of the proteins coded by organelle genomes are extremely hydrophobic and cannot be transported through the membranes that surround mitochondria and chloroplasts, and so simply cannot be moved into the organelle from the cytoplasm (Palmer, 1997b). The only way in which the cell can get them into the organelle is to make them there in the first place.

The origins of organelle genomes

The discovery of organelle genomes led to many speculations about their origins. Today most biologists accept that the endosymbiont theory is correct, at least in outline, even though it was considered quite unorthodox when first proposed in the 1960s. The endosymbiont theory is based on the observation that the gene expression processes occurring in organelles are similar in many respects to equivalent processes in bacteria. In addition, when nucleotide sequences are compared organelle genes are found to be more similar to equivalent genes from bacteria than they are to eukaryotic nuclear genes. The endosymbiont theory therefore holds that mitochondria and chloroplasts are the relics of free-living bacteria that formed a symbiotic association with the precursor of the eukaryotic cell, way back at the very earliest stages of evolution.

Support for the endosymbiont theory has come from the discovery of organisms which appear to exhibit stages of endosymbiosis that are less advanced than those seen with mitochondria and chloroplasts. For example, an early stage in endosymbiosis is displayed by the protozoan Cyanophora paradoxa, whose photosynthetic structures, called cyanelles, are different from chloroplasts and instead resemble ingested cyanobacteria. Similarly, the Rickettsia, which live inside eukaryotic cells, might be modern versions of the bacteria that gave rise to mitochondria (Andersson et al., 1998). It has also been suggested that the hydrogenosomes of trichomonads (unicellular microbes, many of which are parasites), some of which have a genome but most of which do not, represent an advanced type of mitochondrial endosymbiosis (Palmer, 1997b; Akhmanova et al., 1998).

If mitochondria and chloroplasts were once free-living bacteria, then since the endosymbiosis was set up there must have been a transfer of genes from the organelle into the nucleus. We do not understand how this occurred, or indeed whether there was a mass transfer of many genes at once, or a gradual trickle from one site to the other. But we do know that DNA transfer from organelle to nucleus, and indeed between organelles, still occurs. This was discovered in the early 1980s, when the first partial sequences of chloroplast genomes were obtained. It was found that in some plants the chloroplast genome contains segments of DNA, often including entire genes, that are copies of parts of the mitochondrial genome. The implication is that this so-called promiscuous DNA has been transferred from one organelle to the other. We now know that this is not the only type of transfer that can occur. The Arabidopsis mitochondrial genome contains various segments of nuclear DNA as well as 16 fragments of the chloroplast genome, including six tRNA genes that have retained their activity after transfer to the mitochondrion. The nuclear genome of this plant includes several short segments of the chloroplast and mitochondrial genomes as well as a 270-kb piece of mitochondrial DNA located within the centromeric region of chromosome 2 (Copenhaver et al., 1999; AGI, 2000). The transfer of mitochondrial DNA to vertebrate nuclear genomes has also been documented.


Different factors have been proposed to be related to codon usage bias, including gene expression level (reflecting selection for optimizing the translation process by tRNA abundance), guanine-cytosine content (GC content, reflecting horizontal gene transfer or mutational bias), guanine-cytosine skew (GC skew, reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, optimal growth temperature, hypersaline adaptation, and dietary nitrogen. [11] [12] [13] [14] [15] [16]
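GC content and GC skew, two of the factors listed above, are simple base-composition statistics. A minimal sketch follows; the function names are illustrative. GC3 (GC content at third codon positions, where most synonymous variation sits) is included as a common codon-level variant of the same idea. In genome analyses GC skew is usually computed in sliding windows along a chromosome rather than over a whole sequence; that windowing is omitted here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among the unambiguous bases A, C, G, T."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    total = gc + s.count("A") + s.count("T")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / total


def gc_skew(seq: str) -> float:
    """GC skew = (G - C) / (G + C); positive values mean the strand is G-rich."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    if g + c == 0:
        raise ValueError("sequence has no G or C")
    return (g - c) / (g + c)


def gc3(cds: str) -> float:
    """GC content at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return gc_content(cds[2:usable:3])


print(gc_content("ATGC"))    # 0.5
print(gc_skew("GGGC"))       # 0.5
print(gc3("ATGGCCAAA"))      # third positions G, C, A: two of three are G/C
```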

Mutational bias versus selection

Although the mechanism of codon bias selection remains controversial, possible explanations for this bias fall into two general categories. One explanation revolves around the selectionist theory, in which codon bias contributes to the efficiency and/or accuracy of protein expression and therefore undergoes positive selection. The selectionist model also explains why more frequent codons are recognized by more abundant tRNA molecules, as well as the correlation between preferred codons, tRNA levels, and gene copy numbers. Although it has been shown that amino acids are incorporated much faster at more frequent codons than at rare codons, the speed of translation has not been shown to be directly affected, and therefore the bias towards more frequent codons may not be directly advantageous. However, the increase in translation elongation speed may still be indirectly advantageous by increasing the cellular concentration of free ribosomes and potentially the rate of initiation for messenger RNAs (mRNAs). [17]

The second explanation is mutational bias, a theory which posits that codon bias exists because of nonrandomness in mutational patterns. In other words, some codons mutate more often and therefore settle at lower equilibrium frequencies; these are the so-called “rare” codons. Different organisms also exhibit different mutational biases, and there is growing evidence that genome-wide GC content is the most significant parameter in explaining codon bias differences between organisms. Additional studies have demonstrated that codon biases can be statistically predicted in prokaryotes using only intergenic sequences, arguing against the idea of selective forces on coding regions and further supporting the mutation bias model. However, this model alone cannot fully explain why preferred codons are recognized by more abundant tRNAs. [17]
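The idea that biased mutation alone sets an equilibrium base composition can be made concrete with a textbook two-state model; this is an illustration, not a model taken from the cited studies. Assume each A/T site mutates to G/C with probability u per generation and each G/C site mutates to A/T with probability v, with no selection or drift. The GC fraction p then obeys p' = p(1 - v) + (1 - p)u, whose fixed point is p* = u / (u + v). The function names and rates below are arbitrary choices for the example.

```python
def next_gc(p: float, u: float, v: float) -> float:
    """One generation of mutation: G/C sites lost at rate v, gained from A/T at rate u."""
    return p * (1.0 - v) + (1.0 - p) * u


def equilibrium_gc(u: float, v: float) -> float:
    """Fixed point of next_gc: p* = u / (u + v)."""
    return u / (u + v)


def iterate_gc(p0: float, u: float, v: float, generations: int) -> float:
    """Apply next_gc repeatedly, starting from GC fraction p0."""
    p = p0
    for _ in range(generations):
        p = next_gc(p, u, v)
    return p


# AT->GC mutations three times rarer than GC->AT: composition settles near 25% GC
u, v = 0.01, 0.03
print(equilibrium_gc(u, v))
print(iterate_gc(0.5, u, v, 2000))
```

Starting from any initial composition, the distance to p* shrinks by a factor (1 - u - v) each generation, so the mutational bias alone determines where the composition ends up.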

Mutation-selection-drift balance model

To reconcile the evidence for both mutational pressure and selection, the prevailing hypothesis for codon bias is the mutation-selection-drift balance model. This hypothesis states that selection favors major codons over minor codons, but minor codons are able to persist due to mutation pressure and genetic drift. It also suggests that selection is generally weak, but that selection intensity increases with higher expression and stronger functional constraints on coding sequences. [17]

Effect on RNA secondary structure

Because secondary structure of the 5’ end of mRNA influences translational efficiency, synonymous changes at this region on the mRNA can result in profound effects on gene expression. Codon usage in noncoding DNA regions can therefore play a major role in RNA secondary structure and downstream protein expression, which can undergo further selective pressures. In particular, strong secondary structure at the ribosome-binding site or initiation codon can inhibit translation, and mRNA folding at the 5’ end generates a large amount of variation in protein levels. [18]

Effect on transcription or gene expression

Heterologous gene expression is used in many biotechnological applications, including protein production and metabolic engineering. Because tRNA pools vary between organisms, transcription and translation of a particular coding sequence can be less efficient when it is placed in a non-native context. For an overexpressed transgene, the corresponding mRNA makes up a large percentage of total cellular RNA, and the presence of rare codons along the transcript can lead to inefficient use and depletion of ribosomes, ultimately reducing levels of heterologous protein production. In addition, the codon composition of the gene (e.g. the total number of rare codons and the presence of consecutive rare codons) may also affect translation accuracy. [19] [20] The traditional approach to expressing a heterologous gene, called codon optimization, adjusts its codons to match host tRNA abundances. However, using codons optimized for the tRNA pools of a particular host to overexpress a heterologous gene may also cause amino acid starvation and alter the equilibrium of tRNA pools. Newer strategies for optimizing heterologous expression consider global nucleotide content such as local mRNA folding, codon pair bias, a codon ramp, codon harmonization or codon correlations. [21] [22] Because of the number of nucleotide changes introduced, artificial gene synthesis is often necessary to create such an optimized gene.
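Traditional codon optimization, as described above, is often implemented in its simplest form as "one amino acid, one codon": every codon is replaced by the host's most frequently used synonymous codon. The sketch below does exactly that and nothing more; it ignores the mRNA folding, codon-pair and ramp considerations mentioned in the text. The function names are illustrative and the host usage values are made-up numbers, not measured frequencies for any organism. A translation helper checks that the encoded protein is unchanged.

```python
from itertools import product

# Standard genetic code, codons ordered TCAG at each position ('*' = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(cds: str):
    """Yield in-frame codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        yield cds[i:i + 3]


def translate(cds: str) -> str:
    """Translate an in-frame CDS with the standard code."""
    return "".join(CODE[c] for c in codons(cds))


def optimize(cds: str, host_usage: dict) -> str:
    """Replace each codon with the host's most used synonymous codon.

    Amino acids with no codon listed in host_usage keep their original codon.
    """
    best = {}
    for codon, freq in host_usage.items():
        aa = CODE[codon]
        if aa not in best or freq > host_usage[best[aa]]:
            best[aa] = codon
    return "".join(best.get(CODE[c], c) for c in codons(cds))


# Hypothetical host usage values, invented for illustration
host = {"CTG": 50.0, "TTA": 13.0, "AAA": 33.0, "AAG": 10.0, "ATG": 28.0, "TAA": 2.0}
gene = "ATGTTAAAGTAA"
opt = optimize(gene, host)
print(opt)                                  # ATGCTGAAATAA
print(translate(opt) == translate(gene))    # True
```

Because every substitution stays within a synonymous codon family, the protein sequence is preserved by construction; only the nucleotide sequence changes.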

Specialized codon bias is further seen in some endogenous genes such as those involved in amino acid starvation. For example, amino acid biosynthetic enzymes preferentially use codons that are poorly adapted to normal tRNA abundances, but have codons that are adapted to tRNA pools under starvation conditions. Thus, codon usage can introduce an additional level of transcriptional regulation for appropriate gene expression under specific cellular conditions. [22]

Effect on speed of translation elongation

Generally speaking for highly expressed genes, translation elongation rates are faster along transcripts with higher codon adaptation to tRNA pools, and slower along transcripts with rare codons. This correlation between codon translation rates and cognate tRNA concentrations provides additional modulation of translation elongation rates, which can provide several advantages to the organism. Specifically, codon usage can allow for global regulation of these rates, and rare codons may contribute to the accuracy of translation at the expense of speed. [23]
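A deliberately crude way to see the correlation described above is to assume that the time a ribosome spends on each codon is inversely proportional to the abundance of its cognate tRNA, and then sum those dwell times along the transcript. This is a toy assumption (real elongation also depends on wobble pairing, near-cognate competition, mRNA structure and more), the function names are illustrative, and the abundances are invented. The per-codon profile also shows where a rare codon would produce a local slowdown.

```python
def dwell_times(cds: str, trna_abundance: dict, k: float = 1.0) -> list:
    """Per-codon dwell time, modelled as k / abundance of the cognate tRNA."""
    cds = cds.upper()
    return [k / trna_abundance[cds[i:i + 3]]
            for i in range(0, len(cds) - len(cds) % 3, 3)]


def elongation_time(cds: str, trna_abundance: dict, k: float = 1.0) -> float:
    """Total time to traverse the coding sequence under the same toy model."""
    return sum(dwell_times(cds, trna_abundance, k))


# Invented relative abundances (not real tRNA data): in this table the tRNA
# reading CTG is abundant and the one reading CTA is rare
abundance = {"CTG": 10.0, "CTA": 1.0, "AAA": 5.0}
common = "CTGAAACTG"   # Leu-Lys-Leu using the well-supplied Leu codon
rare = "CTAAAACTA"     # the same peptide using the poorly supplied Leu codon
print(dwell_times(rare, abundance))
print(elongation_time(common, abundance) < elongation_time(rare, abundance))  # True
```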

Effect on protein folding

Protein folding in vivo is vectorial, such that the N-terminus of a protein exits the translating ribosome and becomes solvent-exposed before its more C-terminal regions. As a result, co-translational protein folding introduces several spatial and temporal constraints on the nascent polypeptide chain in its folding trajectory. Because mRNA translation rates are coupled to protein folding, and codon adaptation is linked to translation elongation, it has been hypothesized that manipulation at the sequence level may be an effective strategy to regulate or improve protein folding. Several studies have shown that pausing of translation as a result of local mRNA structure occurs for certain proteins, which may be necessary for proper folding. Furthermore, synonymous mutations have been shown to have significant consequences in the folding process of the nascent protein and can even change substrate specificity of enzymes. These studies suggest that codon usage influences the speed at which polypeptides emerge vectorially from the ribosome, which may further impact protein folding pathways throughout the available structural space. [23]

In the field of bioinformatics and computational biology, many statistical methods have been proposed and used to analyze codon usage bias. [24] Methods such as the 'frequency of optimal codons' (Fop), [25] the relative codon adaptation (RCA) [26] or the codon adaptation index (CAI) [27] are used to predict gene expression levels, while methods such as the 'effective number of codons' (Nc) and Shannon entropy from information theory are used to measure codon usage evenness. [28] Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes. [29] There are many computer programs to implement the statistical analyses enumerated above, including CodonW, GCUA, INCA, etc. Codon optimization has applications in designing synthetic genes and DNA vaccines. Several software packages are available online for this purpose.
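The codon adaptation index mentioned above can be sketched in a few lines. Following the usual definition, each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons, skipping stop codons and Met/Trp (which have no synonyms). The function names and reference counts are invented for illustration, and the floor applied to codons never seen in the reference (0.01 here) is an assumption; implementations differ on this detail.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered TCAG at each position ('*' = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SKIP = set("MW*")  # single-codon amino acids and stops carry no codon choice


def relative_adaptiveness(ref_counts: dict, floor: float = 0.01) -> dict:
    """w = count / max count among synonymous codons in the reference set."""
    family_max = defaultdict(float)
    for codon, aa in CODE.items():
        family_max[aa] = max(family_max[aa], ref_counts.get(codon, 0))
    w = {}
    for codon, aa in CODE.items():
        if aa in SKIP or family_max[aa] == 0:
            continue  # no codon choice, or no reference data for this amino acid
        w[codon] = max(ref_counts.get(codon, 0) / family_max[aa], floor)
    return w


def cai(cds: str, w: dict) -> float:
    """Geometric mean of w over the informative codons of an in-frame CDS."""
    cds = cds.upper()
    logs = [math.log(w[cds[i:i + 3]])
            for i in range(0, len(cds) - len(cds) % 3, 3)
            if cds[i:i + 3] in w]
    if not logs:
        raise ValueError("no informative codons")
    return math.exp(sum(logs) / len(logs))


# Invented reference counts (not from any real set of highly expressed genes)
ref = {"CTG": 40, "CTA": 4, "AAA": 30, "AAG": 10}
w = relative_adaptiveness(ref)
print(cai("ATGCTGAAATAA", w))   # 1.0: only the preferred codons are used
print(cai("ATGCTAAAGTAA", w))   # geometric mean of 0.1 and 1/3
```

The geometric mean makes CAI scale-free (between 0 and 1) and penalizes a few very poorly adapted codons more than an arithmetic mean would.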


D. radiodurans recA isolates carrying gross genome rearrangements were sequenced and their genomes fully assembled de novo with the goal of identifying genome rearrangements and characterizing the D. radiodurans in situ RecA-independent DSB repair. The detected rearrangements consisted of large deletions in chromosome II in all the sequenced recA isolates. Characteristics of the detected DSB repair differed significantly from the SSA repair previously demonstrated in D. radiodurans: the detected DSB repair utilized short repeats, as opposed to the long repeats that are otherwise abundant, and worked over larger linear DNA distances than those previously tested. We detected no sequence changes in regions bordering large deletions, i.e. no evidence of an NHEJ mechanism, in concordance with the literature. Our results suggest that large genome deletions in D. radiodurans recA mutants occur via alternative end-joining (A-EJ) that mechanistically resembles SSA. All the deletions were situated in a similar region of chromosome II, likely due to a combination of several factors: (i) negative selection for rearrangements in other genome regions, (ii) higher occurrence or co-occurrence of DSBs at the terminus region of chromosome II resulting from both the recA genotype and convergence of replication forks, and (iii) negative filtering of isolates possibly carrying smaller-scale genome rearrangements (due to limitations of PFGE as a method for rearrangement detection). Except for the genome rearrangements described above, we found no evidence of other rearrangements in the five sequenced strains. However, our PFGE system for rearrangement detection might have missed clones carrying small-scale and/or lethal rearrangements caused by mechanisms other than A-EJ.
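A repair signature like the one described here (deletions joined at short repeats) is typically characterized by measuring how much identical sequence is shared at the two breakpoints of a deletion. The sketch below is a generic illustration, not the authors' pipeline: given a reference sequence and the 0-based, half-open coordinates of a deleted segment, it counts how far the junction can slide left or right without changing the deletion product, which equals the length of the short direct repeat flanking the deletion. The function name, sequence and coordinates are hypothetical.

```python
def breakpoint_homology(ref: str, start: int, end: int) -> int:
    """Length of sequence shared at the two ends of deletion ref[start:end].

    Counts how far the deletion can be shifted right (ref[start+k] == ref[end+k])
    and left (ref[start-1-k] == ref[end-1-k]) while leaving the same product.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("need 0 <= start < end <= len(ref)")
    right = 0
    while (end + right < len(ref) and start + right < end
           and ref[start + right] == ref[end + right]):
        right += 1
    left = 0
    while (start - 1 - left >= 0 and end - 1 - left >= start
           and ref[start - 1 - left] == ref[end - 1 - left]):
        left += 1
    return left + right


# Hypothetical sequence with a 3-bp direct repeat (ACG) at positions 3-5 and 12-14
ref = "GGGACGTTTTTTACGCCC"
start, end = 3, 12                             # delete ACGTTTTTT, keeping one ACG copy
print(ref[:start] + ref[end:])                 # GGGACGCCC
print(breakpoint_homology(ref, start, end))    # 3
```

A value of 0 means a "blunt" junction with no shared sequence, whereas short non-zero values are the kind of repeat-mediated joining the paragraph describes.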

The conclusions of our study are limited by the type of experiments we have done. We detect a new DSB repair mechanism in D. radiodurans, but its exact identification relies on matching a limited set of the detected characteristics with characteristics typical of potential mechanisms reported in the literature. Even though the reported characteristics of A-EJ best match the observations, additional work is needed to delineate possible functional overlaps or cross-talk with other DNA repair mechanisms, and to identify the enzymatic functions involved. Our experiments could only detect A-EJ through genome rearrangements; unexpectedly, all the detected rearrangements occurred in a similar region of chromosome II, on which non-essential functions tend to be encoded. Further experimentation is needed to confirm whether other genomic changes could be associated with the novel mechanism, and whether other genome regions are susceptible to these changes.

Our previous and present results are the first to demonstrate large DNA rearrangements involving only genome sequences naturally present in D. radiodurans cells (Repar et al. 2010 [10]; this paper). In addition, all the detected rearrangements were observed in living cells, implying that the underlying A-EJ mechanism contributes to cell survival through DSB repair. Although this contribution might appear negligible compared to that of the RecA-dependent repair mechanisms, the A-EJ pathway may provide a significant addition to the survival kit of D. radiodurans, especially when combined with the effective antioxidant protection of proteins that is also present in this bacterium [23,24,25]. Indeed, D. radiodurans lacking recA is about as radiation resistant as wild-type E. coli [70], suggesting that under conditions of antioxidant protein protection, RecA-independent DNA repair mechanisms, such as SSA and A-EJ, can contribute significantly to radiation survival.


Figure 1. Bacteriophages attached to a host cell (transmission electron micrograph). In bacteriophage with tails, like the one shown here, the tails serve as a passageway for transmission of the phage genome. (credit: modification of work by Dr. Graham Beards; scale-bar data from Matt Russell)

Most bacteriophages are dsDNA viruses, which use host enzymes for DNA replication and RNA transcription. Phage particles must bind to specific surface receptors and actively insert the genome into the host cell. (The complex tail structures seen in many bacteriophages are actively involved in getting the viral genome across the prokaryotic cell wall.) When infection of a cell by a bacteriophage results in the production of new virions, the infection is said to be productive. If the virions are released by bursting the cell, the virus replicates by means of a lytic cycle (Figure 2). An example of a lytic bacteriophage is T4, which infects Escherichia coli found in the human intestinal tract. Sometimes, however, a virus can remain within the cell without being released. For example, when a temperate bacteriophage infects a bacterial cell, it replicates by means of a lysogenic cycle (Figure 2), and the viral genome is incorporated into the genome of the host cell. When the phage DNA is incorporated into the host-cell genome, it is called a prophage. An example of a lysogenic bacteriophage is the λ (lambda) virus, which also infects the E. coli bacterium. Viruses that infect plant or animal cells may sometimes undergo infections where they are not producing virions for long periods. Examples are the animal herpesviruses, including herpes simplex viruses, the cause of oral and genital herpes in humans. In a process called latency, these viruses can exist in nervous tissue for long periods of time without producing new virions, only to leave latency periodically and cause lesions in the skin where the virus replicates. Even though there are similarities between lysogeny and latency, the term lysogenic cycle is usually reserved to describe bacteriophages. Latency will be described in more detail in the next section.

Figure 2. A temperate bacteriophage has both lytic and lysogenic cycles. In the lytic cycle, the phage replicates and lyses the host cell. In the lysogenic cycle, phage DNA is incorporated into the host genome, where it is passed on to subsequent generations. Environmental stressors such as starvation or exposure to toxic chemicals may cause the prophage to excise and enter the lytic cycle.

Practice Question

Which of the following statements is false?

  1. In the lytic cycle, new phage are produced and released into the environment.
  2. In the lysogenic cycle, phage DNA is incorporated into the host genome.
  3. An environmental stressor can cause the phage to initiate the lysogenic cycle.
  4. Cell lysis only occurs in the lytic cycle.

Helicobacter pylori

CagA and the cag PAI

The cag PAI is a chromosomal region of >30 kb, encoding ∼28 genes. Distribution of the cag PAI varies by geographic region: about 70% of strains from Western countries contain the island, while cag PAI carriage is almost universal in Asia. In contrast, at least one population of H. pylori, termed hpAfrica2, exists in South Africa that always lacks a cag PAI. Possession by a strain of a functional cag PAI is associated with a higher risk of tissue response (greater numbers of inflammatory cells, more induction of proinflammatory cytokines, such as IL-8), and a higher risk of ulcers, mucosal atrophy, and gastric carcinoma. Several of the cag PAI genes encode proteins with homology to components of the T pilus of Agrobacterium tumefaciens, the prototype of a type IV secretion apparatus. Type IV secretion systems are multisubunit nanomachines that can introduce proteins (and/or DNA) into host cells and thereby influence cellular functions. After contact with host cells, using an integrin on the epithelial cell surface as a receptor, the cag type IV apparatus forms a pilus-like appendage that translocates the protein CagA into host epithelial cells. After its delivery into the host cell, CagA becomes phosphorylated by cellular kinases at specific phosphorylation sites (EPIYA motifs), and binds to several target proteins, including SHP-2, Csk, and PAR-1. These interactions, some of which are phosphorylation-dependent and some phosphorylation-independent, induce multiple events that contribute to cellular responses, such as the morphogenetic changes characteristic of cell infection with cag-positive H. pylori strains, and may ultimately lead to malignant transformation. CagA has therefore been termed a bacterial oncoprotein. About 70% of all H. pylori strains possess the cag PAI.
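The EPIYA phosphorylation sites named above are short, fixed peptide motifs (Glu-Pro-Ile-Tyr-Ala), so locating candidate sites in a CagA protein sequence is a simple pattern search. The minimal sketch below reports exact EPIYA matches and the position of the tyrosine, the residue that is phosphorylated. It does not classify the flanking segments used to distinguish EPIYA motif types, the function name is illustrative, and the input is a made-up fragment, not a real CagA sequence.

```python
import re

EPIYA = re.compile("EPIYA")


def find_epiya(protein: str) -> list:
    """Return (motif_start, tyrosine_position) pairs, 1-based, for each EPIYA."""
    hits = []
    for m in EPIYA.finditer(protein.upper()):
        start = m.start() + 1              # 1-based position of the E
        hits.append((start, start + 3))    # Y is the 4th residue of E-P-I-Y-A
    return hits


# Toy sequence (not real CagA) containing two EPIYA motifs
print(find_epiya("AAEPIYAGGEPIYA"))  # [(3, 6), (10, 13)]
```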

Future studies

Detection of ATP-dependent DNA ligases in Bacteria has opened up many questions related to the role of these DNA end-joining enzymes in Bacteria. The most important of these is whether these enzymes play any role in cellular functions. The prevalence of such sequences in so many unrelated organisms suggests that they do (or did) play a biological role. However, we emphasize that many of the points proposed within this review are based on putative protein functions assigned from the analysis of genome sequences. As a high priority, it must be established experimentally that the potential genes identified in each organism encode active DNA ligases. In Eukarya, the various DNA ligases are used in different aspects of DNA metabolism, and it is important to define whether there is a similar division of labour between the various Bacterial isozymes. Furthermore, identification of their cellular role(s) may provide answers to evolutionary questions that are raised by the presence of multiple DNA ligases in Bacteria. For example, why do some Bacteria have several types of DNA ligases when others make do with only one? Also, if some Bacteria use both ATP- and NAD+-dependent DNA ligases, why do Eukarya use only ATP-dependent enzymes? The imminent publication of many genome sequences and the application of proteomic technologies should answer these questions.

Proposals for DNA ligases to be considered as potential targets for novel antibiotics stemmed from the observation that the enzymes found in Bacteria and Eukarya used different cofactors. As NAD+-dependent enzymes have not been found in Eukarya, these enzymes may still provide valuable antibiotic targets. However, the discovery of ATP-dependent DNA ligases in Bacteria prompts re-evaluation of this pharmacological application. Bacterial NAD+-dependent DNA ligases have been studied for more than 30 years, but surprisingly few genetic and biochemical details are known about their regulation. DNA ligases are essential elements in many DNA metabolic pathways in Bacterial cells, but their mode of interaction with other components of these reactions is unknown. Significant breakthroughs in studies of DNA ligases have occurred as a result of the utilization of a broad range of techniques encompassing molecular biology and protein biochemistry. Application of these techniques to proteins from a wider range of organisms will, undoubtedly, provide further insights into these ubiquitous and important enzymes.


Department of Molecular Evolution University of Gdańsk, Wita Stwosza 59, Gdańsk, 80-308, Poland

Agata Jurczak-Kurek & Agata Mieszkowska

Laboratory of Molecular Biology (affiliated with University of Gdańsk), Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Wita Stwosza 59, Gdańsk, 80-308, Poland

Tomasz Gąsior & Alicja Węgrzyn

Department of Molecular Biology, University of Gdańsk, Wita Stwosza 59, Gdańsk, 80-308, Poland

Bożena Nejman-Faleńczyk, Sylwia Bloch, Aleksandra Dydecka, Gracja Topka, Agnieszka Necel & Grzegorz Węgrzyn

Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Powstańców Warszawy 55, Sopot, 81-712, Poland

Magdalena Jakubowska-Deredas & Borys Wróbel

Laboratory of Electron Microscopy, University of Gdańsk, Wita Stwosza 59, Gdańsk, 80-308, Poland

Magdalena Narajczyk & Malwina Richert

Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, Umultowska 89, Poznań, 61-614, Poland