Origin of Life Probability

Copyright 1995,1996 G.R. Morton (home.entouch.net/dmd/prob.htm)

ABSTRACT: The probability argument against finding a given sequence by chance is one of the mainstays of the anti-evolutionary position. I have noted before that I view that argument as a weak one for a variety of reasons. In this note I will show that finding a functional sequence by a random search is quite likely on normal evolutionary time scales. Because of this, and other weaknesses in the traditional apologetic, Christianity needs to move to a more defensible apologetic.

Duane Gish once wrote:

“The highly specific biological activity of each protein is due to the precise way the amino acids are arranged, just as the information conveyed by this sentence is determined by the precise sequence of 190 letters found in it.” ~Duane Gish, “The Origin of Life,” Proc. First Int. Conf. on Creationism, Pittsburgh: Creation Science Fellowship, 1986, p. 62

There is a major problem with that sentence: it is not the only way to state what Gish wanted to state. For instance, he could have written, “Biological activity is due to very specific orderings of amino acids, as this sentence's meaning is due to the 123-letter order.”

This is only a hint of how much variability there is in sequence space for conveying the same message. Language has an amazing flexibility to perform the same task. I once calculated that there are over 330,000 ways to convey the information “if you pick your nose, you get warts.” These vary from relatively pidgin-like phrases such as “pick nose get wart” to more complex statements such as “If you put your digits into your nares, you will contract a hypertrophy of the corium.” The statement can also be put in various orders; for instance, it can be reversed: “To contract a hypertrophy of the corium, place your digits into your nares.” And you can substitute nasal openings, nostrils, or nasal passages for nares. You can get more gross and talk about what you pick and extract. :-) All of these sequences were less than 80 letters long, and I only quit calculating because my imagination played out and I was getting bored.
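The count grows multiplicatively: every independent choice of synonym or word order multiplies the number of phrasings. Here is a toy Python sketch of that counting. It assumes small, hypothetical synonym lists built from the examples above; the original 330,000-phrase list is not reproduced here.

```python
from itertools import product

# Hypothetical synonym slots built from the examples in the text; the real
# 330,000-phrase calculation used far larger lists that are not given here.
ACTIONS = ["pick your", "put your digits into your"]
NOSES = ["nose", "nares", "nostrils", "nasal openings", "nasal passages"]
RESULTS = ["get warts", "contract a hypertrophy of the corium"]


def phrasings():
    """Yield each combination of slot choices in two word orders."""
    for action, nose, result in product(ACTIONS, NOSES, RESULTS):
        condition = f"{action} {nose}"
        yield f"if you {condition}, you {result}"  # condition first
        yield f"to {result}, {condition}"          # reversed order


def letter_count(phrase: str) -> int:
    """Length of the phrase as a letter sequence (spaces and commas dropped)."""
    return sum(ch.isalpha() for ch in phrase)


all_phrases = list(phrasings())
print(len(all_phrases))                                 # 40 = 2 * 5 * 2 * 2
print(max(letter_count(p) for p in all_phrases) < 80)   # True
```

Even a handful of slots adds up quickly. For example, ten independent slots with four options each already give 4^10 = 1,048,576 combinations.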

So the question is, if I wish to convey a certain message, how likely is it that I can find a sequence to perform a given function? There is a way to randomly produce a useful sequence which is not all that improbable.

Let's use a less gross example than the nose-picking one above. Let's find a functional sequence to answer the question your wife asked you when you were first married: “What do you want for breakfast?” (And you thought I was going to say something else. Tsk tsk.) There are lots of ways to answer this question. What we will do is choose a 70-unit-long sequence drawn from a 20-letter alphabet, ruling out z, q, x, k, v, and j. Thus this 70-unit-long sequence has 20^70, or about 1.18 x 10^91, different possible combinations. Normally anti-evolutionists say, like Gish, that finding just the correct sequence is too unlikely to occur. This is usually based on the idea that one and only one sequence will perform the task, which, as we have seen, is untrue. Even so, finding 330,000 ways to say “I want eggs” does not solve the problem: 330,000 ways out of 1.18 x 10^91 is still too improbable to consider realistically.
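The size of this sequence space is easy to check with exact integer arithmetic. Here is a minimal Python sketch, using the 20-letter alphabet and 70-unit length chosen above:

```python
# Alphabet: 26 letters minus the six ruled out in the text.
EXCLUDED = "zqxkvj"
ALPHABET_SIZE = 26 - len(EXCLUDED)   # 20
LENGTH = 70

total = ALPHABET_SIZE ** LENGTH      # exact count of 70-letter sequences
print(f"{total:.3e}")                # 1.181e+91

# Fraction of sequence space covered by 330,000 equivalent phrasings.
fraction = 330_000 / total
print(f"{fraction:.2e}")
```

The ratio comes out near 2.8 x 10^-86. This is the sense in which 330,000 phrasings, taken alone, do not rescue the odds.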

To solve the problem we need one other factor: what is the shortest sequence that performs the function? The shortest I can think of is simply “eggs”. But this is not a full sentence and would be too brusque for your bride. So let's say the shortest sentence is “I eat eggs”; without the spaces this is an 8-letter sequence.

What I noticed was that with a 2-unit-long sequence, i.e., in a 2-d phase space, the sequence ab occurs at only one point out of the 26 x 26 points for a 26-character set. That is 1/676 ≈ .0015. If you embed this 2-d space into a third dimension (i.e., use a 3-unit-long sequence), there are 26 sequences of the form *ab and 26 of the form ab* (here * stands for any character), for a total of 52 sequences in the phase space that contain ab. Thus the odds of finding a sequence with ab are 52/17576 ≈ .0030, a considerable improvement. Embedding the 2-d space in a 4-d space makes **ab, *ab*, and ab** the desired patterns. There are 3 x 26^2 such sequences in the 4-d space, and thus the odds of finding an ab are .0044. Each subsequent embedding raises the odds of finding a particular short sequence. It would appear that the equation ought to look something like:

$$\text{prob} \approx \frac{(N-n+1)\,L^{N-n}}{L^{N}} = \frac{N-n+1}{L^{n}}$$

where N is the number of dimensions in the larger phase space, n is the number of dimensions in the smaller phase space, and L is the number of characters that can be selected. This equation counts sequences containing multiple copies of the desired embedded sequence more than once, but such sequences are few by comparison and the error can be safely ignored.
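The counts for the 2-, 3-, and 4-unit cases can be verified by brute force. The Python sketch below enumerates every string over the 26-letter alphabet and counts those containing ab. It then compares that count with the estimate (N - n + 1) L^(N-n). The function names are illustrative.

```python
from itertools import product
from string import ascii_lowercase


def count_containing(target: str, length: int, alphabet: str = ascii_lowercase) -> int:
    """Exact number of strings of `length` over `alphabet` that contain `target`."""
    return sum(target in "".join(s) for s in product(alphabet, repeat=length))


def approx_count(n: int, N: int, L: int) -> int:
    """Estimate from the text: (N - n + 1) placements times L**(N - n) free fillers."""
    return (N - n + 1) * L ** (N - n)


for N in (2, 3, 4):
    exact = count_containing("ab", N)
    estimate = approx_count(2, N, 26)
    # N=4: exact 2027, estimate 2028
    print(N, exact, estimate, f"{estimate / 26 ** N:.4f}")
```

At N = 4 the exact count is 2027 rather than 2028. The string abab contains ab twice, so the estimate counts it twice. This is the small over-count that can be neglected.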

Thus a search of a 70-d space for an 8-unit sequence (“I eat eggs”) should yield

prob =(70-8+1)(20^(62))/(20^70)=2.4 x 10^-9

This is the probability that you will randomly make a 70-unit-long sequence that contains the string “ieateggs” somewhere in it. But one can object that embedding the wanted string in a longer one makes it unlikely to be useful. After all, a 70-letter string with “ieateggs” buried among random letters does not seem to convey much information.

But, as is often noted in discussions of the origin of protein or DNA sequences, once formed, the sequence is likely to be cut randomly. So what are the odds that a sequence containing “ieateggs” will be cut twice at exactly the correct locations? If we treat a sequence that is not cut as equivalent to one cut past its terminal character, there are 71 places you can cut the sequence. Thus, for the above sequence, randomly cut, there is a 1/(71 x 71) = 1/5041 chance of cutting it in such a fashion that the “ieateggs” statement is extracted. Thus the total probability of finding a useful sequence in the 70-unit-long sequence is (2.4 x 10^-9)/5041 = 4.76 x 10^-13.
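The two factors in this estimate can be multiplied out directly: the embedding probability for “ieateggs” and the 1/71^2 cutting factor. Here is a short Python sketch. It follows the cutting model exactly as described, and it also shows that the unrounded embedding probability (about 2.46 x 10^-9) gives roughly 4.9 x 10^-13 instead of 4.76 x 10^-13.

```python
SEQ_LEN, PHRASE_LEN, ALPHABET = 70, 8, 20

# Embedding estimate (N - n + 1) / L**n for the 8-letter "ieateggs".
p_embed = (SEQ_LEN - PHRASE_LEN + 1) / ALPHABET ** PHRASE_LEN

# Cutting model from the text: 71 cut sites (an uncut end counts as a cut
# past the last letter), and both cuts must land on the phrase boundaries.
cut_sites = SEQ_LEN + 1
p_cut = 1 / cut_sites ** 2

print(cut_sites ** 2)               # 5041
print(f"{p_embed:.3g}")             # 2.46e-09
print(f"{2.4e-9 * p_cut:.3g}")      # 4.76e-13, using the rounded 2.4e-9
print(f"{p_embed * p_cut:.3g}")     # 4.88e-13, using the unrounded value
```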

How likely are we to find this useful sequence? If we were to assign amino acids to the letters, write this sequence in proteins, and then create a vat with 10^14 proteins, each 70 amino acids long (which is not at all impossible, nor would it occupy a huge space), you would expect to find about 48 copies (10^14 x 4.76 x 10^-13) of the “ieateggs” sequence in the first vat.
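The expected number of hits in a vat is the vat size times the per-molecule probability. The sketch below also estimates the chance of at least one hit. It assumes the molecules form independently, so the count is approximately Poisson-distributed (an assumption not made in the text).

```python
import math

P_PER_MOLECULE = 4.76e-13   # per-molecule figure from the text ("ieateggs", cut out cleanly)
VAT_SIZE = 10 ** 14         # 70-unit molecules per vat

# Mean number of correctly cut "ieateggs" molecules in one vat.
expected = VAT_SIZE * P_PER_MOLECULE
# Poisson approximation: chance that the vat holds at least one copy.
p_at_least_one = 1 - math.exp(-expected)

print(round(expected, 1))         # 47.6
print(p_at_least_one > 0.999)     # True
```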

This is not all. The next shortest useful sequence to answer your bride's question is “I want eggs.” This is a nine-character sequence. The odds of finding this sequence in a 70-unit-long sequence and cutting it out cleanly are 2.40 x 10^-14. In your first vat of proteins there is a high probability that at least one “iwanteggs” will be found. But there is also the phrase “I like eggs”, which is also nine letters long and likewise has a probability of 2.40 x 10^-14 of being in the vat after each sequence is cut twice. There are also “I need eggs”, “I wish eggs”, and “I have eggs”.

If we look for 10-letter solutions, we have “I covet eggs”, “I crave eggs”, “I fancy eggs”, and “I favor eggs”. Each of these has a probability of approximately 10^-15. You would be likely to find one of these in the first 10 vats.

In addition to these, if we go to 11-letter solutions, we have phrases like “I ingest eggs”, “I devour eggs”, and “I gobble eggs”. These have a likelihood on the order of 10^-16.
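The same two-factor estimate can be tabulated for each phrase length. The Python sketch below takes the phrases listed above with spaces removed. It keeps only those that can be spelled in the 20-letter alphabet chosen earlier, since several of the listed phrases use k or v, which that alphabet excludes. It then reports the per-molecule probability and the expected hits per 10^14-molecule vat. The phrase table and helper names are illustrative.

```python
# Letters ruled out in the text, leaving a 20-letter alphabet.
EXCLUDED = set("zqxkvj")
VAT_SIZE = 10 ** 14


def p_found(n: int, seq_len: int = 70, alphabet: int = 20) -> float:
    """Embedding estimate (N - n + 1) / L**n times the 1/(N + 1)**2 cutting factor."""
    return (seq_len - n + 1) / alphabet ** n / (seq_len + 1) ** 2


# Phrases from the text, spaces removed, grouped by letter count.
PHRASES = {
    8: ["ieateggs"],
    9: ["iwanteggs", "ilikeeggs", "ineedeggs", "iwisheggs", "ihaveeggs"],
    10: ["icoveteggs", "icraveeggs", "ifancyeggs", "ifavoreggs"],
    11: ["iingesteggs", "idevoureggs", "igobbleeggs"],
}

table = {}
for n, phrases in PHRASES.items():
    assert all(len(p) == n for p in phrases)
    # Keep only phrases that avoid every excluded letter.
    spellable = [p for p in phrases if not EXCLUDED & set(p)]
    hits_per_vat = VAT_SIZE * p_found(n) * len(spellable)
    table[n] = (p_found(n), len(spellable), hits_per_vat)
    print(f"n={n}: p={p_found(n):.2g}, spellable={len(spellable)}, hits/vat={hits_per_vat:.3g}")
```

Each extra letter divides the per-phrase probability by roughly 20. For length 10, the single spellable phrase gives about 0.12 expected hits per vat, or roughly 1.2 across ten vats.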

This can go on and on. Within the 70-d space there are hundreds of thousands of ways of saying that you want eggs for breakfast.

One question that can be addressed here is how a short usable sequence can become longer. Well, if you come down to breakfast and say brusquely to your bride, “I eat eggs”, she might cook them for a few days, but eventually she will demand a politer response, like “Dear, I eat eggs.” Small additions from one usable form to another, driven by selection pressure from your hunger pangs when your bride doesn't fix your breakfast, can eventually lead you to say, “My beautiful wife, I am most desirous of eating two eggs this morning.” Obviously this sequence has greater functionality than simply “I eat eggs”.

Do proteins act in the same fashion as the language above? Yes. Gerald Joyce is one of the leaders in the field of directed evolution. I would point you to Discover, May 1994, “Speeding Through Evolution”; to Gerald E. Joyce, “Directed Evolution,” Scientific American, Dec. 1992, around pp. 94-95; or to Beaudry and Joyce, Science, 257:637-638, 1992.

Sean Eddy of the Washington University School of Medicine recently wrote on Talk Origins (message <EDDY.95Aug17084136@wol.wustl.edu>) that RNA sequence space is teeming with interesting functionalities, a conclusion based on Joyce's work.

“Thus, the weaknesses in the traditional creationist probability argument is two fold. It assumes that one and only one sequence can perform a given function. And secondly, it assumes that only the most complex forms must be made at first. This ignores the potential of short sequences performing the same function.”

When one adds this weakness to the others mentioned over the past few weeks, the fragility of our apologetic approach becomes obvious. The problems are:

  1. The amount of genetic variability in humans, which requires an ancient humanity in order to fit the Biblical data.
  2. The inability of young-earth creationists to account, within their time frame, for the formation of the caves in which fossil man lived.
  3. The fact that fossil man apparently built religious altars of various forms, which those defending a recent origin of Adam do not account for.
  4. The inability of old-earth creationists to point to a place and a set of rocks that explain how the flood occurred and how it matches the Biblical account (how could Noah float for a year and land anywhere near mountains?).
  5. Whether or not one accepts the fossils we discussed in June and July as truly transitional is less important to the apologetic case than how those fossils appear. If they have the appearance of being transitional forms, all our pleading that they are really NOT transitional forms will fall on deaf ears.

Young-earth creationists position Christianity in opposition to almost every piece of observational data science collects, in astronomy, biology, geology, paleontology, physics, and anthropology. The progressive creation (PC) and theistic evolution (TE) positions, with a recent creation of man, are much better, but they place Christianity in opposition to certain biomolecular data (MHC and other allelic diversity) and anthropological data (the nature of fossil man), as noted above.

It is very obvious that the positions we are defending apologetically are not very secure.

The question those interested in Christian apologetics and the relation between science and the early chapters of Genesis should ask themselves is whether the purpose of the Christian apologist is to explain the observational data in a Biblical framework or to explain the data away. These are two very different approaches. But if the probability argument against evolution is as weak as I showed above, Christianity had best find a better way to handle the area of Science and the Bible.
