- title: Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine
- url: https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/
- hash_url: 66ef8e7fa0942fc975723f7df4d932e9
-
- <p>Welcome! In this post, we’ll be taking a character-by-character look at the
- source code of the BioNTech/Pfizer SARS-CoV-2 mRNA vaccine.</p>
-
- <blockquote>
- <p><em>I want to thank the large cast of people who spent time previewing this
- article for legibility and correctness. All mistakes remain mine though,
- but I would love to hear about them quickly at bert@hubertnet.nl or
- <a href="https://twitter.com/PowerDNS_Bert" target="_blank">@PowerDNS_Bert</a></em></p>
- </blockquote>
-
- <p>Now, these words may be somewhat jarring - the vaccine is a liquid that gets
- injected in your arm. How can we talk about source code?</p>
-
- <p>This is a good question, so let’s start off with a small part of the very
- source code of the BioNTech/Pfizer vaccine, also known as
- <a href="https://en.wikipedia.org/wiki/Tozinameran" target="_blank">BNT162b2</a>, also
- known as Tozinameran <a href="https://twitter.com/PowerDNS_Bert/status/1342109138965422083" target="_blank">also known as
- Comirnaty</a>.</p>
-
- <p></p><center>
- <figure>
- <img src="/articles/bnt162b2.png" alt="First 500 characters of the BNT162b2 mRNA. Source: World Health Organization"> <figcaption>
- <p>First 500 characters of the BNT162b2 mRNA. Source: <a href="https://mednet-communities.net/inn/db/media/docs/11889.doc" target="_blank">World Health Organization</a></p>
- </figcaption>
- </figure>
-
- </center>
-
- <p>The BNT162b2 mRNA vaccine has this digital code at its heart. It is 4284
- characters long, so it would fit in a bunch of tweets. At the very
- beginning of the vaccine production process, someone uploaded this code to a
- DNA printer (yes), which then converted the bytes on disk to actual DNA
- molecules.</p>
-
- <p></p><center>
- <figure>
- <img src="/articles/bioxp-3200.jpg" alt="A Codex DNA BioXp 3200 DNA printer"> <figcaption>
- <p>A <a href="https://codexdna.com/products/bioxp-system/" target="_blank">Codex DNA</a> BioXp 3200 DNA printer</p>
- </figcaption>
- </figure>
-
- </center>
-
- <p>Out of such a machine come tiny amounts of DNA, which after a lot of
- biological and chemical processing end up as RNA (more about which later) in
- the vaccine vial. A 30 microgram dose turns out to actually contain 30
- micrograms of RNA. In addition, there is a clever lipid (fatty) packaging
- system that gets the mRNA into our cells.</p>
-
- <p>RNA is the volatile ‘working memory’ version of DNA. DNA is like the flash
- drive storage of biology. DNA is very durable, internally redundant and
- very reliable. But much like computers do not execute code directly from a
- flash drive, biology first copies its code to a faster, more versatile yet
- far more fragile system before anything happens.</p>
-
- <p>For computers, this is RAM, for biology it is RNA. The resemblance is
- striking. Unlike flash memory, RAM degrades very quickly unless lovingly
- tended to. The reason the Pfizer/BioNTech mRNA vaccine must be stored in the
- deepest of deep freezers is the same: RNA is a fragile flower.</p>
-
- <p>Each RNA character weighs on the order of 0.53·10⁻²¹ grams, meaning
- there are around 6·10¹⁶ characters in a single 30 microgram vaccine dose.
- Expressed in bytes, this is around 14 petabytes, although it must be said
- this consists of around <a href="https://docs.google.com/spreadsheets/d/1vc6p9IXQVRpVQntcI1tCdSMLNDuT8fl8rags0gDxMZA/edit?usp=sharing" target="_blank">13,000 billion
- repetitions</a> of the same 4284
- characters. The actual informational content of the vaccine is just over a
- kilobyte. <a href="https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_045512&tracks=%5Bkey:sequence_track,name:Sequence,display_name:Sequence,id:STD649220238,annots:Sequence,ShowLabel:false,ColorGaps:false,shown:true,order:1%5D%5Bkey:gene_model_track,name:Genes,display_name:Genes,id:STD3194982005,annots:Unnamed,Options:ShowAllButGenes,CDSProductFeats:true,NtRuler:true,AaRuler:true,HighlightMode:2,ShowLabel:true,shown:true,order:9%5D&v=1:29903&c=null&select=null&slim=0" target="_blank">SARS-CoV-2 itself</a> weighs in at around 7.5 kilobytes.</p>
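- <p>The dose arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope
- sketch using the figures quoted in this post (0.53·10⁻²¹ grams per character, 4284 characters,
- 2 bits per character) plus the 29,903-character length of the SARS-CoV-2 reference genome
- NC_045512 linked above. It is not taken from the linked spreadsheet.</p>
-
```python
# Back-of-the-envelope numbers for one 30 microgram dose of BNT162b2 mRNA.
# Inputs are the figures quoted in the text; everything else is derived.
DOSE_GRAMS = 30e-6               # 30 micrograms of RNA
GRAMS_PER_CHARACTER = 0.53e-21   # approximate mass of one RNA character
VACCINE_LENGTH = 4284            # characters in the BNT162b2 mRNA
SARS_COV_2_LENGTH = 29903        # characters in reference genome NC_045512
BITS_PER_CHARACTER = 2           # 4 possible characters = 2 bits each

characters_per_dose = DOSE_GRAMS / GRAMS_PER_CHARACTER
bytes_per_dose = characters_per_dose * BITS_PER_CHARACTER / 8
copies_per_dose = characters_per_dose / VACCINE_LENGTH
vaccine_bytes = VACCINE_LENGTH * BITS_PER_CHARACTER / 8
virus_bytes = SARS_COV_2_LENGTH * BITS_PER_CHARACTER / 8

print(f"characters per dose: {characters_per_dose:.1e}")
print(f"petabytes per dose:  {bytes_per_dose / 1e15:.0f}")
print(f"copies per dose:     {copies_per_dose / 1e9:,.0f} billion")
print(f"vaccine payload:     {vaccine_bytes:,.0f} bytes")
print(f"SARS-CoV-2 genome:   {virus_bytes:,.0f} bytes")
```
-
- <p>It works out to roughly 5.7·10¹⁶ characters per dose, 14 petabytes, about 13,200 billion
- copies, a 1,071-byte vaccine payload and a 7,476-byte viral genome, in line with the figures above.</p>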
-
- <blockquote>
- <p>Update: In the original post these numbers were off. <a href="https://docs.google.com/spreadsheets/d/1vc6p9IXQVRpVQntcI1tCdSMLNDuT8fl8rags0gDxMZA/edit?usp=sharing" target="_blank">Here is a
- spreadsheet</a>
- with the correct calculations.</p>
- </blockquote>
-
- <h2 id="the-briefest-bit-of-background">The briefest bit of background</h2>
-
- <p>DNA is a digital code. Unlike computers, which use 0 and 1, life uses A, C, G
- and U/T (the ‘nucleotides’, ‘nucleosides’ or ‘bases’).</p>
-
- <p>In computers we store the 0 and 1 as the presence or absence of a charge, or
- as a current, as a magnetic transition, or as a voltage, or as a modulation
- of a signal, or as a change in reflectivity. Or in short, the 0 and 1 are
- not some kind of abstract concept - they live as electrons and in many other
- physical embodiments.</p>
-
- <p>In nature, A, C, G and U/T are molecules, stored as chains in DNA (or RNA).</p>
-
- <p>In computers, we group 8 bits into a byte, and the byte is the typical unit
- of data being processed.</p>
-
- <p>Nature groups 3 nucleotides into a codon, and this codon is the typical unit
- of processing. A codon contains 6 bits of information (2 bits per DNA
- character, 3 characters = 6 bits), which means there are 2⁶ = 64 different codon values.</p>
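- <p>To make the byte comparison concrete, here is a small sketch that packs a codon into a
- 6-bit number and back. The A=0, C=1, G=2, U=3 mapping is an arbitrary choice for this
- illustration; biology uses no such numbering.</p>
-
```python
# Pack RNA codons into 6-bit integers, the way bits are packed into bytes.
# The A=0, C=1, G=2, U=3 assignment is an arbitrary illustrative choice.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def codon_to_int(codon: str) -> int:
    """Encode a 3-character codon as a number between 0 and 63."""
    if len(codon) != 3:
        raise ValueError(f"a codon has exactly 3 characters, got {codon!r}")
    value = 0
    for base in codon:
        value = (value << 2) | BASE_TO_BITS[base]  # 2 bits per character
    return value


def int_to_codon(value: int) -> str:
    """Decode a 6-bit number back into its codon."""
    if not 0 <= value < 64:
        raise ValueError(f"a codon value must fit in 6 bits, got {value}")
    return "".join(BITS_TO_BASE[(value >> shift) & 0b11] for shift in (4, 2, 0))


print(codon_to_int("AUG"), format(codon_to_int("AUG"), "06b"))
```
-
- <p>With this mapping AUG packs to 14 (001110 in binary), and all 64 values decode to distinct codons.</p>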
-
- <p>Pretty digital so far. When in doubt, <a href="https://mednet-communities.net/inn/db/media/docs/11889.doc" target="_blank">head to the WHO
- document</a> with the
- digital code to see for yourself.</p>
-
- <blockquote>
- <p><em>Some further reading is <a href="https://berthub.eu/articles/posts/what-is-life/" target="_blank">available
- here</a> - this link (‘What
- is life’) might help make sense of the rest of this page. Or, if you like
- video, I have <a href="https://berthub.eu/dna" target="_blank">two hours for you</a>.</em></p>
- </blockquote>
-
- <h2 id="so-what-does-that-code-do">So what does that code DO?</h2>
-
- <p>The idea of a vaccine is to teach our immune system how to fight a pathogen,
- without us actually getting ill. Historically this has been done by
- injecting a weakened or incapacitated (attenuated) virus, plus an ‘adjuvant’
- to scare our immune system into action. This was a decidedly analogue
- technique involving billions of eggs (or insects). It also required a lot
- of luck and loads of time. Sometimes a different (unrelated) virus was also
- used.</p>
-
- <p>An mRNA vaccine achieves the same thing (‘educate our immune system’) but in
- a laser-like way. And I mean this in both senses - very narrow but also
- very powerful.</p>
-
- <p>So here is how it works. The injection contains volatile genetic material
- that describes the famous SARS-CoV-2 ‘Spike’ protein. Through clever
- chemical means, the vaccine manages to get this genetic material into some of
- our cells.</p>
-
- <p>These then dutifully start producing SARS-CoV-2 Spike proteins in large
- enough quantities that our immune system springs into action. Confronted
- with Spike proteins, and (importantly) tell-tale signs that cells have been
- taken over, our immune system develops a powerful response against multiple
- aspects of the Spike protein AND the production process.</p>
-
- <p>And this is what gets us to the 95% effective vaccine.</p>
-
- <h2 id="the-source-code">The source code!</h2>
-
- <p><a href="https://youtu.be/jp0opnxQ4rY?t=8" target="_blank">Let’s start at the very beginning, a very good place
- to start</a>. The WHO document has this
- helpful picture:</p>
-
- <p></p><center>
- <figure>
- <img src="/articles/vaccine-toc.png">
- </figure>
-
- </center>
-
- <p>This is a sort of table of contents. We’ll start with the ‘cap’, actually
- depicted as a little hat.</p>
-
- <p>Much like you can’t just plonk opcodes in a file on a computer and run it,
- the biological operating system requires headers, has linkers and things
- like calling conventions.</p>
-
- <p>The code of the vaccine starts with the following two nucleotides:</p>
-
- <pre><code>GA
- </code></pre>
-
- <p>This is very much like every <a href="https://en.wikipedia.org/wiki/DOS_MZ_executable" target="_blank">DOS and Windows executable starting
- with MZ</a>, or UNIX scripts starting with
- <a href="https://en.wikipedia.org/wiki/Shebang_(Unix)" target="_blank"><code>#!</code></a>. In both life and
- operating systems, these two characters are not executed in any way. But
- they have to be there because otherwise nothing happens.</p>
-
- <p>The mRNA ‘cap’ <a href="https://en.wikipedia.org/wiki/Five-prime_cap#Function" target="_blank">has a number of
- functions</a>. For one, it marks code as coming
- from the nucleus. In our case of course it doesn’t, our code comes from a
- vaccination. But we don’t need to tell the cell that. The cap makes our code
- look legit, which protects it from destruction.</p>
-
- <p>The initial two <code>GA</code> nucleotides are also chemically slightly different from
- the rest of the RNA. In this sense, the <code>GA</code> has some out-of-band
- signaling on it.</p>
-
- <h2 id="the-five-prime-untranslated-region">The “five-prime untranslated region”</h2>
-
- <p>Some lingo here. RNA molecules can only be read in one direction.
- Confusingly, the part where the reading begins is called the 5’ or
- ‘five-prime’. The reading stops at the 3’ or three-prime end.</p>
-
- <p>Life consists of proteins (or things made by proteins). And these proteins
- are described in RNA. When RNA gets converted into proteins, this is called
- translation.</p>
-
- <p>Here we have the 5’ untranslated region (‘UTR’), so this bit does not end up
- in the protein:</p>
-
- <pre><code>GAAΨAAACΨAGΨAΨΨCΨΨCΨGGΨCCCCACAGACΨCAGAGAGAACCCGCCACC
- </code></pre>
-
- <p>Here we encounter our first surprise. The normal RNA characters are A, C, G
- and U (in DNA, T takes the place of U). But here we find a Ψ. What is going
- on?</p>
-
- <p>This is one of the exceptionally clever bits about the vaccine. Our body
- runs a powerful antivirus system (“the original one”). For this reason,
- cells are extremely unenthusiastic about foreign RNA and try very hard to
- destroy it before it does anything.</p>
-
- <p>This is somewhat of a problem for our vaccine - it needs to sneak past our
- immune system. Over many years of experimentation, it was found that if the
- U in RNA is replaced by a slightly modified molecule, our immune system
- loses interest. For real.</p>
-
- <p>So in the BioNTech/Pfizer vaccine, every U has been replaced by
- 1-methyl-3’-pseudouridylyl, denoted by Ψ. The really clever bit is that
- although this replacement Ψ placates (calms) our immune system, it is
- accepted as a normal U by relevant parts of the cell.</p>
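- <p>When comparing the vaccine with the virus, it helps to translate the Ψ notation back into
- ordinary letters. A minimal sketch, assuming Ψ is the only non-standard character in the text
- and that it stands in for U, applied to the 5’ UTR shown above. The function names are made up
- for this example.</p>
-
```python
# Assumption: Ψ is the only non-standard character, and it stands in for U.
def vaccine_to_rna(sequence: str) -> str:
    """Write the modified Ψ as a regular RNA U."""
    return sequence.replace("Ψ", "U")


def rna_to_dna(sequence: str) -> str:
    """DNA uses T where RNA uses U."""
    return sequence.replace("U", "T")


five_prime_utr = "GAAΨAAACΨAGΨAΨΨCΨΨCΨGGΨCCCCACAGACΨCAGAGAGAACCCGCCACC"
rna = vaccine_to_rna(five_prime_utr)

print(rna)
print(rna_to_dna(rna))
print("Ψ characters replaced:", five_prime_utr.count("Ψ"))
```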
-
- <p>In computer security we also know this trick - it is sometimes possible to
- transmit a slightly corrupted version of a message that confuses firewalls and
- security solutions, but that is still accepted by the backend servers -
- which can then get hacked.</p>
-
- <p>We are now reaping the benefits of fundamental scientific research performed
- in the past. The
- <a href="https://twitter.com/PennMedicine/status/1341766354232365059" target="_blank">discoverers</a>
- of this Ψ technique had to fight to get
- <a href="https://www.statnews.com/2020/11/10/the-story-of-mrna-how-a-once-dismissed-idea-became-a-leading-technology-in-the-covid-vaccine-race/" target="_blank">their</a>
- work funded and then accepted. We should all be very grateful, and I am sure
- the <a href="https://twitter.com/PowerDNS_Bert/status/1329861047168225281" target="_blank">Nobel prizes will arrive in due
- course</a>.</p>
-
- <blockquote>
- <p>Many people have asked, could viruses also use the Ψ technique to beat our
- immune systems? In short, this is extremely unlikely. Life simply does
- not have the machinery to build 1-methyl-3’-pseudouridylyl nucleotides.
- Viruses rely on the machinery of life to reproduce themselves, and this
- facility is simply not there. The mRNA vaccines quickly degrade in the
- human body, and there is no possibility of the Ψ-modified RNA
- replicating with the Ψ still in there. “<a href="https://www.deplatformdisease.com/blog/no-really-mrna-vaccines-are-not-going-to-affect-your-dna" target="_blank">No, Really, mRNA Vaccines Are Not Going To Affect Your
- DNA</a>”
- is also a good read.</p>
- </blockquote>
-
- <p>OK, back to the 5’ UTR. What do these 51 characters do? As with everything in
- nature, almost nothing here has one clear function.</p>
-
- <p>When our cells need to <em>translate</em> RNA into proteins, this is done using a
- machine called the ribosome. The ribosome is like a 3D printer for
- proteins. It ingests a strand of RNA and based on that it emits a string of
- amino acids, which then fold into a protein.</p>
-
- <p></p><center>
- <video controls loop>
- <source src="/articles/protein-short.mp4" type="video/mp4">
- </source></video>
- <br>
- Source: <a href="https://commons.wikimedia.org/wiki/File:Protein_translation.gif" target="_blank">Wikipedia user Bensaccount</a>
- </center>
-
- <p>This is what we see happening above. The black ribbon at the bottom is RNA.
- The ribbon appearing in the green bit is the protein being formed. The
- things flying in and out are amino acids plus adaptors to make them fit on
- RNA.</p>
-
- <p>This ribosome needs to physically sit on the RNA strand for it to get to
- work. Once seated, it can start forming proteins based on further RNA it
- ingests. From this, you can imagine that it cannot read the part it first
- lands on. This is just one of the functions of the UTR: the
- ribosome landing zone. The UTR provides ‘lead-in’.</p>
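- <p>To make the ‘lead-in’ idea concrete, here is a toy ribosome. It is a deliberate
- simplification under stated assumptions: it skips everything before the first AUG (the start
- codon), then reads three characters at a time using the standard codon table until it hits a
- stop codon. Real ribosomes use far more context than this. The input is the 5’ UTR shown
- earlier with Ψ written as U, followed by the first six codons of the vaccine’s signal peptide
- and an added UGA stop codon.</p>
-
```python
# A toy ribosome: a deliberate simplification, not a model of a real cell.
# Assumptions: translation starts at the first AUG and ends at the first
# stop codon ('*' in the table below).
BASES = "UCAG"
# Standard genetic code in compact form: first base varies slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(rna: str) -> str:
    """Skip the lead-in up to the first AUG, then emit one amino acid per codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing gets translated
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# 5' UTR with Ψ written as U, then the first six codons of the vaccine's
# signal peptide and an added UGA stop codon.
utr = "GAAUAAACUAGUAUUCUUCUGGUCCCCACAGACUCAGAGAGAACCCGCCACC"
print(translate(utr + "AUGUUCGUGUUCCUGGUG" + "UGA"))
```
-
- <p>The UTR contains no AUG, so nothing in the lead-in is translated and the output is MFVFLV.</p>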
-
- <p>In addition to this, the UTR also contains metadata: when should translation
- happen? And how much? For the vaccine, they took the most ‘right now’ UTR
- they could find, taken from the <a href="https://www.tandfonline.com/doi/full/10.1080/15476286.2018.1450054" target="_blank">alpha globin
- gene</a>.
- This gene is known to robustly produce a lot of proteins. In previous
- years, scientists had already found ways to optimize this UTR even further
- (according to the WHO document), so this is not quite the alpha globin UTR.
- It is better.</p>
-
- <h2 id="the-s-glycoprotein-signal-peptide">The S glycoprotein signal peptide</h2>
-
- <p>As noted, the goal of the vaccine is to get the cell to produce copious
- amounts of the Spike protein of SARS-CoV-2. Up to this point, we have mostly
- encountered metadata and “calling convention” stuff in the vaccine source
- code. But now we enter the actual viral protein territory.</p>
-
- <p>We still have one layer of metadata to go however. Once the ribosome (from the
- splendid animation above) has made a protein, that protein still needs to go
- somewhere. This is encoded in the “S glycoprotein signal peptide (extended leader
- sequence)”.</p>
-
- <p>The way to see this is that at the beginning of the protein there is a sort
- of address label - encoded as part of the protein itself. In this specific
- case, the signal peptide says that this protein should exit the cell via the
- “endoplasmic reticulum”. Even Star Trek lingo is not as fancy as this!</p>
-
- <p>The “signal peptide” is not very long, but when we look at the code, there
- are differences between the viral and vaccine RNA:</p>
-
- <p>(Note that for comparison purposes, I have replaced the fancy modified Ψ by a
- regular RNA U)</p>
-
- <pre><code>           3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
- Virus:   AUG UUU GUU UUU CUU GUU UUA UUG CCA CUA GUC UCU AGU CAG UGU GUU
- Vaccine: AUG UUC GUG UUC CUG GUG CUG CUG CCU CUG GUG UCC AGC CAG UGU GUG
-                !   !   !   !   ! ! ! !     !   !   !   !   !           !
- </code></pre>
-
- <p>So what is going on? I have not accidentally listed the RNA in groups of 3
- letters. Three RNA characters make up a codon. And every codon encodes for a
- specific amino acid. The signal peptide in the vaccine consists of <em>exactly</em>
- the same amino acids as in the virus itself.</p>
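- <p>The claim that the signal peptide is unchanged can be checked mechanically. This sketch
- translates both rows of the comparison above with the standard codon table, built from its
- compact 64-letter form with the bases ordered U, C, A, G.</p>
-
```python
# Standard genetic code in compact form (bases ordered U, C, A, G; first base
# varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The two rows of the signal peptide comparison (Ψ written as U).
virus = "AUG UUU GUU UUU CUU GUU UUA UUG CCA CUA GUC UCU AGU CAG UGU GUU".split()
vaccine = "AUG UUC GUG UUC CUG GUG CUG CUG CCU CUG GUG UCC AGC CAG UGU GUG".split()

virus_protein = "".join(CODON_TABLE[codon] for codon in virus)
vaccine_protein = "".join(CODON_TABLE[codon] for codon in vaccine)

print("virus:  ", virus_protein)
print("vaccine:", vaccine_protein)
print("identical:", virus_protein == vaccine_protein)
```
-
- <p>Both rows come out as MFVFLVLLPLVSSQCV.</p>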
-
- <p>So how come the RNA is different?</p>
-
- <p>There are 4³=64 different codons, since there are 4 RNA characters, and
- there are three of them in a codon. Yet there are only 20 different
- amino acids. This means that multiple codons encode for the same amino acid.</p>
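- <p>Counting the codon table makes this redundancy visible. This sketch builds the standard
- genetic code and counts how many codons map to each amino acid; ‘*’ marks the three stop
- codons, which encode no amino acid.</p>
-
```python
from collections import Counter

# Standard genetic code in compact form (bases ordered U, C, A, G; first base
# varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

codons_per_amino_acid = Counter(CODON_TABLE.values())
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")

print("codons:", len(CODON_TABLE))                     # 4 ** 3
print("amino acids:", len(codons_per_amino_acid) - 1)  # minus the stop signal
print("stop codons:", stop_codons)
for amino_acid, count in codons_per_amino_acid.most_common():
    print(amino_acid, count)
```
-
- <p>Leucine (L), serine (S) and arginine (R) each get six codons, while methionine (M) and
- tryptophan (W) get only one.</p>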
-
- <p>Life uses the following nearly universal table for mapping RNA codons to
- amino acids:</p>
-
- <p></p><center>
- <figure>
- <img src="/articles/rna-codon-table.png" alt="The RNA codon table (Wikipedia)"> <figcaption>
- <p><a href="https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables" target="_blank">The RNA codon table</a> (Wikipedia)</p>
- </figcaption>
- </figure>
-
- </center>
-
- <p>In this table, we can see that the modifications in the vaccine (UUU ->
- UUC) are all <em>synonymous</em>. The vaccine RNA code is different, but the same
- amino acids and the same protein come out.</p>
-
- <p>If we look closely, we see that the majority of the changes happen in the
- third codon position, noted with a ‘3’ above. And if we check the universal
- codon table, we see that this third position indeed often does not matter
- for which amino acid is produced.</p>
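- <p>The ‘!’ markers and the third-position observation can both be computed. This sketch
- compares the two signal peptide rows character by character, prints them with a marker line in
- the same layout as the comparison, and counts in which codon position each change falls.</p>
-
```python
from collections import Counter

# The two rows of the signal peptide comparison (Ψ written as U).
virus = "AUG UUU GUU UUU CUU GUU UUA UUG CCA CUA GUC UCU AGU CAG UGU GUU".split()
vaccine = "AUG UUC GUG UUC CUG GUG CUG CUG CCU CUG GUG UCC AGC CAG UGU GUG".split()


def marker_line(old_codons, new_codons):
    """Put a '!' under every character that differs, keeping codons 4 columns apart."""
    return " ".join(
        "".join("!" if x != y else " " for x, y in zip(old, new))
        for old, new in zip(old_codons, new_codons)
    )


# Position (1, 2 or 3) inside the codon of every character that changed.
changed_positions = Counter(
    position + 1
    for old, new in zip(virus, vaccine)
    for position in range(3)
    if old[position] != new[position]
)

print("Virus:   " + " ".join(virus))
print("Vaccine: " + " ".join(vaccine))
print("         " + marker_line(virus, vaccine).rstrip())
print("changes per codon position:", dict(sorted(changed_positions.items())))
```
-
- <p>12 of the 14 changes fall on the third position; the other two are on the first position
- (in UUA -> CUG and UUG -> CUG).</p>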
-
- <p>So, the changes are synonymous, but then why are they there? Looking
- closely, we see that all changes <em>except one</em> lead to more C and Gs.</p>
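- <p>The shift towards C and G can be quantified with a short sketch that counts the G and C
- characters in both versions of the signal peptide and lists each substituted character.</p>
-
```python
# The two rows of the signal peptide comparison, joined into plain strings.
virus = "".join("AUG UUU GUU UUU CUU GUU UUA UUG CCA CUA GUC UCU AGU CAG UGU GUU".split())
vaccine = "".join("AUG UUC GUG UUC CUG GUG CUG CUG CCU CUG GUG UCC AGC CAG UGU GUG".split())


def gc_count(sequence: str) -> int:
    """Number of G and C characters in a sequence."""
    return sum(base in "GC" for base in sequence)


# Every single-character substitution, as (virus character, vaccine character).
substitutions = [(old, new) for old, new in zip(virus, vaccine) if old != new]
towards_gc = [pair for pair in substitutions if pair[1] in "GC"]
other = [pair for pair in substitutions if pair[1] not in "GC"]

print(f"virus G+C:   {gc_count(virus)} of {len(virus)}")
print(f"vaccine G+C: {gc_count(vaccine)} of {len(vaccine)}")
print(f"{len(towards_gc)} of {len(substitutions)} substitutions put in a G or C; others: {other}")
```
-
- <p>The G+C count rises from 16 to 28 of 48 characters. 13 of the 14 substitutions put in a C
- or a G; one of those swaps a C for a G (GUC -> GUG), which leaves the count unchanged, and the
- remaining one is the A -> U change in CCA -> CCU.</p>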
-
- <p>So why would you do that? As noted above, our immune system takes a very dim
- view of ‘exogenous’ RNA, RNA code coming from outside the cell. To evade
- detection, the ‘U’ in the RNA was already replaced by a Ψ.</p>
-
- <p>However, it turns out that RNA with <a href="https://www.nature.com/articles/nrd.2017.243" target="_blank">a higher
- amount</a> of Gs and Cs is
- also <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1463026/" target="_blank">converted more efficiently into
- proteins</a>.</p>
-
- <p>And this has been achieved in the vaccine RNA by replacing many characters
- with Gs and Cs wherever this was possible.</p>
-
- <blockquote>
- <p>I’m slightly fascinated by the <em>one</em> change that did not lead to an
- additional C or G, the CCA -> CCU modification. If anyone knows the reason,
- please let me know! Note that I’m aware that some codons are more common
- than others in the human genome, but <a href="https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006024" target="_blank">I also read that this does not
- influence translation speed a
- lot</a>.</p>
- </blockquote>
-
- <h2 id="the-actual-spike-protein">The actual Spike protein</h2>
-
- <p>The next 3777 characters of the vaccine RNA are similarly ‘codon optimized’
- to add a lot of C’s and G’s. In the interest of space I won’t list all
- the code here, but we are going to zoom in on one exceptionally special
- bit. This is the bit that makes it work, the part that will actually help us
- return to life as normal:</p>
-
- <pre><code>                 *   *
-          L   D   K   V   E   A   E   V   Q   I   D   R   L   I   T   G
- Virus:   CUU GAC AAA GUU GAG GCU GAA GUG CAA AUU GAU AGG UUG AUC ACA GGC
- Vaccine: CUG GAC CCU CCU GAG GCC GAG GUG CAG AUC GAC AGA CUG AUC ACA GGC
-          L   D   P   P   E   A   E   V   Q   I   D   R   L   I   T   G
-            !     !!! !!        !   !       !   !   !   ! !
- </code></pre>
-
- <p>Here we see the usual synonymous RNA changes. For example, in the first
- codon we see that CUU is changed into CUG. This adds another ‘G’ to the
- vaccine, which we know helps enhance protein production. Both CUU
- and CUG encode for the amino acid ‘L’ or Leucine, so nothing changed in the
- protein.</p>
-
- <p>When we compare the entire Spike protein in the vaccine, all changes are
- synonymous like this, except for two, and those two are what we see here.</p>
-
- <p>The third and fourth codons above represent actual changes. The K and V
- amino acids there are both replaced by ‘P’ or Proline. For ‘K’ this required
- three changes (‘!!!’) and for ‘V’ it required only two (‘!!’).</p>
-
- <p><strong>It turns out that these two changes enhance the vaccine efficiency
- enormously</strong>.</p>
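- <p>The same kind of comparison singles out the two Proline substitutions. This sketch translates
- both rows of the excerpt with the standard codon table and reports every codon where the amino
- acid differs. Positions count from the start of this 16-codon excerpt, not from the start of the
- full Spike protein.</p>
-
```python
# Standard genetic code in compact form (bases ordered U, C, A, G; first base
# varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The 16-codon excerpt of the Spike protein (Ψ written as U).
virus = "CUU GAC AAA GUU GAG GCU GAA GUG CAA AUU GAU AGG UUG AUC ACA GGC".split()
vaccine = "CUG GAC CCU CCU GAG GCC GAG GUG CAG AUC GAC AGA CUG AUC ACA GGC".split()

amino_acid_changes = []  # (label such as 'K3P', number of characters changed)
synonymous = 0           # codons that differ but encode the same amino acid
for index, (old, new) in enumerate(zip(virus, vaccine), start=1):
    before, after = CODON_TABLE[old], CODON_TABLE[new]
    if before != after:
        edits = sum(x != y for x, y in zip(old, new))
        amino_acid_changes.append((f"{before}{index}{after}", edits))
    elif old != new:
        synonymous += 1

print("amino acid changes:", amino_acid_changes)
print("synonymous codon changes:", synonymous)
```
-
- <p>It reports K3P (AAA -> CCU, three characters changed) and V4P (GUU -> CCU, two characters
- changed), while the other eight differing codons are synonymous.</p>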
-
- <p>So what is happening here? If you look at a real SARS-CoV-2 particle, you
- can see the Spike protein as, well, a bunch of spikes:</p>
-
- <p></p><center>
- <figure>
- <img src="/articles/sars-em.jpg" alt="SARS virus particles (Wikipedia)"> <figcaption>
- <p><a href="https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome_coronavirus" target="_blank">SARS virus particles</a> (Wikipedia)</p>
- </figcaption>
- </figure>
-
- </center>
-
- <p>The spikes are mounted on the virus body (anchored in its outer ‘envelope’). But
- the thing is, our vaccine is only generating the spikes itself, and we’re
- not mounting them on any kind of virus body.</p>
-
- <p>It turns out that, unmodified, freestanding Spike proteins collapse into a
- different structure. If injected as a vaccine, this would indeed cause our
- bodies to develop immunity, but only against the collapsed spike protein.</p>
-
- <p>And the real SARS-CoV-2 shows up with the spiky Spike. The vaccine would not
- work very well in that case.</p>
-
- <p>So what to do? In <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584442/" target="_blank">2017 it was described how putting a double Proline
- substitution in just the right
- place</a> would make the
- SARS-CoV-1 and MERS
- S proteins take up their ‘pre-fusion’ configuration, even without being part of
- the whole virus. This works <a href="https://cen.acs.org/pharmaceuticals/vaccines/tiny-tweak-behind-COVID-19/98/i38" target="_blank">because Proline is a very rigid amino
- acid</a>. It
- acts as a kind of splint, stabilising the protein in the state we need to
- show to the immune system.</p>
-
- <p>The <a href="https://twitter.com/goodwish916" target="_blank">people</a> that
- <a href="https://twitter.com/KizzyPhD" target="_blank">discovered</a> this should be walking
- around high-fiving themselves incessantly. Unbearable amounts of smugness
- should be emanating from them. <a href="https://twitter.com/McLellan_Lab/status/1291077489566142464" target="_blank">And it would all be well
- deserved</a>.</p>
-
- <blockquote>
- <p>Update! I have been contacted by the <a href="https://twitter.com/McLellan_Lab/status/1291077489566142464" target="_blank">McLellan
- lab</a>, one of the
- groups behind the Proline discovery. They tell me the high-fiving is
- subdued because of the ongoing pandemic, but they are pleased to have
- contributed to the vaccines. They also stress the importance of many other
- groups, workers and volunteers.</p>
- </blockquote>
-
- <h2 id="the-end-of-the-protein-next-steps">The end of the protein, next steps</h2>
-
- <p>If we scroll through the rest of the source code, we encounter some small
- modifications at the end of the Spike protein:</p>
-
- <pre><code>         V   L   K   G   V   K   L   H   Y   T   s
- Virus:   GUG CUC AAA GGA GUC AAA UUA CAU UAC ACA UAA
- Vaccine: GUG CUG AAG GGC GUG AAA CUG CAC UAC ACA UGA UGA
-          V   L   K   G   V   K   L   H   Y   T   s   s
-                !   !   !   !     ! !   !          !
- </code></pre>
-
- <p>At the end of a protein we find a ‘stop’ codon, denoted here by a lowercase
- ‘s’. This is a polite way of saying that the protein should end here. The
- original virus uses the UAA stop codon, the vaccine uses two UGA stop
- codons, perhaps just for good measure.</p>
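- <p>A toy translation of the end of the coding region shows why the extra stop codon does not
- change the protein. This sketch reads codons until the first stop codon and returns whatever was
- left unread; it is a simplification that ignores everything a real ribosome does besides reading
- codons.</p>
-
```python
# Standard genetic code in compact form (bases ordered U, C, A, G; first base
# varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(codons):
    """Return (protein, codons left unread after the first stop codon)."""
    protein = []
    for index, codon in enumerate(codons):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon: translation ends here
            return "".join(protein), codons[index + 1:]
        protein.append(amino_acid)
    return "".join(protein), []  # ran off the end without a stop codon


virus = "GUG CUC AAA GGA GUC AAA UUA CAU UAC ACA UAA".split()
vaccine = "GUG CUG AAG GGC GUG AAA CUG CAC UAC ACA UGA UGA".split()

print(translate_until_stop(virus))
print(translate_until_stop(vaccine))
```
-
- <p>Both versions yield VLKGVKLHYT; in this model the vaccine’s second UGA is never read,
- whatever its purpose.</p>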
-
- <h2 id="the-3-untranslated-region">The 3’ Untranslated Region</h2>
-
- <p>Much like the ribosome needed some lead-in at the 5’ end, where we found the
- ‘five prime untranslated region’, at the end of a protein coding region we find a similar
- construct called the 3’ UTR.</p>
-
- <p>Many words could be written about the 3’ UTR, but here I quote <a href="https://en.wikipedia.org/wiki/Three_prime_untranslated_region" target="_blank">what
- Wikipedia
- says</a>: “The 3’-untranslated region plays a crucial role in gene
- expression by influencing the localization, stability, export, and
- translation efficiency of an mRNA … <strong>despite our current understanding of
- 3’-UTRs, they are still relative mysteries</strong>”.</p>
-
- <p>What we do know is that certain 3’-UTRs are very successful at promoting
- protein expression. According to the WHO document, the BioNTech/Pfizer
- vaccine 3’-UTR was picked from “the amino-terminal enhancer of split (AES)
- mRNA and the mitochondrial encoded 12S ribosomal RNA to confer RNA stability
- and high total protein expression”. To which I say, well done.</p>
-
- <p></p><center>
- <figure>
- <img src="/articles/vaccine.jpg">
- </figure>
-
- </center>
-
- <h2 id="the-aaaaaaaaaaaaaaaaaaaaaa-end-of-it-all">The AAAAAAAAAAAAAAAAAAAAAA end of it all</h2>
-
- <p>The very end of mRNA is polyadenylated. This is a fancy way of saying it
- ends on a lot of AAAAAAAAAAAAAAAAAAA. Even mRNA has had enough of 2020 it
- appears.</p>
-
- <p>mRNA can be reused many times, but as this happens, it also loses some of
- the A’s at the end. Once the A’s run out, the mRNA is no longer functional
- and gets discarded. In this way, the ‘poly-A’ tail is protection from
- degradation.</p>
-
- <p>Studies have been done to find out what the optimal number of A’s at the end
- is for mRNA vaccines. I read in the open literature that this peaked at 120
- or so.</p>
-
- <p>The BNT162b2 vaccine ends with:</p>
-
- <pre><code>                                     ****** ****
- UAGCAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGCAUAU GACUAAAAAA AAAAAAAAAA
- AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA
- </code></pre>
-
- <p>This is 30 A’s, then a “10 nucleotide linker” (GCAUAUGACU), followed by another 70
- A’s.</p>
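- <p>The 30 + 10 + 70 layout can be checked by machine. This sketch joins the two lines shown
- above, finds runs of at least ten A’s with Python’s <code>re</code> module (the threshold of ten
- is a choice made for this example), and extracts what sits between them.</p>
-
```python
import re

# The end of the BNT162b2 mRNA as listed above, with the spacing removed.
tail = (
    "UAGCAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGCAUAU GACUAAAAAA AAAAAAAAAA "
    "AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA"
).replace(" ", "")

# Runs of at least ten A's; the threshold of ten is an arbitrary choice here.
runs = list(re.finditer(r"A{10,}", tail))
run_lengths = [len(match.group()) for match in runs]
linker = tail[runs[0].end():runs[1].start()]

print("poly-A runs:", run_lengths)
print("linker:", linker, f"({len(linker)} nucleotides)")
```
-
- <p>It finds runs of 30 and 70 A’s with GCAUAUGACU in between.</p>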
-
- <p>There are various theories about why this linker is there. Some people tell me it
- has to do with DNA plasmid stability. I have also received this explanation from an
- actual expert:</p>
-
- <p>“The 10-nucleotide linker within the poly(A) tail makes it easier to stitch
- together the synthetic DNA fragments that become the template for transcribing
- the mRNA. It also reduces slipping by T7 RNA polymerase so that the
- transcribed mRNA is more uniform in length”.</p>
-
- <h2 id="summarising">Summarising</h2>
-
- <p>With this, we now know the exact mRNA contents of the BNT162b2 vaccine, and
- for most parts we understand why they are there:</p>
-
- <ul>
- <li>The CAP to make sure the RNA looks like regular mRNA</li>
- <li>A known successful and optimized 5’ untranslated region (UTR)</li>
- <li>A codon optimized signal peptide to send the Spike protein to the right
- place (its amino acids copied 100% from the original virus)</li>
- <li>A codon optimized version of the original spike, with two ‘Proline’
- substitutions to make sure the protein appears in the right form</li>
- <li>A known successful and optimized 3’ untranslated region</li>
- <li>A slightly mysterious poly-A tail with an unexplained ‘linker’ in there</li>
- </ul>
-
- <p>The codon optimization adds a lot of G and C to the mRNA. Meanwhile, using Ψ
- (1-methyl-3’-pseudouridylyl) instead of U helps evade our immune system, so
- the mRNA stays around long enough to actually help train the immune
- system.</p>
-
- <h2 id="further-reading-viewing">Further reading/viewing</h2>
-
- <p>In 2017 I gave a two-hour presentation on DNA, which you can <a href="https://berthub.eu/dna" target="_blank">view
- here</a>. Like this page it is aimed at computer
- people.</p>
-
- <p>In addition, I’ve been maintaining a page on ‘<a href="https://berthub.eu/amazing-dna" target="_blank">DNA for
- programmers</a>’ since 2001.</p>
-
- <p>You might also enjoy <a href="https://berthub.eu/articles/posts/immune-system/" target="_blank">this introduction to our amazing immune
- system</a>.</p>
-
- <p>Finally, <a href="https://berthub.eu/articles" target="_blank">this listing of my blog posts</a> has quite a lot of
- DNA, SARS-CoV-2 and COVID-related material.</p>