The idea behind ex-novo assisted 3D elucidation is to process a set of spectroscopic data obtained from a sample in order to describe all the possible molecular structures consistent with it while requiring a minimum amount of human intervention.
Our approach in this introductory article will deliberately be focused on giving an overview of the determination of stereochemistry/conformation of completely unknown compounds. These may have been isolated from natural sources, or may be novel, synthesized molecules. As such, the process falls short of full 3D structure elucidation, but is an important point on the road to this larger objective.
Further articles will describe and illustrate how the intermediate challenges are dealt with, explain how the whole block diagram is defined, and show how the component parts fit together. Significant advances have been made in this field in the past few years, and software has been developed to address, more or less successfully, many of the associated intermediate goals. Nowadays it is easy to find a number of comprehensive reviews on the subject on the web (1) (2).
The following terms describe the more common and relevant procedures associated with the understanding and use of spectroscopic data and their associated models:
Put simply, we are aiming for data sets encoded with the spatial coordinates of the essential molecular components (atoms) and their bonding information. But achieving this is far from trivial!
Instead of thinking of this endeavour as a methodical search for certainties, it could rather be regarded as an iterative process to add and verify positive hypotheses, based on a series of automatically extracted and user-contributed clues that converge upon a solution. The completeness, or success, of this process depends on the data that are available, or that can be obtained, for the particular chemical system.
We shall start with preprocessed spectroscopic data and then describe or extract the relevant experimental signals. Then we will associate these with the established molecular model, allowing for contributions from user experience and/or insights. The last step will comprise building a rough structure (with one or many fragments), and arranging and shaping them in such a way that a merit function is optimised. The merit function initially will sum up contributions derived from the interatomic distances (accurate or not) that could be deduced.
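The idea of a merit function built from interatomic distances can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `merit`, the lower/upper bound representation of each distance, and the numerical bounds are hypothetical choices made for this example, not the form used by any particular software.

```python
import math

def merit(coords, restraints):
    """Sum of squared violations of distance restraints (0.0 means all satisfied).

    coords:     dict mapping an atom label to its (x, y, z) position in angstroms
    restraints: list of (atom_a, atom_b, lower, upper) distance bounds in angstroms
    """
    total = 0.0
    for a, b, lower, upper in restraints:
        d = math.dist(coords[a], coords[b])
        if d < lower:
            total += (lower - d) ** 2   # atoms too close
        elif d > upper:
            total += (d - upper) ** 2   # atoms too far apart
    return total

# Illustrative (assumed) values: a tight one-bond C-H restraint and a loose,
# NOE-like H...H restraint.
coords = {"C1": (0.0, 0.0, 0.0), "H1": (1.09, 0.0, 0.0), "H2": (0.0, 3.0, 0.0)}
restraints = [
    ("C1", "H1", 1.05, 1.13),
    ("H1", "H2", 2.0, 5.0),
]
satisfied = merit(coords, restraints)

# Stretching the C-H bond to 2.0 angstroms violates the tight restraint only.
stretched = dict(coords, H1=(2.0, 0.0, 0.0))
violated = merit(stretched, restraints)
```

Representing each distance as a pair of bounds lets accurate distances (narrow bounds) and inaccurate ones (wide bounds) contribute to the same sum.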
To be precise in setting the scene, let us make clear that no fragment-based screening based on chemical shift prediction is intended in this plan, even in the case of underdefined problems (those with very few protons). It is important to be explicit on this as, logically, a combination of 3D and fragment-based methods will be the way to solve the widest range of problems.
Structure can be sufficiently described by a minimum set of interatomic spatial distances. These are themselves dependent on chemical bond lengths, bond angles (between 3 atoms), and torsions (dihedral angles, between 4 atoms):
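As a small illustration of this dependence, the sketch below computes the distance between atoms two bonds apart (1,3) from two bond lengths and the bond angle, and between atoms three bonds apart (1,4) when a torsion angle is added. The function names and the numerical values (C-C bonds of 1.54 Å, angles of 109.5°) are assumptions chosen for the example.

```python
import math

def distance_1_3(r12, r23, angle_deg):
    """1,3-distance from two bond lengths and the bond angle 1-2-3 (law of cosines)."""
    theta = math.radians(angle_deg)
    return math.sqrt(r12 ** 2 + r23 ** 2 - 2.0 * r12 * r23 * math.cos(theta))

def distance_1_4(r12, r23, r34, angle123_deg, angle234_deg, torsion_deg):
    """1,4-distance for a chain 1-2-3-4, built from explicit coordinates.

    Atom 2 sits at the origin and atom 3 on the +x axis; torsion 0 is cis, 180 is trans.
    """
    t1 = math.radians(angle123_deg)
    t2 = math.radians(angle234_deg)
    phi = math.radians(torsion_deg)
    atom1 = (r12 * math.cos(t1), r12 * math.sin(t1), 0.0)
    atom4 = (
        r23 - r34 * math.cos(t2),
        r34 * math.sin(t2) * math.cos(phi),
        r34 * math.sin(t2) * math.sin(phi),
    )
    return math.dist(atom1, atom4)

d13 = distance_1_3(1.54, 1.54, 109.5)
d14_trans = distance_1_4(1.54, 1.54, 1.54, 109.5, 109.5, 180.0)
d14_cis = distance_1_4(1.54, 1.54, 1.54, 109.5, 109.5, 0.0)
```

The 1,3-distance is fixed once bond lengths and the angle are known, whereas the 1,4-distance varies with the torsion, which is why distances across three bonds carry conformational information.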
Our first task will be to obtain a set of (inaccurate) interatomic distances with these properties:
The information we want to extract is encoded in various kinds of spectral information recorded in different experiments, normally done with the same (chemically and physically stable) sample.
The state-of-the-art in NMR presents us with a wealth of experiments and sophisticated instrumentation. We shall shortly be describing what comprises a minimum set of spectra amenable to the start of an ex-novo elucidation.
Let us first take a look at the characteristics of the most important NMR experiments in this context (this is an abridged account; some updated guides are available on the web (3)):
Experiment | Information provided | What can be derived |
1D proton (1H NMR) | Chemical shifts of proton resonances. Relative intensity (area) of peak(s). Splitting pattern due to scalar spin-spin couplings between protons typically separated by 2-4 bonds. | Chemically distinct loci within the compound, the corresponding chemical shift values, and the relative number of protons at each of them. |
2D 1H-1H COSY – Homonuclear Correlated Spectroscopy (and variants) | Nuclei sharing a scalar (J) coupling (usually hydrogen, but could be any high-abundance homonuclear spins such as 19F or 31P) | Vicinal Hs (or other nuclei). C-C bonds, by inference. |
2D 1H-1H NOESY and ROESY – Nuclear Overhauser Effect Spectroscopy (ROESY: rotating-frame variant) | Proton-proton correlation (mediated by dipolar, through-space coupling) | Interproton distances. |
2D X-1H H2BC – Heteronuclear 2-bond Correlation | Medium-range correlation of 1H and a heteronucleus | Vicinal Hs and C-C (or corresponding heteronuclear) bonds. |
2D X-1H HSQC – Heteronuclear Single Quantum Coherence | Single quantum coherence between J-coupled protons and heteronuclei: transverse proton magnetization is set to zero during the evolution phase; X nuclei are broadband (BB) decoupled during proton acquisition | Hydrogen and heteronuclei one bond apart. C-H bonds. |
2D X-1H HMBC – Heteronuclear Multiple Bond Correlation | Zero and double quantum coherence between J-coupled protons and heteronuclei | Hydrogen and heteronuclei n bonds apart (n = 2 – 4 or more). |
The bare minimum set of spectra that makes 3D elucidation feasible should provide:
A first approach could begin by acquiring a 2D HSQC X-1H spectrum, which ideally would show all signals corresponding to X atoms (usually C or N) directly bonded to one or more Hs. This would effectively serve to relate (and impose distances between) heteroatoms and Hs. However,
We should therefore obtain at least:
Referencing (or aligning) spectra may seem unimportant or trivial to mention, but because it is absolutely critical for 3D elucidation we feel we must comment on it.
It is very important to consider that in the realm of ex-novo elucidation the goal is not probing the expected existence or absence of a signal in a spectrum in order to confirm or disprove something. Rather it is about being faced with an experimental signal of unknown origin, bearing in mind that the signal is valuable only insofar as it is clear-cut and/or clearly identified in other spectra, whether directly or on the basis of deductions from the model.
Referencing is so important that the very process of assisted or incremental peak picking (see below) has an implicit and very convenient method to account for the very small misalignment that may occur in the values and/or the graphical representation of signals after the acquisition and preprocessing stages. In fact, such misalignment is not uncommon.
The user must therefore ensure that all the spectra are properly referenced with respect to a reliable reference spectrum. (Mnova provides an ‘Absolute Reference’ procedure for referencing all spectra in the document by using the 1D proton spectrum, if present; this can be used to reference all 1H and X-nucleus spectra in 1D and 2D cases.) The exact referencing problem is exacerbated in 2D spectra, where digital resolution is often coarse in one dimension, but can in some cases be effectively mitigated by using processing methods such as zero filling or linear prediction; after such improvements the referencing should be refined again.
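To make the arithmetic behind referencing concrete, the following sketch shifts a peak list by the offset of a known reference signal and shows how the digital resolution of one dimension changes with the number of points. It is a generic illustration only: the function names and numbers are hypothetical, and this is not a description of the Mnova ‘Absolute Reference’ procedure.

```python
def reference_offset(observed_ppm, expected_ppm):
    """Offset (ppm) to add to every shift so the reference signal lands where expected."""
    return expected_ppm - observed_ppm

def apply_reference(shifts_ppm, offset):
    """Return a new list of chemical shifts corrected by a common offset."""
    return [round(s + offset, 4) for s in shifts_ppm]

def ppm_per_point(spectral_width_ppm, n_points):
    """Digital resolution of one dimension, in ppm per data point."""
    return spectral_width_ppm / n_points

# Assumed example: a reference signal expected at 0.00 ppm is observed at 0.03 ppm.
offset = reference_offset(observed_ppm=0.03, expected_ppm=0.00)
corrected = apply_reference([7.29, 3.68, 0.03], offset)

# Assumed example: a 160 ppm indirect dimension with 256 points, then zero-filled to 1024.
coarse = ppm_per_point(160.0, 256)
zero_filled = ppm_per_point(160.0, 1024)
```

With 0.625 ppm per point in the coarse case, a misalignment of a fraction of a point is already larger than typical differences between neighbouring carbon shifts, which is why the referencing is worth repeating after zero filling.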
This is a huge subject itself and this section intends just to touch on the main ideas.
The goal of peak-picking is to distinguish signals and afford chemical shift information of the interesting spin systems in the noisy NMR spectra, while ignoring irrelevant or confusing signals from impurities, artefacts, solvent signals, etc.
It is a mandatory and sensitive stage, and must be done very accurately, as it defines the atom(s)-to-chemical-shift assignments and the relations or connectivities between them.
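A deliberately naive sketch of 1D peak picking is given below: it estimates the noise from a signal-free region and keeps local maxima that rise above a multiple of that noise. The function name, the threshold factor `k` and the synthetic data are assumptions for illustration; practical peak pickers are considerably more elaborate and must also deal with overlap, artefacts and solvent signals.

```python
import statistics

def pick_peaks(ppm, intensity, noise_region, k=5.0):
    """Return (ppm, intensity) pairs for local maxima above baseline + k * noise.

    noise_region: (start, stop) indices of a stretch assumed to contain no signal.
    """
    start, stop = noise_region
    baseline = statistics.mean(intensity[start:stop])
    noise = statistics.pstdev(intensity[start:stop])
    threshold = baseline + k * noise
    peaks = []
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        # A peak is a point above the threshold that is higher than its neighbours.
        if y > threshold and y > intensity[i - 1] and y >= intensity[i + 1]:
            peaks.append((ppm[i], y))
    return peaks

# Synthetic trace: two real signals and one small bump that stays below the threshold.
intensity = [0.1, -0.1, 0.0, 0.1, -0.1, 0.2, 5.0, 0.3, 0.0, 0.4, 9.0, 0.2, -0.1]
ppm = [round(4.0 - 0.1 * i, 1) for i in range(len(intensity))]
peaks = pick_peaks(ppm, intensity, noise_region=(0, 5))
picked_shifts = [shift for shift, _ in peaks]
```

The choice of `k` is the sensitive part: too low and noise or artefacts are picked, too high and weak but genuine correlations are lost.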
The main difficulties are:
After finishing the peak-picking stage we can produce a list of some of the components of the molecule of interest. Atom enumeration is about assigning labels to the different chemical shifts found for the NMR active nuclei detected and recorded in the spectra.
An NMR spectrometer records signals produced collectively from an ensemble of physical nuclei in a sample. This sample must be prepared such that a sufficient concentration of the interesting compound/s is present and is chemically stable through the course of the measurements. Usually the compound is chemically pure to a level in excess of ca. 95%, and a fully deuterated solvent is used.
Under the assumption that the outcome of the sample preparation work is a stable, limited number of different compounds present at high concentration in the sample, one can expect that the detected, measurable chemical shifts correspond to (NMR active) nuclei present in these compounds.
So the assertion can be made that the minimum number of nuclei in a chemical species is equal to the number of observable multiplets. A more complete picture emerges when multiplet integrals are considered.
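Atom enumeration and the use of integrals can be sketched as follows. The labelling scheme (element symbol plus a running number, ordered by decreasing shift) and the normalisation of integrals to the smallest multiplet are assumptions made for this example.

```python
def enumerate_atoms(shifts_by_element):
    """Assign a label such as 'H1' or 'C2' to each observed chemical shift."""
    labels = {}
    for element, shifts in shifts_by_element.items():
        for index, shift in enumerate(sorted(shifts, reverse=True), start=1):
            labels[f"{element}{index}"] = shift
    return labels

def proton_counts(integrals):
    """Estimate protons per multiplet, taking the smallest integral as one proton."""
    unit = min(integrals)
    return [max(1, round(area / unit)) for area in integrals]

# Assumed example data: three proton multiplets and two carbon signals.
labels = enumerate_atoms({"H": [1.20, 7.26, 3.65], "C": [14.1, 60.3]})
counts = proton_counts([1.02, 1.98, 3.05])

minimum_protons = len([name for name in labels if name.startswith("H")])
estimated_protons = sum(counts)
```

Here the three multiplets give a minimum of three protons, while the integrals suggest six; the estimate remains relative, since the smallest multiplet could itself correspond to more than one proton.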
Connectivities are just relations between ‘enumerated’ atoms, established on the basis of the signals picked in spectra, or assignments made by the user. They are grouped in a connectivity table (CT).
We must also distinguish two kinds of connectivities that comprise a standard CT:
Secondary connectivities are intended to let the user introduce relations that cannot be observed in the available spectra. The user might know that a non-detectable relationship exists, and decide to make the corresponding assertion by introducing a secondary connectivity. Or the user could be interested in testing some tentative relation: for example, to join molecule fragments, he/she would introduce some secondary connectivities, and then carry on with the elucidation to see if the outcome is sensible and in accord with the data.
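One possible way to hold such a table in memory is sketched below. The class and field names are hypothetical, and the sketch assumes that primary connectivities are those derived from spectra while secondary ones are user assertions; it is not the data model of any specific program.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Connectivity:
    atom_a: str
    atom_b: str
    kind: str     # "primary" or "secondary"
    source: str   # e.g. "HSQC", "COSY", or "user"

@dataclass
class ConnectivityTable:
    entries: list = field(default_factory=list)

    def add(self, atom_a, atom_b, kind, source):
        if kind not in ("primary", "secondary"):
            raise ValueError(f"unknown connectivity kind: {kind}")
        self.entries.append(Connectivity(atom_a, atom_b, kind, source))

    def of_kind(self, kind):
        """All connectivities of the requested kind."""
        return [c for c in self.entries if c.kind == kind]

ct = ConnectivityTable()
ct.add("C1", "H1", "primary", "HSQC")
ct.add("H1", "H2", "primary", "COSY")
# A tentative relation introduced by the user to join two fragments.
ct.add("C1", "C4", "secondary", "user")

primary = ct.of_kind("primary")
secondary = ct.of_kind("secondary")
```

Keeping the kind and the source with each entry makes it straightforward to withdraw a tentative secondary connectivity if the resulting structure turns out not to agree with the data.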
Let us remind ourselves here that a set of interatomic distances (value and type) must exist before structure generation can occur.
To achieve that, we must ensure that most of the knowledge about molecular structure is included in our model, and deduce as many connectivities between pairs of atoms as we can. The strategy is based on taking an (automated) look at the current connectivities and applying a set of filters and consistency checks upon them, and rules to deduce new ones.
Let us illustrate this idea with some examples. The following illustration (Fig.1) shows the rule to deduce a primary C-C connectivity (strongly resembling a C-C single bond) and having a defined (tight) interatomic distance:
In Fig. 2 we show the rule to deduce two primary connectivities – one of type tight and the other, say, semi-loose – from two existing ones:
We could also consider how secondary connectivities may be introduced by the user. For example, if an (H,H) TOCSY spectrum is available but its resolution barely allows some separate spin systems to be identified, and only with high uncertainty, then the user could decide to try introducing certain secondary C-C connectivities (i.e. 1-bond C-C) between the corresponding protonated carbons of the possible spin systems and see if the outcome makes sense. This is indeed a common example, as the homonuclear (H,H) COSY spectrum often has insufficient resolution to allow reliable peak-picking and assignments when many protons resonate close to each other!
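A deduction rule of the kind discussed above can be sketched in a few lines. The figures are not reproduced here, so the rule below is an assumption based on the experiment table: if two protons share a COSY correlation and each is attached to a different carbon according to HSQC, a C-C connectivity between those carbons is proposed. A COSY cross-peak can also arise from geminal or long-range couplings, so the result is a hypothesis to be checked rather than a certainty.

```python
def deduce_cc_connectivities(hsqc, cosy):
    """Propose C-C connectivities from HSQC (carbon, proton) and COSY (proton, proton) pairs."""
    carbon_of = {proton: carbon for carbon, proton in hsqc}
    deduced = set()
    for proton_a, proton_b in cosy:
        carbon_a = carbon_of.get(proton_a)
        carbon_b = carbon_of.get(proton_b)
        # Skip protons without an HSQC partner and geminal pairs on the same carbon.
        if carbon_a and carbon_b and carbon_a != carbon_b:
            deduced.add(tuple(sorted((carbon_a, carbon_b))))
    return sorted(deduced)

# Assumed example: H2 and H3 sit on the same carbon; H5 has no HSQC correlation.
hsqc = [("C1", "H1"), ("C2", "H2"), ("C2", "H3"), ("C3", "H4")]
cosy = [("H1", "H2"), ("H2", "H3"), ("H3", "H4"), ("H1", "H5")]
cc_pairs = deduce_cc_connectivities(hsqc, cosy)
```

Each deduced pair could then be added to the connectivity table with a tight distance, in the spirit of the rules illustrated in the figures.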
Again, structure generation is a huge subject on its own. An intentionally very brief overview is as follows:
Distance Geometry, or DG, involves calculations that consider only the spatial coordinates of the atoms. How these atoms may be connected by bonds is not considered. In the context of NMR and this exercise, the most relevant input data are the internuclear separations and how these may be adjusted so that they are consistent with known bond distances and with those derived from nuclear Overhauser effect (NOE) measurements.
Within DG, analytical expressions account for the various distance types considered. The distance geometry algorithms try to arrange atoms in 3D space so that the resulting interatomic distances fit as well as possible the distances derived from experimental data and those contributed by the user.
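The fitting step can be illustrated with a toy example. Note that this is not a full distance geometry implementation: it is a simple least-squares refinement of random starting coordinates against target distances, using gradient descent with step halving. All names, the target distance of 1.8 Å and the random starting point are assumptions made for the sketch.

```python
import math
import random

def stress(coords, targets):
    """Sum of squared differences between actual and target interatomic distances."""
    return sum((math.dist(coords[i], coords[j]) - t) ** 2
               for (i, j), t in targets.items())

def gradient(coords, targets):
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), t in targets.items():
        d = math.dist(coords[i], coords[j])
        if d < 1e-12:
            continue  # coincident atoms: direction undefined, skip this term
        scale = 2.0 * (d - t) / d
        for k in range(3):
            g = scale * (coords[i][k] - coords[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad

def refine(coords, targets, steps=500, step_size=0.1):
    """Move atoms downhill on the stress; a step is kept only if it lowers the stress."""
    coords = [list(p) for p in coords]
    current = stress(coords, targets)
    for _ in range(steps):
        grad = gradient(coords, targets)
        trial = [[p[k] - step_size * g[k] for k in range(3)]
                 for p, g in zip(coords, grad)]
        trial_stress = stress(trial, targets)
        if trial_stress < current:
            coords, current = trial, trial_stress
        else:
            step_size *= 0.5  # overshoot: shrink the step and try again
    return coords, current

# Four atoms whose pairwise target distances describe a regular tetrahedron.
targets = {(i, j): 1.8 for i in range(4) for j in range(i + 1, 4)}
rng = random.Random(0)
start = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
initial_stress = stress(start, targets)
final_coords, final_stress = refine(start, targets)
```

In a real problem the targets would be the tight and loose distances collected in the connectivity table, and many starting points would usually be tried, since a single refinement may end in a local minimum.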
In conclusion, we have started to lay down the sequence of steps and experiments necessary for this unique approach to structure/conformation elucidation. The practicalities are fraught with difficulties, and we touch on these only briefly for now. But the outcome is satisfyingly (surprisingly?) effective. Successive articles in this series will deal with the steps in more detail and build up a complete picture of the process.
The author has benefitted to a large extent from numerous discussions with Drs. Craig Butts and Jeremy Harvey of School of Chemistry – Univ. of Bristol, and with Dr. Manuel Martín of NMR Service – Univ. of Santiago de Compostela. Their collaboration and continuing scientific input is gratefully acknowledged.
1. Bross-Walch, N., Kühn, T., Moskau, D. and Zerbe, O. Strategies and Tools for Structure Determination of Natural Products Using Modern Methods of NMR Spectroscopy. Chemistry & Biodiversity 2005, 2: 147–177. doi: 10.1002/cbdv.200590000 http://onlinelibrary.wiley.com/doi/10.1002/cbdv.200590000/pdf (accessed Jun 11, 2012).
2. Elyashberg, M., Blinov, K., Molodtsov, S., Smurnyy, Y., Williams, A. J. and Churanova, T. Computer-assisted methods for molecular structure elucidation: realizing a spectroscopist’s dream. Journal of Cheminformatics 2009, 1:3. doi: 10.1186/1758-2946-1-3 http://www.jcheminf.com/content/1/1/3 (accessed Jun 11, 2012).
3. University of Oxford. Chemistry Research Laboratory. NMR facility. A chemist’s quick guide to NMR acronyms and experiments. http://www.chem.ox.ac.uk/spectroscopy/nmr/acropage.htm (accessed Jun 11, 2012).