Single crystal x-ray diffraction experiment backwards: from CIF to genuine set of raw data without performing actual experiment

Question

Single-crystal X-ray diffraction stands among the most complex yet trustworthy analytical techniques for determining the structure of solids. But how hard would it be to simulate (fake) the entire raw dataset, including all reflections, possible artifacts, the experiment strategy, etc., so that it fits a given structure? Is it even theoretically possible? Could it happen some day that, with enough computational power and trained neural networks, XRD suddenly loses its good reputation?

Here is a brief report from the editorial section of Acta Cryst. E which overviews fraudulent content, which is still relatively easily detectable with the help of tools like CheckCIF even without looking at raw data. But can one commit a "perfect crime" by faking raw data entirely?

I have a vague apprehension this is going to be moved to Academia.SE or elsewhere. Long story short, yes, XRD is trustworthy, but not because it is hard to fake (which, arguably, is not all that hard). — Ivan Neretin, Jun 28 '17 at 10:28
I'm completely OK with that, though I was thinking that maybe anyone could point to an existing precedent, or state whether it is doable at all, and if yes, then how. Maybe I should've asked this on chemport.ru instead:) — andselisk, Jun 28 '17 at 10:37
@andselisk On terminology: In the experiment you refer to, the CCDs (and other sensors) on a diffractometer record the diffraction intensities, rather than the reflection intensities. Regarding your theoretical question ... a short answer is "yes". — Buttonwood, Jun 28 '17 at 11:18
@Buttonwood To my knowledge, it is exactly the reflection intensities. I specifically checked the CrysAlisPro and PLATON manuals for that. I've heard people use "diffraction intensity" when it comes to powder XRD, but in single-crystal XRD it is reflections that you are working with: indexing them, applying a weighting scheme, etc. Since you answer "yes", could you elaborate in detail in the form of an answer? — andselisk, Jun 28 '17 at 11:55
Simulation of XRD patterns is straightforward (both powder and single crystal). Powder diffraction patterns can easily be obtained even with, e.g., Mercury. Would you elaborate on the question: why do you think it is a challenge? — Greg, Jun 28 '17 at 12:36
@Greg I'm not asking about powder XRD. The title has "Single crystal x-ray diffraction" in it; I thought that was explicit enough. — andselisk, Jun 28 '17 at 12:38
@andselisk Ok, ShelX lists the data under "reflection", yet the title of the question joins with "diffraction" ... why not "intensity data". Well, perhaps "diffraction" is more general (http://pd.chem.ucl.ac.uk/pdnn/diff2/structf.htm) here? While the original question is about single crystal diffraction, I intentionally included powder diffraction as the two share the principle to minimize the difference of experimentally accessible electron density map and calculated electron density map; the former recorded by CCD or point detector. — Buttonwood, Jun 28 '17 at 12:56
I guess I never thought that x-ray diffraction was particularly complex, and should be trusted no further than any of a number of other experimental techniques. Bottom line, if somebody sets out to consciously do wrong, they can for quite some time, depending on how important what they do is to other people. On the flip side, there are a variety of alloy phases where the exact crystal structure is still being debated, a 100 years after they were first discovered (some are really hard to pin down). — Jon Custer, Jun 28 '17 at 12:58
@andselisk I said BOTH powder and single crystal... It is still not clear what you are asking, or why you think calculating reflections from a crystal structure would be any more challenging than the inverse problem. — Greg, Jun 28 '17 at 13:09
I do recall a case of someone faking quite a few submissions to Acta Cryst E, so faking them is quite possible. I don't have a citation for that though, so can't give you a posted answer. — Canageek, Jun 28 '17 at 16:43
Analytical methods are not "trustworthy"; this adjective applies to humans and to collections based on human decisions. And of course you can fabricate data of any analytical method for a fake substance. Any exact (artificial) 3D electron density distribution has a clearly defined diffraction pattern. It's the opposite direction that is not as simple. — Karl, Jun 28 '17 at 19:17
@JonCuster alloys are rarely the subject of SC-XRD, at least in the industrial scale, I agree. Powder XRD and x-ray dynamic defectoscopy are used way more often instead. But I don't just want to focus on particular case, I would like to find a unified answer for both small molecules and proteins single crystal diffraction. — andselisk, Jun 29 '17 at 00:15
@Greg it's not just about calculating the reflections, but about faking the entire dataset, including the absorption correction, a reverse-engineered experiment strategy for the goniometer, broadening of diffracted beams, cryojet parameters, CCD frames, etc. — andselisk, Jun 29 '17 at 00:20
@Karl True, but only partially. Commercially available SC diffractometers, where you can just mount a crystal and within a few hours get the structure submitted and registered, have existed for a decade or so. It doesn't work in all cases, and more often than not human intervention is required, but it is no longer necessary in simple cases. And yes, it is the opposite direction I'm curious about. — andselisk, Jun 29 '17 at 00:25
@Canageek This is a good example, if I find a decent link to that case I'll add it to the question. But these fakes were quite obvious and they tended to serve more like an advertisement for checkCIF utility as they primarily contained improper atom assignments, ridiculous ORTEP plots and one didn't even need to have access to their raw data to prove these persons wrong. I'm asking about a "perfect crime" if I'm allowed to say so, where it's going to be impossible to find evidence even on a data collection level (which is 100% simulated). — andselisk, Jun 29 '17 at 00:30
Faking XRD data is not a perfect crime, but a stupid one, especially if it is submitted to some database, where the fake will definitely get detected sooner or later. Or a pointless one: how would you know the structure is indeed 100% correct if you don't have an xray pattern? — Karl, Jun 29 '17 at 17:11
@Karl Submitting the dataset (which is several GB in size) to a database is not required as of now. You submit the result of your refinement, usually a CIF file. Again, I'm speaking about single-crystal x-ray diffraction, not the powder one. You used the term "xray pattern", which aroused my suspicion that you don't fully understand the question. And let me remind you that a century ago Edison was also labeling the idea of using AC as a means of electricity delivery as stupid. — andselisk, Jun 29 '17 at 21:37
@andselisk I thought you wanted to fake the complete dataset? But nevermind, you can only do that perfectly if you have the correct crystal structure, which you can only have if you have an actual crystal and measured it. But then why fake it? — Karl, Jun 29 '17 at 21:57

Buttonwood · Answer 1 · 2017-06-28T13:18:19.003

Structure determination by X-ray diffraction, whether by powder diffraction or by single crystal diffraction analysis, is a spatial mapping of electron density. After solving a crystallographic structure (the phase problem), the subsequent structure refinement includes steps to complete the crystallographic model and to minimise the differences between the Fourier map of the observed electron density ($F_O$) and the electron density predicted by your current model ($F_C$). This minimisation continues until the difference is below a threshold value and the experimenter is satisfied with the result. The latter includes the complete and reasonable assignment of atoms (does this electron density belong to a carbon atom or a nitrogen atom?) and checking whether the bond distances and angles in the model are consistent with the body of data determined in other experiments. Some of the criteria applied are interdependent.
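To make the "difference below a threshold" idea concrete, here is a minimal sketch of the conventional residual $R_1 = \sum \big||F_O| - |F_C|\big| / \sum |F_O|$, the kind of agreement figure refinement programs typically report. The function name `r1_factor` and the numbers are made up for illustration, and the sketch assumes both lists are already on the same scale; it is not how any particular program implements its refinement.

```python
def r1_factor(f_obs, f_calc):
    """Conventional residual R1 = sum(||Fo| - |Fc||) / sum(|Fo|).

    Assumes f_obs and f_calc are equally long lists of structure factor
    amplitudes that are already on a common scale (an assumption of this
    sketch; real programs refine a scale factor as well).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of |Fo| is zero; R1 is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical amplitudes for three reflections, not real data.
    observed = [100.0, 50.0, 25.0]
    calculated = [98.0, 52.0, 24.0]
    print(f"R1 = {r1_factor(observed, calculated):.4f}")
```

A model that reproduces the observed amplitudes exactly gives $R_1 = 0$; larger values indicate poorer agreement.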

If the crystallographic model is complete (in layman's terms, if there is a *.cif file), then it is easy to predict a theoretical powder diffraction pattern. Programs like CCDC Mercury just ask for the wavelength to consider and offer graphical and numerical output.

Such a theoretical diffractogram is then compared with an experimentally recorded one, for example to check the newly prepared sample's phase identity (polymorphism) and purity.
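As a rough illustration of why the wavelength is the only extra input needed for peak positions, the sketch below applies Bragg's law, $\lambda = 2d\sin\theta$, to a cubic cell, where $d = a/\sqrt{h^2+k^2+l^2}$. The function name `two_theta_cubic` and the example cell are assumptions for illustration; this is not Mercury's implementation, and it gives peak positions only, not intensities.

```python
import math


def two_theta_cubic(a, hkl, wavelength):
    """Return the Bragg angle 2-theta (degrees) for reflection hkl of a cubic cell.

    a          -- cubic cell edge in angstrom
    hkl        -- tuple of Miller indices (h, k, l), not all zero
    wavelength -- X-ray wavelength in angstrom
    Returns None if the reflection is not reachable at this wavelength.
    """
    h, k, l = hkl
    norm = math.sqrt(h * h + k * k + l * l)
    if norm == 0:
        raise ValueError("hkl must not be (0, 0, 0)")
    d_spacing = a / norm                      # cubic d-spacing
    sin_theta = wavelength / (2.0 * d_spacing)  # Bragg's law
    if sin_theta > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # Illustrative values: a rock-salt-like cell edge and Cu K-alpha radiation.
    for hkl in [(1, 1, 1), (2, 0, 0), (2, 2, 0)]:
        print(hkl, two_theta_cubic(5.6402, hkl, 1.5406))
```

For non-cubic cells the expression for $d$ involves all six cell parameters, but the principle is the same.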

In PXRD, the spatial arrangement and conformations of subunits in the crystallographic model are optimised (refined) until they yield a model that is chemically and mathematically reasonable. This is the reason you see in publications a difference trace below the superposition of the experimental and predicted diffraction patterns.


For single crystal diffraction, generating a 3D diffraction model computationally might be more costly than for PXRD. On the other hand, as said, programs like ShelX routinely compare the experimentally recorded $F_O$ with the predicted / calculated diffraction intensities $F_C$ derived from the current set of atoms (their coordinates, occupancy, etc.) of the model. It is done internally. ShelX' *.lst listings for example include a section like:

Hence my answer to your question of whether a diffraction pattern may be simulated / faked is "yes, it is possible, provided there is a good data set about how the atoms / molecules are spatially arranged". Initiatives like the recurrent CSP blind tests to predict crystal structures and databases like PCOD may provide a foundation for this.
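The forward calculation behind $F_C$ can be sketched in a few lines: $F(hkl) = \sum_j f_j \exp[2\pi i (h x_j + k y_j + l z_j)]$, summed over all atoms in the unit cell. The sketch below is deliberately simplified and its names (`structure_factor`, the atom list) are hypothetical: it treats each atom as a point scatterer with a constant scattering factor (here simply the atomic number), and ignores the angle dependence of the form factor, displacement parameters, occupancies, symmetry expansion and every instrumental effect. It therefore yields an idealised reflection list at best, which is a long way from detector frames.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Return the complex structure factor F(hkl) for a list of atoms.

    hkl   -- tuple of Miller indices (h, k, l)
    atoms -- list of (f, x, y, z): a constant scattering factor f and
             fractional coordinates of every atom in the unit cell
             (simplifying assumptions of this sketch, see the text above)
    """
    h, k, l = hkl
    total = 0j
    for f, x, y, z in atoms:
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        total += f * cmath.exp(1j * phase)
    return total


if __name__ == "__main__":
    # Hypothetical CsCl-type cell: one heavy atom at the origin, one lighter
    # atom at the body centre; scattering factors taken as atomic numbers.
    cell = [(55.0, 0.0, 0.0, 0.0), (17.0, 0.5, 0.5, 0.5)]
    for hkl in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]:
        f_hkl = structure_factor(hkl, cell)
        # The measurable quantity is proportional to |F|^2.
        print(hkl, round(abs(f_hkl), 6), round(abs(f_hkl) ** 2, 3))
```

In this toy cell the two atoms scatter out of phase when $h+k+l$ is odd and in phase when it is even, which is why the amplitudes alternate between the difference and the sum of the two scattering factors.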

Then how can one spot an artificial and wrong diffraction model? Checkcif / Platon are tools to check the consistency and integrity of such models against the raw data (hkl-files) and structure factors. The recent introduction of checksums into *.cif files (currently under watch as Platon errors PLAT012, 013, and 014) may be seen as one element of protection if a few entries were changed after completing the work described above. These bells probably remain silent if the data were compromised very early on.

I'm sorry, but this is not an answer. These are the steps you routinely perform to obtain a CIF file. You also conveniently jumped over the phase problem and proceeded to the fancy outcome. Sure, having a larger dataset (SC-XRD experiment) you can always derive a subset (simulated powder diffractogram); this is trivial. My question is whether it would be possible to produce the initial/raw dataset just as if it had been collected by a single-crystal diffractometer, having only the desired crystal structure, i.e. all the info that you find in CIF files: coordinates, occupancies, thermal parameters, etc. — andselisk, Jun 28 '17 at 13:00
@andselisk As the listing + the next-to-last paragraph aim to show, yes it is possible. Quantum chemistry may give you reasonable distances of atoms within a molecule, and may optimise the packing (hence the contests mentioned). Thankfully not too often the atoms of a skeleton of organic molecules may be described with a SOF equal to 1 (or 11, in ShelX' notion), easing the situation. And with more than 500k entries in the CCDC, there are plenty of examples of anisotropic displacement parameters available, too, for many atoms even in different bonding environments and at different experimental temperatures. — Buttonwood, Jun 28 '17 at 13:13
@Buttonwood Anisotropic displacement parameters is not something you may just copy from another file. Making the ellipsoids "look right" would probably be the hardest part of the job. Then again, it is not terribly hard. — Ivan Neretin, Jun 28 '17 at 14:11
@IvanNeretin I agree that ADPs are not just a simple scalar, and split into components (A in ADP). A speculative criterion would be C(sp3), chain terminal != near a rigid aryl. Yet for my computational colleagues around, interested in homo/lumo calculations, just knowing the centre of the atom in question often is already enough (or, if facing disordered Me / $\ce{CF3}$ / tBu groups, dropping one of the two populations altogether). If not too distorted (under their visual inspection), they don't mind if the refinement included a set of EADP's, some RIGU, or the simple kept ANIS. Sigh. — Buttonwood, Jun 28 '17 at 14:56
@Buttonwood That's right, but we are talking about the reverse problem: to produce some realistically looking ADPs without any actual data. — Ivan Neretin, Jun 28 '17 at 15:22
@Buttonwood There is no need to use quantum chemistry for establishing the geometry. My assumption is that ALL the info you find in a CIF file is already given. The assignment is to generate a plausible "fake" dataset which, upon solving and refinement routines, will again give our CIF. — andselisk, Jun 29 '17 at 00:09
@Buttonwood And I'm not going to downvote your answer, it's 100% correct and beautifully crafted, but it answers another question -- "How do I perform a single crystal diffraction experiment and use this data to prove the polycrystalline sample I synthesized is what I think it is?" — andselisk, Jun 29 '17 at 00:35
