How to quantify the difference between two xyz files for the same molecule?

Question

Sometimes I encounter xyz files with slightly different coordinates, for example, before and after optimization.

What is the easiest way to quantify how different they are?

As a first approximation, I would define the difference as the minimum of the sum of squared differences between corresponding coordinates, where the minimum is taken over translations and rotations of one of the structures.

The answer also depends on what you want to know or compare. Checking whether two optimized geometries are the same is a different situation from comparing different conformers. - Greg, Jan 18 '19 at 02:45

Karsten · Answer 1 · 2019-01-18T04:05:24.033

What you describe is the RMSD after superposition. It is not necessary to run a minimization; a classic method, the Kabsch algorithm, gives the best superposition directly. Sometimes only a subset of the coordinates is superimposed, either because coordinates are missing in one of the structures, or because some parts of the structure show larger differences and interfere with superimposing the similar parts.
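A minimal sketch of the Kabsch superposition with NumPy, in case it helps to see the steps. It assumes both structures are already available as (N, 3) arrays with the atoms in the same order; the function name `kabsch_rmsd` is made up for this illustration and is not taken from any particular package.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumes row i of P and row i of Q describe the same atom.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("both structures need the same number of atoms")

    # Remove the translation by moving both centroids to the origin.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Covariance matrix of the two point sets and its SVD.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Flip one axis if needed so the result is a proper rotation
    # (determinant +1) and not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt

    # Rotate P onto Q (coordinates are row vectors) and measure the deviation.
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Two copies of a structure that differ only by a rigid rotation and translation should give a value close to zero, while a mirror image of a chiral structure should not.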

In some cases, where the structures are very different from one another but there are domains of conserved structures, you could also compute differences of pairwise distances and plot them as a distance difference matrix. Distance differences within the conserved domains will be small, while distance differences from domain to domain will be large.
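The distance difference matrix needs no superposition at all, because interatomic distances do not change under rotation or translation. A rough sketch with NumPy, again assuming (N, 3) arrays with identical atom ordering; the function names and the sign convention (second structure minus first) are choices made for this example only.

```python
import numpy as np

def distance_matrix(X):
    """All pairwise interatomic distances for an (N, 3) coordinate array."""
    X = np.asarray(X, dtype=float)
    delta = X[:, None, :] - X[None, :, :]
    return np.sqrt((delta ** 2).sum(axis=-1))

def distance_difference_matrix(A, B):
    """Entry (i, j) is the distance i-j in B minus the distance i-j in A."""
    return distance_matrix(B) - distance_matrix(A)
```

Plotting the result as a heat map (for example with matplotlib's `imshow`) makes blocks of near-zero entries, that is, rigid domains, easy to spot.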

When you do a superposition and RMSD calculation for the first time, it is nice to be able to check the superposition. Many structure viewers can visualize the superposition while also calculating the RMSD, for example the "compare" command in Jmol.

score 1 · Answer 2 · answered Jan 18 '19 at 07:12

As Karsten Theis said, the Kabsch algorithm serves this purpose, but it requires the atoms of the two structures to be paired up (it minimizes the RMSD between paired atoms via rotation and translation). Sometimes this pairing is not easy to obtain. If the order of the atoms does not change between the two files, this is not a problem. Otherwise there is the ICP (iterative closest point) algorithm or a globally optimal ICP variant, but both treat atoms as plain points and therefore have many disadvantages, because the global minimum between two point clouds is in general not the global minimum between two molecules.

score 1 · Answer 3 · answered Oct 31 '22 at 16:00

As already mentioned in the comments there exists a Python implementation of the RMSD calculation which can be found at https://github.com/charnley/rmsd.

If you instead want a detailed overview of what changed, my Python script at https://github.com/Krzmbrzl/xyz-diff might be suitable for you. It effectively calculates a diff matrix and outputs it (so you see the diff for every individual coordinate). Note that no prior alignment is performed, so the output is the raw diff.

Example output:

O         +0.00000000   +0.00000000   -0.01143253  
O         +0.00000000   -0.00000000   +0.01143253  
Cu -> Co  -0.01804938   +0.00829356   +0.00000000  
Cu        +0.01804938   -0.00829356   +0.00000000  
N         -0.04042133   +0.03424848   +0.00000000  
N         +0.04042133   -0.03424848   +0.00000000  
N         -0.02848896   -0.00191986   +0.01590888  
N         -0.02848896   -0.00191986   -0.01590888  
N         +0.02848896   +0.00191986   +0.01590888  
N         +0.02848896   +0.00191986   -0.01590888  
H         -0.04673929   +0.02155138   +0.00000000  
H         +0.04673929   -0.02155138   +0.00000000  
H         -0.04417332   +0.03825913   +0.00254647  
H         -0.04417332   +0.03825913   -0.00254647  
H         +0.04417332   -0.03825913   +0.00254647  
H         +0.04417332   -0.03825913   -0.00254647  
H         -0.02376989   -0.00091570   +0.01582489  
H         -0.02376989   -0.00091570   -0.01582489  
H         +0.02376989   +0.00091570   +0.01582489  
H         +0.02376989   +0.00091570   -0.01582489  
H         -0.02343687   -0.00428525   +0.01206063  
H         -0.02343687   -0.00428525   -0.01206063  
H         +0.02343687   +0.00428525   +0.01206063  
H         +0.02343687   +0.00428525   -0.01206063  
H         -0.03131781   -0.00348489   +0.01817259  
H         -0.03131781   -0.00348489   -0.01817259  
H         +0.03131781   +0.00348489   +0.01817259  
H         +0.03131781   +0.00348489   -0.01817259

(in a terminal the output is color-coded, so large diffs can more easily be spotted)
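For illustration only, a raw per-coordinate diff of this kind can be sketched in a few lines of plain Python. This is not the code of the xyz-diff script; it assumes the usual XYZ layout (atom count, comment line, then one `element x y z` line per atom), and the helper names `parse_xyz`, `raw_diff` and `format_row` are invented for this example.

```python
def parse_xyz(text):
    """Parse XYZ file content into a list of (element, x, y, z) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    # Line 1 is the comment line; atom records start at line 2.
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms

def raw_diff(atoms_a, atoms_b):
    """Per-atom coordinate differences (b minus a), without any alignment."""
    if len(atoms_a) != len(atoms_b):
        raise ValueError("both files must contain the same number of atoms")
    rows = []
    for (sym_a, *xyz_a), (sym_b, *xyz_b) in zip(atoms_a, atoms_b):
        # Flag atoms whose element differs between the two files.
        label = sym_a if sym_a == sym_b else f"{sym_a} -> {sym_b}"
        dx, dy, dz = (b - a for a, b in zip(xyz_a, xyz_b))
        rows.append((label, dx, dy, dz))
    return rows

def format_row(row):
    """Format one diff row with explicit signs and eight decimals."""
    label, dx, dy, dz = row
    return f"{label:<9} {dx:+.8f}   {dy:+.8f}   {dz:+.8f}"
```

Reading both files with `open(path).read()`, passing the contents through `parse_xyz` and printing `format_row` for every entry of `raw_diff` gives a listing in the same spirit as the one above.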
