In computational chemistry, we need to provide a starting molecular structure to begin a calculation. Is there any way to get one from a chemical database? Say I want to run H2; the script would call something like

url = "www.chemicaldatabase.com"
H2 = get_structure(url, "H2")  # hypothetical function, for illustration only

or it could use SMILES or another standard chemical representation.

Osman Mamun
  • Would something like http://blog.matt-swain.com/post/16893587098/chemspipy-a-python-wrapper-for-the-chemspider work? – chipbuster Oct 26 '17 at 02:51
  • For most computations you need 3D structures, so you either need to pull 3D structures directly (for example, PyMOL and the Protein Data Bank) or convert identifiers like SMILES into 3D structures. I'm not sure if there's a way to do that; maybe Open Babel can do it? – DSVA Oct 26 '17 at 02:58
  • @chipbuster: Thanks for the link, but it doesn't give you any coordinates. – Osman Mamun Oct 26 '17 at 03:08
  • @DSVA: Yeah, it would be very hard to store 3D molecular geometries in a database, as there are tons of isomers and rotamers involved. I'm running DFT on some well-known molecular structures and don't want to spend time building the initial structures. ASE has a molecular library with some structures, but I am looking for a more detailed database. – Osman Mamun Oct 26 '17 at 03:08
  • Have a look at molget: https://github.com/jensengroup/molget – Jan Jensen Oct 26 '17 at 07:14

2 Answers

There are multiple approaches in Python.

My suggestion would be to use a cheminformatics library like Open Babel or RDKit to convert from SMILES (for example) to 3D coordinates.
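As a rough illustration of the SMILES-to-3D route, here is a minimal sketch using RDKit. The helper name `smiles_to_xyz` and the fixed random seed are my own choices for the example, not part of any library, and the MMFF94 clean-up step is just one reasonable way to get a sensible starting geometry:

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_xyz(smiles: str, seed: int = 42) -> str:
    """Return an XYZ-format string with a rough 3D geometry for a SMILES.

    Hypothetical helper for illustration; the result is only a starting
    guess and should still be optimized at the QM level.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # 3D embedding needs explicit hydrogens.
    mol = Chem.AddHs(mol)

    # Distance-geometry embedding; returns -1 if no conformer was generated.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError(f"3D embedding failed for {smiles!r}")

    # Quick force-field clean-up (MMFF94) of the embedded geometry.
    AllChem.MMFFOptimizeMolecule(mol)

    return Chem.MolToXYZBlock(mol)


# Example: water from its SMILES string.
water_xyz = smiles_to_xyz("O")
```

The resulting XYZ block can be written to a file or pasted into the input of most quantum chemistry codes. Open Babel offers an equivalent workflow if you prefer it.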

If you want to grab from chemical databases, I can suggest two approaches:

  • CIRpy - Uses the NCI/NIH Chemical Identifier Resolver to convert names, SMILES, etc. into 3D structures.
  • Webel - A web-based cheminformatics tool. I've used it in the past, but I'm not sure if it's still maintained.
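To give a feel for the resolver approach, here is a sketch that talks to the Chemical Identifier Resolver directly using only the standard library (CIRpy wraps the same service more conveniently). The URL pattern and the `get3d` parameter reflect my understanding of the public service and should be checked against its documentation; the function names `cir_sdf_url` and `fetch_sdf` are hypothetical:

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the NCI/CADD Chemical Identifier Resolver (assumed unchanged).
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_sdf_url(identifier: str, get3d: bool = True) -> str:
    """Build the resolver URL for an SDF file of a name, SMILES, InChI, etc."""
    # Percent-encode everything so characters like '#' or '/' in SMILES survive.
    url = f"{CIR_BASE}/{quote(identifier, safe='')}/file?format=sdf"
    if get3d:
        # Ask the service for 3D coordinates rather than a 2D depiction.
        url += "&get3d=True"
    return url


def fetch_sdf(identifier: str, timeout: float = 30.0) -> str:
    """Download the SDF text for an identifier (needs network access)."""
    with urlopen(cir_sdf_url(identifier), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (not run here because it needs a network connection):
# sdf_text = fetch_sdf("water")
```

The returned SDF can then be converted to whatever input format your code needs, for example with Open Babel or RDKit.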

There are other databases, including my PQR and PubChemQC, that offer QM-optimized geometries.

The catch with databases is that you might want the geometry of a molecule that's not in the database. In that case, Open Babel or RDKit is a better solution.

One other caveat: nothing I've mentioned above does a very good job with metal-containing species. If you want ferrocene, that's a trickier problem with current solutions.

Geoff Hutchison

If you are after quantum chemistry, Psi4 offers an all-in-one solution:

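import psi4

# Fetch the geometry from PubChem by name (requires network access)
mol = psi4.geometry("""
    pubchem:Water
""")

# Print the retrieved geometry
mol.print_out()

# Run an SCF energy calculation; Psi4 needs a basis set, and cc-pVDZ here is
# only an illustrative choice
scf_e = psi4.energy("SCF/cc-pVDZ", molecule=mol)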
Daniel
  • I'm curious: since this searches PubChem, what happens if it finds multiple matches (e.g., ibuprofen)? Does this return a list? – Geoff Hutchison Oct 28 '17 at 19:27
  • It plays the "I'm Feeling Lucky" Google card. – Daniel Oct 29 '17 at 15:37
  • One other related question: some PubChem searches return molecules with undefined or unknown stereochemistry. Same thing? Would you get a 2D molecule? – Geoff Hutchison Oct 29 '17 at 16:09
  • AFAIK an error should be thrown. Psi4 does not have the right tech to do 2D->3D guesses. We use this a lot in educational settings, but it probably does not pass muster in a professional one. – Daniel Oct 30 '17 at 14:16
  • Can Psi4 use SMILES as input? – BND Apr 04 '20 at 06:19