11

I’m interested in working with a large dataset consisting of CAS numbers and each corresponding molecule/chemical unit’s name. From my understanding, CAS Registry Numbers are assigned and stored by a division of the American Chemical Society.

However, it's not entirely clear to me how these numbers are controlled outside CAS's own database. Does ACS “own” these numbers?

Melanie Shebel
  • 3
    What do you mean by controlled? You can't assign these numbers yourself, but once assigned I don't think there is much legally that prohibits how you use the number. I'll await an answer though. In the CAS faqs it states: "A CAS Registry Number license is required anytime an organization will “publish” CAS Registry Numbers to the public or use them to support features of a platform that is publicly or commercially available." – Buck Thorn Mar 28 '24 at 13:03
  • 1
    Basically I think CAS sells licenses to a service that checks that the numbers you are using are correct. – Buck Thorn Mar 28 '24 at 13:18
  • As a check once you have the numbers: the last digit of a CAS number is a check digit. It is obtained by multiplying each preceding digit by its position counted from the right (the digit next to the check digit by $1$, the one before it by $2$, etc.), adding the products, and taking the sum modulo $10$. For naphthalene (91-20-3), the $3$ is the check digit. The sum is $0 \times 1 + 2 \times 2 + 1 \times 3 + 9 \times 4 = 43$, and finally $43 \bmod 10 = 3$, where the $3$ is the remainder after dividing $43$ by $10$. – porphyrin Mar 28 '24 at 15:56

1 Answer

12

CAS numbers themselves don't contain chemical information; they were initially introduced to help ACS's Chemical Abstracts Service organize its records. CAS is the only body to create and curate (both assign and withdraw) them.(1) The rules behind the format of a CAS number are public.
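Because the format is public, a list of CAS RNs can be checked for well-formedness offline, without any look-up in a CAS database. Below is a minimal Python sketch of the check-digit rule described in the comments on the question (weighted sum of the digits before the check digit, counted from the right, modulo 10). The function names are my own, and the pattern assumes the usual layout of 2 to 7 digits, 2 digits, and 1 check digit. Passing this test only shows that a string is well formed, not that CAS ever assigned it.

```python
import re

# Assumed layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def cas_check_digit(body_digits: str) -> int:
    """Return the check digit for the digits preceding it (hyphens removed).

    The digit closest to the check digit gets weight 1, the next one
    weight 2, and so on; the check digit is the weighted sum modulo 10.
    """
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_well_formed_cas_rn(cas_rn: str) -> bool:
    """True if the string has the CAS RN layout and a matching check digit."""
    match = _CAS_PATTERN.fullmatch(cas_rn.strip())
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    # Naphthalene: 0*1 + 2*2 + 1*3 + 9*4 = 43, and 43 mod 10 = 3
    print(is_well_formed_cas_rn("91-20-3"))
```

For a large name/number table, a filter like this is a cheap way to catch transposed or mistyped digits before spending look-ups on them.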

You have to pay for a look-up in their databases (e.g., SciFinder). Depending on your local law, the mere assignment of a catalogue number to a chemical(2) might not represent something creative (or: creative enough) to be granted copyright. A couple of years ago, this was a point of discussion between Wikipedia, with its property boxes stating CAS numbers, on one hand, and CAS on the other (see, for instance, a 2008 blog post by Peter Murray-Rust about the curation of these public entries(3)).

If you want to be on the safe side regardless of jurisdiction, the set of about 500k entries in CAS Common Chemistry is a larger public database curated by CAS under the permissive Creative Commons CC BY-NC 4.0 license. A search by (English) chemical name, SMILES, InChI, or CAS RN yields basic entries such as the chemical name, CAS RN, a 2D .sdf file, and other identifiers. Below is a screenshot of the entry for the open-chain form of glucose (note the use of registry marks):

(screenshot of the glucose entry on commonchemistry.cas.org)

There is no guarantee that other large public chemistry databases, such as PubChem or ChemSpider, list the corresponding CAS Registry Number(s) for every entry in their records. Like vendors of chemicals, they are not obliged to use them (under the presumption that CAS assigned at least one) and may follow another scheme.


(1) One of the motivations for InChI was to provide a non-proprietary identifier; another was that the string would relate to (at least some of) the chemistry of the record.

(2) There are caveats like 1317-80-2 for rutile and 1317-70-0 for anatase (both $\ce{TiO2}$ polymorphs) vs. 13463-67-7 for titania. Have a look at the many entries listed for the latter under «Deleted or Replaced CAS Registry Numbers», bottom right.

(3) The English edition of Wikipedia contains about 21k chemicals searchable by the structure explorer.

Buttonwood