Wikipedia:WikiProject Chemicals/Chembox validation

WikiProject Chemicals and WikiProject Pharmacology are validating the content of the infoboxes {{chembox}} and {{drugbox}}. Values in the infobox are compared with values reported in the literature, and when the values match, the current revision is stored in the index for chembox or the index for drugbox, respectively. This is typically done for values that are 'immutable' (e.g., the boiling point of a chemical compound: the boiling point of water under standard conditions is 99.98°C, and there is no plausible reason to suspect it will change).

At the moment, we are verifying the CAS Registry Number ('CASNo' in the {{chembox}}, 'CAS_number' in the {{drugbox}}), ChemSpiderID, Unique Ingredient Identifier (UNII), InChI, KEGG, and ChEMBL. These values are compared with the data on http://commonchemistry.org (the CAS website), http://www.chemspider.com and http://fdasis.nlm.nih.gov/srs/srs.jsp (for the UNII), as well as with lists that were supplied (CAS number, ChemSpiderID, InChI, UNII, ChEMBL and ChEBI) or downloaded from these websites (KEGG, DrugBank). In the meantime, we are trying to add, update and/or check a number of other identifiers (InChI, InChIKey) by comparison with the ChemSpider website, http://www.chemspider.com.

CheMoBot follows changes to these articles and is set up to update the infoboxes. When it detects a change to a value, it changes parameters in the infobox accordingly. The template uses these parameters to show the status of each field in the box.

Boxes in which all verified values are the same as in the verified revision are tagged with a tick (checkY) at the bottom, and boxes in which some of these values have been changed are tagged with a cross (☒N). The individual identifiers are tagged with a tick or a cross as well. Boxes that contain changes to these verified fields are also categorized in Category:Chemboxes which contain changes to verified fields. Boxes that contain changes to other important fields are categorized in Category:Chemboxes which contain changes to watched fields. For an example, see this vandalism, quickly flagged by CheMoBot.
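The comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not CheMoBot's actual code: the function name `compare_with_verified`, the dictionary layout and the sample values are hypothetical, and the field list simply repeats the identifiers named on this page.

```python
# Hypothetical sketch; CheMoBot's real implementation is not shown on this page.
VERIFIED_FIELDS = ("CASNo", "ChemSpiderID", "UNII", "InChI", "KEGG", "ChEMBL")


def compare_with_verified(current, verified, fields=VERIFIED_FIELDS):
    """Compare the current infobox values with those of the verified revision.

    Returns (per_field, overall). per_field maps each verified field to
    'correct' or 'changed'. overall is 'changed' if any field changed,
    'correct' if all verified fields match, and None if nothing was verified.
    """
    per_field = {}
    for name in fields:
        if name not in verified:
            continue  # never verified, so there is nothing to compare against
        same = current.get(name, "").strip() == verified[name].strip()
        per_field[name] = "correct" if same else "changed"
    if not per_field:
        return per_field, None
    overall = "changed" if "changed" in per_field.values() else "correct"
    return per_field, overall


# Illustrative values only.
verified = {"CASNo": "7732-18-5", "ChemSpiderID": "937"}
current = {"CASNo": "7732-18-5", "ChemSpiderID": "123"}
per_field, overall = compare_with_verified(current, verified)
print(per_field)
print(overall)
```

In this sketch the per-field result corresponds to the tick or cross on each identifier, and the overall result to the mark at the bottom of the box.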

If you encounter a page with a {{chembox}} or {{drugbox}} that shows a ☒N, then please check if the current value is wrong (in which case, it can just be changed back to the value in the verified revision; the bot will do the rest), or if there is a mistake in the verified revision (if so, it may need an update of the index; if you need help with that, please ask the appropriate wikiproject).

Verification – tagging references

CheMoBot adds a template to a _Ref parameter when it finds the field correct (e.g. for CASNo, CASNo_Ref will be filled with {{cascite|correct|XXX}}). The first parameter of the template is 'correct' or 'changed', and the box will show a tick or a cross accordingly on CASNo. The second parameter records where the value was verified. As we are at the moment verifying all fields against the CAS commonchemistry.org site, the bot replaces XXX with 'CAS' (i.e., {{cascite|correct|CAS}}). When using another source to verify the CASNo, please adapt this parameter accordingly; the bot will try to retain this field throughout. Once there are significantly more verifications against sources other than commonchemistry.org, I will instruct the bot to fill the field by default with {{cascite|correct|??}} or something similar.
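As a minimal sketch of how such a _Ref value is put together (the helper name `ref_template` and its defaults are hypothetical; only the template name 'cascite', the statuses 'correct' and 'changed', and the source 'CAS' come from the description above):

```python
def ref_template(status, source="CAS", template="cascite"):
    """Build the wikitext for a _Ref parameter, e.g. {{cascite|correct|CAS}}."""
    if status not in ("correct", "changed"):
        raise ValueError(f"status must be 'correct' or 'changed', got {status!r}")
    # Doubled braces in an f-string produce literal '{{' and '}}'.
    return f"{{{{{template}|{status}|{source}}}}}"


print(ref_template("correct"))
print(ref_template("changed"))
```

Keeping the source as a separate argument mirrors the point made above: a value verified somewhere other than commonchemistry.org only needs a different second parameter.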

Method of work

Our approach is to start by checking that the CAS registry number and the structure match with the name. This will be used as a foundation upon which we can build a broader validation effort. Once we have the structure verified, we have the formula, and hence the molar mass, and we can also generate other machine representations such as SMILES, InChI and InChIKey.
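To illustrate the step from formula to molar mass, here is a rough sketch. It assumes a plain molecular formula without brackets, charges or hydrates, and the atomic mass table is deliberately abbreviated; the function name `molar_mass` is hypothetical and this is not the tooling the project uses.

```python
import re

# Abbreviated table of standard atomic masses in g/mol; extend as needed.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "Na": 22.990, "Cl": 35.45,
}


def molar_mass(formula):
    """Molar mass in g/mol for a simple formula such as 'C2H6O'."""
    # Accept only element symbols, each followed by an optional count.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"no atomic mass listed for {symbol!r}")
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(molar_mass("H2O"), 3))
print(round(molar_mass("C2H6O"), 3))
```

A value computed this way can be compared with the MolarMass field of the infobox, allowing for rounding differences.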

First 1000

After our IRC meeting on January 13, 2009, we used an Excel file to validate the first 1000 entries from the CAS XML file. This is available to project members here, on the password-protected site. Meanwhile, User:Physchim62 validated the inorganics separately, and these can be found in the CAVer file.

The work

We are now beginning to work through the list of "problem articles" found by User:Beetstra, and listed at User:Beetstra/CASFoundCorrect. A description of the process will be added soon.

Notes

Fields to check/upload

Chemboxes

Check structure, CAS no., Formula, MolarMass.

Notes:

The workers

Please sign up to work on some of the articles listed at User:Beetstra/CASFoundCorrect. More information later.

The software

Problems found when validating the Excel file

Please note any "to be checked" entries here.

1–100

101–200

201–300

There are multiple isomers. File:RRR alpha-tocopherol.png shows the most common isomer. Tim Vickers (talk) 04:28, 10 September 2009 (UTC)

301–400

401–500

501–600

601–700

701–800

801–900

901–1000

Inorganics

The 677 "inorganics" (neutral compounds without C–C or C–H bonds) have now all been checked. 496 entries gave a perfect match, 74 entries had some sort of problem in the article (often minor and already fixed) and 100 entries had no appropriate corresponding article on Wikipedia. A full report will be available in due course.

Elements and ions

These will require special treatment: please contact Physchim62 for more details.