Interatomic Potential Representation
This page outlines a possible way to represent interatomic potentials in CML. The approach involves mixing CML with MathML to describe mathematical functions, their parameters and units. This may have further application beyond this representation.
Background
Many atomic scale simulation codes evaluate parametrised functions of interatomic separation to calculate the energy of a particular configuration of atoms. Such codes include GULP and DL_Poly. These potential functions can also be used in hybrid calculations with QM simulation codes.
Potential parameters are a key input to these simulation codes and thus should be represented in our record of simulation results. Furthermore, there is an existing tradition of the sharing of the potential parameters, so we should aim to enhance this by making it possible to use the CML-encoded potentials as input for the simulation codes. We also need to consider the case of potential fitting where the potential parameters behave rather like atomic coordinates in a typical optimisation run (GULP is the only widely used code for this).
To be able to unambiguously define a particular potential function we must be able to represent the following pieces of information:
- The functional form of the potential.
- The number of atoms involved in the interaction and the variable(s) used to calculate the energy (separation, bond angle, torsional angle, etc.).
- The types of the atoms involved in this particular interaction.
- Values for the parameters of this particular interaction.
- Which atoms will be involved (e.g. distance based cut-off, only applies to chemical bonds).
- Metadata (where does this potential come from, who fitted it, how was it tested, etc.).
- Units for the parameters, variables, and the energy contributed by the potential.
Aims
- To define a CML microformat that represents the information needed for the above. Initially this will be limited to two-body interactions but the approach must be extensible for higher order terms.
- To modify GULP to output in this format.
- To modify GULP to read this format as a potential library in addition to the current input format. GULP is not expected to be able to understand the functional form of the potential but it must be able to tell that it knows about the particular potential form.
- To write software to allow the format to be used with other codes. This will be able to read any functional form without writing down all possibilities ahead of time.
A microformat
I have a rough outline of a working microformat. My example follows these rules (I expect they will be modified):
- Each potential (representation of information listed above) is represented by a CML <potential> element. The potential will have the following child elements:
- One <metadataList> - holds metadata elements about this particular potential
- One <potentialType> element. This carries a number attribute who's value indicates the number of atoms involved in the interaction (so 2 for two body potentials) and a type attribute. This element also holds an <atomArray> of the atom identifiers involved in the interaction.
- A series of <parameter> elements. These have name and dictRef elements and one child <scalar> element.
- One <expression> element with one empty <scalar> child, one child <math> element in the MathML namespace and one (or more) <arg> elements.
- The potential element carries a namespaced dictRef attribute. This refers to a entry in the simulation code's dictionary that defines (if only implicitly) the functional form, arguments and contribution of the potential. Thus a simulation code has the choice of assigning potentials by recognising it's own dictRefs or by interoperating the <expression> child element. The information carried by the dictRef / <expression> is equivalent to the information typically carried by a potential name in existing input file formats.
- The other (non <expression>) child elements carry the information that will vary from instance to instance of the potential:
- The values and units of the parameters are held in the <scalar> children, the unique name of the parameter is encoded by the dictRef. The parameter name provides a link between the CML parameters and <ci>s in the MathML fragment. Any cut off distances should be encoded as parameters and represented in the MathML.
- The atoms which this potential acts on are encoded in the <atomArray> in the <potentialType> element.
The following is an example of this syntax for a Buckingham potential with functional form E = A.e(-Roh/R)-C/R6. A, Roh and C are parameters of the potential, a simulation code will work out the energy, E, for a given atomic separation, R. The potential acts between oxygen and aluminium atoms with a seperation (rmax) less then 10 angstroms.
<potential id="1" dictRef="mypotentialDict:buckingham">
<!-- Or should this dict ref tell us what the potential is used for?-->
<potentialType number="2" type="gulp:general">
<atomArray>
<atom elementType="Al"/><atom elementType="O"/>
</atomArray>
</potentialType>
<metadataList>
<metadata name="gulp:potentaltype" content="Buckingham"/>
<metadata name="gulp:buckinghamatom" content="O"/>
<metadata name="gulp:buckinghamatom" content="Al"/>
<metadata name="gulp:buckinghamatomtype" content="s"/>
<metadata name="gulp:buckinghamatomtype" content="s"/>
<metadata name="gulp:buckinghamecuttmax" content="1.000000000000e1"/>
<metadata name="gulp:buckinghamecuttmin" content="0.000000000000e0"/>
<metadata name="gulp:potentialtype"
content="General interatomic potential"/>
</metadataList>
<parameter name="a" dictRef="gulp:buckingham.a">
<scalar units="gulpunits:eV">2.409505000000e3</scalar>
</parameter>
<parameter name="roh" dictRef="gulp:buckingham.roh">
<scalar units="gulpunits:Ang^-1">2.649000000000e-1</scalar>
</parameter>
<parameter name="c" dictRef="gulp:buckingham.c">
<scalar units="gulpunits:eV^-6">-10.000000000000e0</scalar>
</parameter>
<parameter name="rmax" dictRef="gulp:buckingham.rmax">
<scalar units="gulpunits:Ang">10.000000000000e0</scalar>
</parameter>
<expression dictRef="mypotentialDict:buckingham">
<scalar units="gulpunits:eV"/>
<m:math>
<m:declare type="function">
<m:ci>E</m:ci>
<m:lambda>
<m:bvar><m:ci>R</m:ci></m:bvar>
<m:piecewise>
<m:piece>
<m:cn>0</m:cn>
<m:apply><m:lt/><m:ci>R</m:ci><m:ci>rmax</m:ci>
</m:piece>
<m:otherwise>
<m:apply>
<m:minus/>
<m:ci>
<m:apply>
<m:times/>
<m:ci>a</m:ci>
<m:ci>
<m:apply>
<m:exp/>
<m:ci>
<m:apply>
<m:divide/>
<m:ci>
<m:apply>
<m:minus/>
<m:ci>R</m:ci>
</m:apply>
</m:ci>
<m:ci>roh</m:ci>
</m:apply>
</m:ci>
</m:apply>
</m:ci>
</m:apply>
</m:ci>
<m:ci>
<m:apply>
<m:divide/>
<m:ci>c</m:ci>
<m:ci>
<m:apply>
<m:power/>
<m:ci>R</m:ci>
<m:cn>6</m:cn>
</m:apply>
</m:ci>
</m:apply>
</m:ci>
</m:apply>
</m:otherwise>
</m:piecewise>
</m:lambda>
</m:declare>
</m:math>
<arg name="R" dictRef="gulp:buckingham.R">
<scalar units="gulpunits:Ang"/>
</arg>
</expression>
</potential>
The PotentialList element
Multiple potentials are typically used within a single computation. Additionally, it often only makes sense for potentials to be distributed as a set for the simulation of a particular class of systems. To accomidate this <potential> elements should be grouped as children of the <potentialList> element. This can also hold a <metadataList> with information regarding the source of the potentials.
A note on MathML and (X)HTML
One obvious reason to use MathML rather then inventing some other markup is that there is a reasonable chance of being able to display it in web browsers (after all, both Firefox and IE are MathML aware). Currently this is not as trivial as it should be but with a little effort it can be made to work. The first step is to make sure that the HTML document is well formed XML, and that the MathML fragment is declared in the correct namespace. The second issue is that you need to include some browser specific information and browsers typically only know about presentation MathML (not the semantically rich content MathML used above). The solution is to use stylesheets provided by W3C to do the conversion (see http://www.w3.org/Math/XSL/, they have a permissive licence). I'm using a recent version of Firefox and I've found that the XSLT needs to be on the same server as the HTML document for the transformation to be performed, apparently this is true for IE too. For Firefox users will also need to install some additional fonts provided by Wolfram and others (see http://www.mozilla.org/projects/mathml/fonts), apparently they are working on getting free alternatives so that the fonts can be bundled with the browser. Finally, the MathML above still fails to render nicely: the trick seems to be to replace:
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
<m:declare type="function">
<m:ci>E</m:ci>
<m:lambda>
<m:bvar><m:ci>R</m:ci></m:bvar>
<m:apply>
...
with
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
<m:apply>
<m:eq/>
<m:ci>E</m:ci>
<m:apply>
...
Software
I have two pieces of proof of concept code. First, I have a pure Fortran parser based on FoX's SAX library. This is unable to deal with the MathML but is sufficient to read data into a simulation codes. I also have patches against the current GULP development branch that allows GULP simulations to make use of potentials encoded as described above. Second, I have a Python application that reads the format including the MathML. This is capable of evaluating the potential function without having a list of known potentials. This can output the potential in graphical form or as a DL_Poly TABLE file (containing the value of the potential, and its derivatives on a grid of atomic separations).
Future work
- GULP needs to output this format. However, the MathML is rather verbose and writing it from Fortran is far too tedious. I'll explore external entities or Xinclude to make this easer (and save disk space).
- GULP needs to be taught about reading more potentials (by creating dictRef -> internal variable name mappings with some minimal validation).
- We need to work out how to encode potentials in code specific dictionaries.
- My Python MathML interpreter only recognises a subset of the default operators. If there are no libraries that can be used to replace this a complete rewrite is needed. With a little planning it should be possible to also interpret OpenMath? markup.
- With some work the Python tool could be used as the back end to a web-based potential library. Rather like this one that used to be hosted at http://www.dfrl.ucl.ac.uk/Potentials/.
