Practical procedure using CASM

Create the prim.json file, which contains the structural information for the primitive cell (we usually use the experimental cell), then initialize the project:

casm init

Data structure:

basis:
	coordinate -> coordinates of each site
	occupant_dof -> ["Na","Va"]; "Va" stands for a vacancy
	…
coordinate_mode -> Cartesian or fractional
description
lattice_vectors
title

An example (built from a NASICON structure; only a few sites are included):

	{
	  "basis" : [
	    {
	      "coordinate" : [0.500000, 0.500000, 0.500000],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.000000, 0.000000, 0.000000],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.889670, 0.610330, 0.250000],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.610330, 0.250000, 0.889670],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.250000, 0.889670, 0.610330],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.389670, 0.750000, 0.110330],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.750000, 0.110330, 0.389670],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.110330, 0.389670, 0.750000],
	      "occupant_dof" : ["Na", "Va"]
	    },
	    {
	      "coordinate" : [0.352810, 0.352810, 0.352810],
	      "occupant_dof" : ["Zr"]
	    }
	  ],
	  "coordinate_mode" : "Fractional",
	  "description" : "Si-based NASICON ",
	  "lattice_vectors" : [
	    [4.593150, 2.651856, 7.393667],
	    [-4.593150, 2.651856, 7.393667],
	    [-0.000000, -5.303713, 7.393667]
	  ],
	  "title" : "NASICON_prim"
	}
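
A minimal sketch for sanity-checking a prim.json before running casm init. It uses a shortened copy of the example above (two sites only). The helper names check_prim and fractional_to_cartesian and the list of required keys are assumptions made for illustration; they are not part of CASM. The conversion assumes each row of lattice_vectors is one lattice vector.

```python
import json

# Shortened copy of the prim.json example above (two of the sites only).
PRIM_TEXT = """
{
  "basis" : [
    {"coordinate" : [0.500000, 0.500000, 0.500000], "occupant_dof" : ["Na", "Va"]},
    {"coordinate" : [0.352810, 0.352810, 0.352810], "occupant_dof" : ["Zr"]}
  ],
  "coordinate_mode" : "Fractional",
  "description" : "Si-based NASICON ",
  "lattice_vectors" : [
    [4.593150, 2.651856, 7.393667],
    [-4.593150, 2.651856, 7.393667],
    [-0.000000, -5.303713, 7.393667]
  ],
  "title" : "NASICON_prim"
}
"""

# Assumption: these are the keys we want to insist on in our own files.
REQUIRED_KEYS = ("basis", "coordinate_mode", "lattice_vectors", "title")


def check_prim(prim):
    """Return a list of problems found in a prim.json-like dict (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in prim]
    if problems:
        return problems
    vectors = prim["lattice_vectors"]
    if len(vectors) != 3 or any(len(v) != 3 for v in vectors):
        problems.append("lattice_vectors must be three vectors of length 3")
    for i, site in enumerate(prim["basis"]):
        if len(site.get("coordinate", [])) != 3:
            problems.append(f"site {i}: coordinate must have 3 entries")
        if not site.get("occupant_dof"):
            problems.append(f"site {i}: occupant_dof is empty or missing")
    return problems


def fractional_to_cartesian(frac, lattice_vectors):
    """Convert fractional to Cartesian coordinates (rows = lattice vectors)."""
    return [
        sum(frac[i] * lattice_vectors[i][k] for i in range(3))
        for k in range(3)
    ]


prim = json.loads(PRIM_TEXT)
print(check_prim(prim))
print(fractional_to_cartesian(prim["basis"][0]["coordinate"], prim["lattice_vectors"]))
```

An empty list from check_prim means none of the simple checks failed; it does not guarantee that CASM will accept the file.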

Create composition axes
This defines the composition variables used for the phase diagram. Two coupled axes are used for the 2D case; see "useful emails" for the reason why these two coupled axes are needed.
File: .casm/composition_axes.json

	{
	  "current_axes" : "coupled",
	  "custom_axes" : {
	    "coupled" : {
	      "a" : [
	        [ 2.000000000000 ],
	        [ 6.000000000000 ],
	        [ 4.000000000000 ],
	        [ 0.000000000000 ],
	        [ 6.000000000000 ],
	        [ 24.000000000000 ]
	      ],
	      "b" : [
	        [ 2.000000000000 ],
	        [ 6.000000000000 ],
	        [ 4.000000000000 ],
	        [ 0.000000000000 ],
	        [ -6.000000000000 ],
	        [ 24.000000000000 ]
	      ],
	      "components" : [ "Na", "Va", "Zr", "Si", "P", "O" ],
	      "independent_compositions" : 2,
	      "origin" : [
	        [ 8.000000000000 ],
	        [ 0.000000000000 ],
	        [ 4.000000000000 ],
	        [ 6.000000000000 ],
	        [ 0.000000000000 ],
	        [ 24.000000000000 ]
	      ]
	    }
	  }
	}

Composition = Origin + (End - Origin) * x, where End is a or b here and x is the value along that axis. Then compute the composition axes:

casm composition -c
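
The relation Composition = Origin + (End - Origin) * x can be checked by hand with a short sketch. The numbers are the origin, a, and b from the "coupled" axes above. The function name composition is made up for illustration, and adding the contributions of the two axes is an assumption about how the single-axis relation extends to two axes.

```python
# Values copied from the "coupled" axes in composition_axes.json above.
COMPONENTS = ["Na", "Va", "Zr", "Si", "P", "O"]
ORIGIN = [8.0, 0.0, 4.0, 6.0, 0.0, 24.0]
END_A = [2.0, 6.0, 4.0, 0.0, 6.0, 24.0]
END_B = [2.0, 6.0, 4.0, 0.0, -6.0, 24.0]


def composition(x_a, x_b):
    """Composition = Origin + (a - Origin) * x_a + (b - Origin) * x_b."""
    values = [
        o + (a - o) * x_a + (b - o) * x_b
        for o, a, b in zip(ORIGIN, END_A, END_B)
    ]
    return dict(zip(COMPONENTS, values))


print(composition(0.0, 0.0))  # the origin
print(composition(1.0, 0.0))  # end member a
print(composition(0.5, 0.5))  # halfway along both axes
```

With x_a = x_b = 0.5 the P contributions of a and b cancel, which shows how the two axes are coupled.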

Import calculated DFT results
Use vasp.relax.report to generate properties.calc.json in each directory.
Generate a file list containing the paths to all POSCAR files: reports_path_primitive.txt
Import the results into .casm/config_list.json. If you want to add new results, make sure you exclude the old paths from reports_path_primitive.txt; otherwise the database will contain duplicates.

casm import --batch reports_path.txt --ideal --data --min-energy
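
A possible way to build the path list while skipping paths that were already imported. This is only a sketch: the function names collect_poscar_paths and write_path_list are hypothetical, and it assumes each finished calculation directory contains both a POSCAR and a properties.calc.json.

```python
from pathlib import Path


def collect_poscar_paths(root, already_imported=()):
    """Find POSCAR files that sit next to a properties.calc.json under root.

    Paths listed in already_imported are skipped to avoid duplicate imports.
    """
    skip = {str(Path(p)) for p in already_imported}
    paths = []
    for props in sorted(Path(root).rglob("properties.calc.json")):
        poscar = props.parent / "POSCAR"
        if poscar.is_file() and str(poscar) not in skip:
            paths.append(str(poscar))
    return paths


def write_path_list(paths, list_file):
    """Write one path per line, the layout expected for a batch list here."""
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
```

For example, read the old list file into already_imported, call collect_poscar_paths on the calculation root, and write only the new paths to the file passed to casm import --batch.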

Choose the chemical reference (energy_per_species means energy per atom):

	'[
	        {"Na": 8.0, "Zr": 4.0, "Si": 6.0, "P": 0.0, "O": 24.0, "energy_per_species": -7.39616323214285714285},
	        {"Na": 2.0, "Zr": 4.0, "Si": 0.0, "P": 6.0, "O": 24.0, "energy_per_species": -7.93335068250000000000},
	        {"Na": 8.0, "Zr": 0.0, "Si": 6.0, "P": 0.0, "O": 24.0, "energy_per_species":  0.00000000000000000000}
	]'

Pass a JSON string like the one above directly to the command, including the single quotes:

casm ref --set '[{"Na": 8.0, "Zr": 4.0, "Si": 6.0, "P": 0.0, "O": 24.0, "energy_per_species": -7.39616323214285714285}, {"Na": 2.0, "Zr": 4.0, "Si": 0.0, "P": 6.0, "O": 24.0, "energy_per_species": -7.93335068250000000000}, {"Na": 1.0, "energy_per_species": -1.308547}, {"Zr": 1.0, "energy_per_species": -8.547687}]'
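
A small sketch of how such a reference string could be assembled. It assumes energy_per_species is the total energy divided by the number of atoms (vacancies not counted), following the "per species = per atom" note above. The helper names and the total energy of -420.0 are made up for illustration and are not real DFT results.

```python
import json


def reference_state(atom_counts, total_energy):
    """Build one reference entry from atom counts and a total energy."""
    # Vacancies ("Va") are not atoms, so they are left out of the count.
    n_atoms = sum(n for species, n in atom_counts.items() if species != "Va")
    entry = dict(atom_counts)
    entry["energy_per_species"] = total_energy / n_atoms
    return entry


def ref_set_argument(states):
    """Serialize the reference states as the JSON string for casm ref --set."""
    return json.dumps(states)


# Hypothetical total energy for a 42-atom cell (not a real DFT value).
states = [
    reference_state({"Na": 8.0, "Zr": 4.0, "Si": 6.0, "P": 0.0, "O": 24.0}, -420.0)
]
print("casm ref --set '" + ref_set_argument(states) + "'")
```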

The chemical reference can be updated later

casm update

Here the lowest-energy structure should be used:

casm ref --set '[{"Na": 8.0, "Zr": 4.0, "Si": 6.0, "P": 0.0, "O": 24.0, "energy_per_species": -11.732566428571428}, {"Na": 2.0, "Zr": 4.0, "Si": 0.0, "P": 6.0, "O": 24.0, "energy_per_species": -12.525085}, {"Na": 1.0, "energy_per_species": -4.2040927}, {"Zr": 1.0, "energy_per_species": -30.6929575}]'

Create basis functions (it is better to use Chebychev basis functions)
File: basis_sets/bset.default/bspecs.json
"occupation" can be changed to other properties such as spin.
orbit_branch_specs: sets the maximum cluster size (max_length) used to generate the basis functions for each cluster order (2 = pairs, 3 = triplets, 4 = quadruplets); it usually decreases as the order increases.

	{
	    "basis_functions" : {
	      "site_basis_functions" : "occupation"
	    },
	    "orbit_branch_specs" : {
	      "2" : {"max_length" : 10.0000},
	      "3" : {"max_length" : 6.00000},
	      "4" : {"max_length" : 5.00000}
	    }
	}
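
A quick sketch that builds the same bspecs.json content from a table of cutoffs and checks that max_length does not grow with cluster order. The helper names are hypothetical, and the check only encodes the rule of thumb above, not a CASM requirement.

```python
import json


def lengths_decrease_with_order(max_lengths):
    """True if max_length never increases when going to a higher order."""
    orders = sorted(max_lengths)
    return all(
        max_lengths[higher] <= max_lengths[lower]
        for lower, higher in zip(orders, orders[1:])
    )


def make_bspecs(max_lengths, site_basis_functions="occupation"):
    """max_lengths maps cluster order (2, 3, 4, ...) to its max_length."""
    return {
        "basis_functions": {"site_basis_functions": site_basis_functions},
        "orbit_branch_specs": {
            str(order): {"max_length": max_lengths[order]}
            for order in sorted(max_lengths)
        },
    }


cutoffs = {2: 10.0, 3: 6.0, 4: 5.0}
print(lengths_decrease_with_order(cutoffs))
print(json.dumps(make_bspecs(cutoffs), indent=2))
```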

Then compile to generate the basis functions; this might take about 30 minutes:

casm bset -u

Prepare for fitting the ECI
Create a folder, e.g. fit_1.
Select the candidates for fitting and save them to "train":

casm select --set is_calculated -o train

Create the casm-learn input file fit.json, using the Lasso algorithm.
The candidate list file given in "filename" (train here) must exist in this folder.

	{
	  "estimator": {
	    "method": "Lasso",
	    "kwargs": {
	      "alpha": 0.0001,
	      "max_iter": 1000000.0
	    }
	  },
	  "feature_selection": {
	    "method": "SelectFromModel",
	    "kwargs": null
	  },
	  "problem_specs": {
	    "data": {
	      "y": "formation_energy",
	      "X": "corr",
	      "kwargs": null,
	      "type": "selection",
	      "filename": "train"
	    },
	    "cv": {
	      "penalty": 0.0,
	      "method": "LeaveOneOut"
	    }
	  },
	  "n_halloffame": 25
	}
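
To make the "cv": {"method": "LeaveOneOut"} setting concrete, here is a sketch of a leave-one-out cross-validation score in plain Python. The model is deliberately a one-feature least-squares line, not Lasso, and the data are made up. It also assumes the score is the root-mean-square of the held-out errors; casm-learn may define its score differently.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_rmse(xs, ys):
    """Refit with each point held out in turn and score the held-out errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((prediction - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up data: the first set is exactly linear, the second has one outlier.
xs = [0.0, 1.0, 2.0, 3.0]
print(leave_one_out_rmse(xs, [1.0, 3.0, 5.0, 7.0]))
print(leave_one_out_rmse(xs, [1.0, 3.0, 5.0, 8.0]))
```

A model that only fits the training points well, but predicts held-out points poorly, shows up as a large score.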

Fit ECI

casm-learn -s fit.json

This generates the problem specs file "fit_specs.pkl", which stores the training data, weights, and cross-validation train/test sets, and "fit_halloffame.pkl", which stores the selected candidates. Then adjust fit.json and repeat the fit until the result is satisfactory (use the fewest features that reproduce most of the results). For the available settings, see:

casm-learn --settings-format

Generate eci.json and use it for Monte Carlo:

casm-learn -s fit.json  --select 0

Plot convex hull
Query energies from the database:

casm query -k  'comp(a)' 'formation_energy'    'clex(formation_energy)' 'hull_dist(ALL,atom_frac)'  'clex_hull_dist(ALL,atom_frac)' -c train  -o data.dat

Query the hull from the database:

casm query  -k  'comp(a)' 'formation_energy' 'clex(formation_energy)'  'on_hull(ALL,comp)' 'on_clex_hull(ALL,comp)' 'comp_n(Na)' -c train   -o hull.dat
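
For reference, a sketch of what the hull and hull-distance columns mean along a single composition axis such as comp(a). It builds the lower convex hull of (composition, formation energy) points with a monotone-chain scan and measures how far each point lies above it. The points are made up, the function names are illustrative, and CASM computes its own hull internally; this is not its implementation.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (x, energy) points, sorted by x."""
    best = {}
    for x, energy in points:
        # Only the lowest energy at each composition can be on the hull.
        best[x] = min(energy, best.get(x, energy))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(x, hull):
    """Linearly interpolate the hull energy at composition x."""
    if len(hull) == 1 and x == hull[0][0]:
        return hull[0][1]
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return e0 + (e1 - e0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the hull range")


def hull_distance(point, hull):
    """Energy of a point above the hull (0 for points on the hull)."""
    return point[1] - hull_energy(point[0], hull)


# Made-up (composition, formation energy) pairs.
points = [(0.0, 0.0), (0.25, -0.10), (0.5, -0.30), (0.75, -0.12), (1.0, 0.0)]
hull = lower_hull(points)
print(hull)
print([round(hull_distance(p, hull), 6) for p in points])
```

Running the same functions on the DFT energies and on the clex energies gives two hulls that can be compared point by point.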

You will likely see that the cluster expansion (clex) convex hull is far from the DFT convex hull. To fix this, first fix the correlation (cluster expansion coefficient; see the point term discussion in "useful emails") and fit the weights. Then use the following script:

# Usage: pass the casm-learn settings file (e.g. fit.json) as the first argument
filename=$1
./clean.sh
# Remove the outputs of a previous run that share the settings file's base name
rm "${filename%.*}"_*
casm-learn -s "$filename"
casm-learn -s "$filename" --checkhull
casm-learn -s "$filename" --select 0
casm-learn -s "$filename" --hall --indiv 0 --format json > "${filename%.*}-eci.json"
# Alternative queries over all configurations instead of the train selection:
#casm query -k 'comp(a)' 'formation_energy' 'clex(formation_energy)' 'hull_dist(ALL,atom_frac)' 'clex_hull_dist(ALL,atom_frac)' -c ALL -o data.dat
#casm query -k 'comp(a)' 'formation_energy' 'clex(formation_energy)' 'on_hull(ALL,comp)' 'on_clex_hull(ALL,comp)' 'comp_n(Na)' -c ALL -o hull.dat
casm query -k 'comp(a)' 'formation_energy' 'clex(formation_energy)' 'hull_dist(ALL,atom_frac)' 'clex_hull_dist(ALL,atom_frac)' -c train -o data.dat
casm query -k 'comp(a)' 'formation_energy' 'clex(formation_energy)' 'on_hull(ALL,comp)' 'on_clex_hull(ALL,comp)' 'comp_n(Na)' -c train -o hull.dat
python plot_convex_refactor.py
# Rename the outputs after the settings file so that different fits do not overwrite each other
mv Convex_hull.pdf "${filename%.*}.pdf"
mv hull.dat "${filename%.*}_hull.dat"
mv data.dat "${filename%.*}_fit.dat"
echo "${filename%.*}"
# "open" is the macOS command for opening a file in the default viewer
open "${filename%.*}.pdf"

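To do the fitting, tune the weights in the train file until the cross-validation (CV) error becomes small. In addition, the ECI should follow the general trend: pair terms are dominant, followed by triplets, quadruplets, etc. First, use the following command to query formation_energy and corr for the train selection into a file (referred to as train_weight.dat; the command as written outputs to casm_learn_input):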

casm query -k "formation_energy corr" -c train -o casm_learn_input

Next, add a column called "weight" and put in all the point terms. Then use the following fit.json for the fitting:

	{
	  "estimator": {
	    "method": "Lasso",
	    "kwargs": {
	      "alpha": 0.0001,
	      "max_iter": 1000000.0
	    }
	  },
	  "feature_selection": {
	    "method": "SelectFromModel",
	    "kwargs": null
	  },
	  "problem_specs": {
	    "data": {
	      "y": "formation_energy",
	      "X": "corr",
	      "kwargs": null,
	      "type": "selection",
	      "filename": "train"
	    },
	    "cv": {
	      "penalty": 0.0,
	      "method": "LeaveOneOut"
	    },
	    "weight": {
	      "method": "wCustom"
	    }
	  },
	  "n_halloffame": 25
	}
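
The step of adding a "weight" column to the queried data can be sketched as follows. This assumes the query output is a whitespace-delimited table whose first line is the header. The function name add_weight_column, the example rows, and the weight values are made up for illustration; which weights to use is left to the tuning described above.

```python
def add_weight_column(lines, weights):
    """Append a "weight" column to a whitespace-delimited table.

    lines: text lines of the table; the first line is the header.
    weights: one number per data row, in row order.
    """
    header = lines[0]
    rows = [line for line in lines[1:] if line.strip()]
    if len(rows) != len(weights):
        raise ValueError("need exactly one weight per data row")
    out = [header.rstrip() + "    weight"]
    for row, weight in zip(rows, weights):
        out.append(f"{row.rstrip()}    {weight:.6f}")
    return out


# Made-up table in the assumed layout (not real query output).
table = [
    "# configname formation_energy corr(0) corr(1)",
    "SCEL1_1_1_1_0_0_0/0 -0.10 1.0 0.5",
    "SCEL1_1_1_1_0_0_0/1 -0.20 1.0 -0.5",
]
for line in add_weight_column(table, [1.0, 5.0]):
    print(line)
```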