Commit 4c06bdb (parent 950774a)

generalfit.py: number_of_functions_per_element parameter
update documentation

5 files changed, 107 additions, 29 deletions

docs/pacemaker/inputfile.md (4 additions, 3 deletions)

@@ -48,7 +48,7 @@ data: # dataset specification section
   # query_limit: 1000 # limiting number of entries to query from `structdb`
   # ignored if reading from cache

-  # cache_ref_df: True # whether to store the queried or modified dataset into file, default - True
+  # cache_ref_df: False # whether to store the queried or modified dataset into file, default - True
   # filename: some.pckl.gzip # force to read reference pickled dataframe from given file
   # ignore_weights: False # whether to ignore energy and force weighting columns in dataframe
   # datapath: ../data # path to folder with cache files with pickled dataframes

@@ -66,7 +66,7 @@ Example of creating the **subselection of fitting dataframe** and saving it is g

 Example of generating **custom energy/forces weights** is given in `notebooks/data_custom_weights.ipynb`

-### Querying data
+### Querying data (using structDB only)
 You can just query and preprocess data, without running potential fitting.
 Here is the minimalistic input YAML:

@@ -197,7 +197,8 @@ potential:

   ## possible keywords: ALL, UNARY, BINARY, TERNARY, QUATERNARY, QUINARY,
   ## element combinations as (Al,Al), (Al, Ni), (Al, Ni, Zn), etc...
-  functions:
+  functions:
+    # number_of_functions_per_element: 700 # specify the total number of functions per element to keep
     UNARY: {
       nradmax_by_orders: [15, 3, 2, 2, 1],
       lmax_by_orders: [ 0, 2, 2, 1, 1],

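Read together, the inputfile.md hunks mean the new keyword lives one level below `functions:` in the `potential` section. A sketch of how it would look when enabled (uncommented here for illustration; 700 is the example value from the diff, and the `UNARY` block is copied from the surrounding context):

```yaml
potential:
  functions:
    number_of_functions_per_element: 700  # total number of functions per element to keep
    UNARY: {
      nradmax_by_orders: [15, 3, 2, 2, 1],
      lmax_by_orders: [ 0, 2, 2, 1, 1],
    }
```
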
docs/pacemaker/quickstart.md (42 additions, 9 deletions)

@@ -7,7 +7,35 @@ process.
 In this section we will describe the format of the fitting dataset, we will run a fit with an example dataset and
 overview the output produced by `pacemaker`. Input parameters are detailed in the [section](inputfile.md#Input_file) below.

-## Fitting dataset preparation
+## Automatic DFT data collection
+
+You can collect DFT calculations (currently only for VASP from `vasprun.xml` or `OUTCAR` files) by using `pace_collect`
+utility. For example, if your data is in `my_dft_calculation/` folder and subfolders, and single atoms reference energies
+are -0.123 eV for Al and -0.456 eV for Cu, then run command
+```
+pace_collect -wd path/to/my_dft_calculation --free-atom-energy Al:-0.123 Cu:-0.456
+```
+that will scan through all folders and subfolders and collect DFT free energies (that are force-consistent) and forces
+and make a single atom corrections. Resulting dataset will be stored into `collected.pckl.gzip` file.
+
+If you need more flexibility for DFT dataset manipulation,
+please check [Manual fitting dataset preparation](#markdown-header-manual-fitting-dataset-preparation).
+
+## Automatic input file generation
+
+In order to fit an ACE potential, one need to create a configurational file with relevant settings.
+`pacemaker` utilizes `.yaml` format for configurations.
+
+In order to interactively generate default `pacemaker` input file `input.yaml`, please run
+```
+pacemaker -t
+```
+and enter requested information, such as dataset filename, test set size (optional), list of elements, cutoff,
+number of functions. Doing so will produce an `input.yaml` file with the most general
+settings that can be adjusted for a particular task. Detailed overview of the input file parameters can be found in the
+[section](#input-file-overview) below.
+
+## Manual fitting dataset preparation

 In order to use your data for fitting with `pacemaker` one would need to provide it in the form of `pandas` DataFrame.
 An example DataFrame can be red as:

@@ -97,12 +125,7 @@ or use the utility `pace_collect` from a top-level directory to collect VASP cal
 The resulting dataframe can be used for fitting with `pacemaker`.

 ## Creating an input file
-
-In order to fit an ACE potential to the data prepared following the previous section, one need to create a configurational
-file with relevant settings. `pacemaker` utilizes `.yaml` format for configurations. An input file template can be created
-by running `pacemaker --template` (or `pacemaker -t`). Doing so will produce an `input.yaml` file with the most general
-settings that can be adjusted for a particular task. Detailed overview of the input file parameters can be found in the
-[section](#input-file-overview) below.
+
 In this example we will use template as it is, however one would need to provide a path to the
 example dataset `exmpl_df.pckl.gzip`. This can be done by changing `filename` parameter in the `data` section of the
 `input.yaml`:

@@ -129,8 +152,8 @@ nohup pacemaker input.yaml &
 ```
 For more `pacemaker` command options see the corresponding [section](#pacemaker-commands).

-Default behavior of pacemaker is to utilize a GPU accelerated fitting of ACE using `tensorpotential`. However, GPU
-parallelization is not supported at the moment. Therefore, if your machine has a multi GPU setup one would need to select
+Default behavior of pacemaker is to utilize a GPU accelerated fitting of ACE using `tensorpotential`. However,
+parallelization over multiple GPU is not supported at the moment. Therefore, if your machine has a multi GPU setup one would need to select
 a single one before running `pacemaker`. This can be done by executing `export CUDA_VISIBLE_DEVICES=ind` in the shell
 replacing `ind` with the GPU index (i.g. 0, 1, ...) or -1 to disable GPU usage.
 Note, that `tensorpotential` can be used without a GPU as well.

@@ -277,6 +300,16 @@ For more information see [here](https://lammps.sandia.gov/doc/Build_cmake.html).
 3. Build LAMMPS using `cmake --build .` or `make`


+Please note, that there is a KOKKOS implementation of PACE for LAMMPS as `pair_style pace/kk`, but you need to compile
+LAMMPS with this support, see official documentation [here](https://docs.lammps.org/Build_extras.html#kokkos).
+This implementation allows to run calculations on GPU which give the speedup of **up to x100** on modern GPU architectures
+in comparison to single-core CPU. In that case you should modify LAMMPS input script as
+```
+## in.lammps
+
+pair_style pace product
+pair_coeff * * output_potential.yace Al Ni
+```

 ## More examples

src/pyace/generalfit.py (32 additions, 1 deletion)

@@ -217,8 +217,34 @@ def __init__(self,
             else:
                 self.target_bbasisconfig = construct_bbasisconfiguration(potential_config,
                                                                          initial_basisconfig=self.initial_bbasisconfig)
+            if ("functions" in potential_config and
+                    "number_of_functions_per_element" in potential_config['functions']):
+                num_block = len(self.target_bbasisconfig.funcspecs_blocks)
+                number_of_functions_per_element = potential_config["functions"]["number_of_functions_per_element"]
+                target_bbasis = ACEBBasisSet(self.target_bbasisconfig)
+                nelements = target_bbasis.nelements
+                ladder_step = number_of_functions_per_element * nelements // num_block
+                expected_number_of_functions = ladder_step * num_block
+                log.info(
+                    """Target potential contains {total_number_of_functions} functions,"""
+                    """ but is limited to maximum {number_of_functions_per_element}"""
+                    """ functions per element for {nelements} elements ({num_block} blocks)""".format(
+                        total_number_of_functions=self.target_bbasisconfig.total_number_of_functions,
+                        number_of_functions_per_element=number_of_functions_per_element,
+                        nelements=nelements,
+                        num_block=num_block))
+
+                initial_basisconfig = self.target_bbasisconfig.copy()
+                clean_bbasisconfig(initial_basisconfig)
+                current_bbasisconfig = extend_multispecies_basis(initial_basisconfig, self.target_bbasisconfig,
+                                                                 "power_order", ladder_step)
+                self.target_bbasisconfig = current_bbasisconfig
+                log.info("Resulted potential contains {} functions".format(
+                    self.target_bbasisconfig.total_number_of_functions))
+
             log.info("Target potential shape constructed from dictionary, it contains {} functions".format(
                 self.target_bbasisconfig.total_number_of_functions))
+
         elif isinstance(potential_config, str):
             self.target_bbasisconfig = BBasisConfiguration(potential_config)
             log.info("Target potential loaded from file '{}', it contains {} functions".format(potential_config,

@@ -236,6 +262,8 @@ def __init__(self,
                 ("Non-supported type: {}. Only dictionary (configuration), " +
                  "str (YAML file name) or BBasisConfiguration are supported").format(
                     type(potential_config)))
+        # save target_potential.yaml
+        self.target_bbasisconfig.save(TARGET_POTENTIAL_YAML)

         if FIT_LADDER_STEP_KW in fit_config and not self.ladder_scheme:
             if self.initial_bbasisconfig is None:

@@ -374,6 +402,9 @@ def test_metric_callback(self, metrics_dict, extended_display_step=None):
         self.metrics_aggregator.test_metric_callback(metrics_dict, extended_display_step=extended_display_step)

     def save_fitting_data_info(self):
+        # columns to save: w_energy, w_forces, NUMBER_OF_ATOMS, PROTOTYPE_NAME, prop_id,structure_id, gen_id, if any
+        # columns_to_save = ["PROTOTYPE_NAME", "NUMBER_OF_ATOMS", "prop_id", "structure_id", "gen_id", "pbc"] + \
+        #                   [ENERGY_CORRECTED_COL, EWEIGHTS_COL, FWEIGHTS_COL]
         columns_to_drop = ["tp_atoms", "atomic_env"]
         fitting_data_columns = self.fitting_data.columns

@@ -467,7 +498,7 @@ def cycle_fitting(self, bbasisconfig: BBasisConfiguration) -> BBasisConfiguratio
             log.warning(
                 ("Number of finished fit cycles ({}) >= number of expected fit cycles ({}). " +
                  "Use another potential or remove `{}` from potential metadata")
-                .format(finished_fit_cycles, fit_cycles, "_" + FIT_FIT_CYCLES_KW))
+                    .format(finished_fit_cycles, fit_cycles, "_" + FIT_FIT_CYCLES_KW))
             return current_bbasisconfig

         fitting_attempts_list = []

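The budget arithmetic introduced in generalfit.py spreads the per-element function cap evenly over all species blocks via floor division. A standalone sketch with illustrative numbers (700 is the documentation's example value; the element and block counts are made up, not from the commit):

```python
number_of_functions_per_element = 700
nelements = 2   # hypothetical: Al, Ni
num_block = 3   # hypothetical: unary Al, unary Ni, binary Al-Ni blocks

# total budget spread evenly over all species blocks (same expression as the diff)
ladder_step = number_of_functions_per_element * nelements // num_block
expected_number_of_functions = ladder_step * num_block

print(ladder_step, expected_number_of_functions)  # 466 1398
```

Note that because of the floor division, the resulting total (1398 here) can come out slightly below `nelements * number_of_functions_per_element` (1400).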
src/pyace/multispecies_basisextension.py (4 additions, 3 deletions)

@@ -25,7 +25,7 @@
 TERNARY = "TERNARY"
 QUATERNARY = "QUATERNARY"
 QUINARY = "QUINARY"
-KEYWORDS = [ALL, UNARY, BINARY, TERNARY, QUATERNARY, QUINARY]
+KEYWORDS = [ALL, UNARY, BINARY, TERNARY, QUATERNARY, QUINARY, 'number_of_functions_per_element']

 NARY_MAP = {UNARY: 1, BINARY: 2, TERNARY: 3, QUATERNARY: 4, QUINARY: 5}
 PERIODIC_ELEMENTS = chemical_symbols = [

@@ -225,7 +225,8 @@ def species_key_to_bonds(key):
     return bonds


-def create_multispecies_basis_config(potential_config: Dict, unif_mus_ns_to_lsLScomb_dict: Dict = None,
+def create_multispecies_basis_config(potential_config: Dict,
+                                     unif_mus_ns_to_lsLScomb_dict: Dict = None,
                                      func_coefs_initializer="zero",
                                      initial_basisconfig: BBasisConfiguration = None) -> BBasisConfiguration:
     """

@@ -636,7 +637,7 @@ def create_species_block(elements_vec: List, block_spec_dict: Dict,
     if func_coefs_initializer == "zero":
         coefs = [0] * ndensity
     elif func_coefs_initializer == "random":
-        coefs = np.random.randn(ndensity)
+        coefs = np.random.randn(ndensity)*1e-4
     else:
         raise ValueError(
             "Unknown func_coefs_initializer={}. Could be only 'zero' or 'random'".format(

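The last hunk scales the "random" coefficient initializer by 1e-4, so a freshly created species block starts with near-zero (but not exactly zero) coefficients instead of O(1) values. A stdlib stand-in for the idea (the diff itself uses `np.random.randn`):

```python
import random

random.seed(0)
ndensity = 2
# standard-normal draws scaled down by 1e-4, mirroring np.random.randn(ndensity)*1e-4
coefs = [random.gauss(0.0, 1.0) * 1e-4 for _ in range(ndensity)]
print(coefs)  # two values with magnitude on the order of 1e-4
```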
src/pyace/preparedata.py (25 additions, 13 deletions)

@@ -28,6 +28,9 @@

 log = logging.getLogger(__name__)

+REF_PROP_NAME = '1-body-000001:static'
+REF_GENERIC_PROTOTYPE_NAME = '1-body-000001'
+
 # ## QUERY DATA
 LATTICE_COLUMNS = ["_lat_ax", "_lat_ay", "_lat_az",
                    "_lat_bx", "_lat_by", "_lat_bz",

@@ -129,17 +132,7 @@ def query_data(config: Dict, seed=None, query_limit=None, db_conn_string=None):
     if REF_ENERGY_KW not in config:
         try:
             # TODO: generalize query of reference property
-            REF_PROP_NAME = '1-body-000001:static'
-            REF_GENERIC_PROTOTYPE_NAME = '1-body-000001'
-            ref_prop = storage.query(StaticProperty).join(StructureEntry, GenericEntry).filter(
-                Property.CALCULATOR == reference_calculator,
-                Property.NAME == REF_PROP_NAME,
-                StructureEntry.COMPOSITION.like(config["element"] + "-%"),
-                StructureEntry.NUMBER_OF_ATOMS == 1,
-                GenericEntry.PROTOTYPE_NAME == REF_GENERIC_PROTOTYPE_NAME
-            ).one()
-            # free atom reference energy
-            ref_energy = ref_prop.energy / ref_prop.n_atom
+            ref_energy = query_reference_energy(config["element"], reference_calculator, storage)
         except NoResultFound as e:
             log.error(("No reference energy for {} was found in database. " +
                        "Either add property named `{}` with generic named `{}` to database or use `{}` " +

@@ -214,6 +207,20 @@ def query_data(config: Dict, seed=None, query_limit=None, db_conn_string=None):
     return df_total, ref_energy


+def query_reference_energy(element, reference_calculator, storage):
+    from structdborm import StructureEntry, StaticProperty, GenericEntry, Property
+    ref_prop = storage.query(StaticProperty).join(StructureEntry, GenericEntry).filter(
+        Property.CALCULATOR == reference_calculator,
+        Property.NAME == REF_PROP_NAME,
+        StructureEntry.COMPOSITION.like(element + "-%"),
+        StructureEntry.NUMBER_OF_ATOMS == 1,
+        GenericEntry.PROTOTYPE_NAME == REF_GENERIC_PROTOTYPE_NAME
+    ).one()
+    # free atom reference energy
+    ref_energy = ref_prop.energy / ref_prop.n_atom
+    return ref_energy
+
+
 class StructuresDatasetWeightingPolicy:
     def generate_weights(self, df):
         raise NotImplementedError

@@ -639,7 +646,7 @@ def get_fit_dataframe(self, force_query=None, weights_policy=None, ignore_weight

 class EnergyBasedWeightingPolicy(StructuresDatasetWeightingPolicy):

-    def __init__(self, nfit=20000,
+    def __init__(self, nfit=None,
                  cutoff=None,
                  DElow=1.0,
                  DEup=10.0,

@@ -705,6 +712,10 @@ def __str__(self):
                    reftype=self.reftype, seed=self.seed)

     def generate_weights(self, df):
+        if self.nfit is None:
+            self.nfit = len(df)
+            log.info("Set nfit to the dataset size {}".format(self.nfit))
+
         if self.reftype == "bulk":
             log.info("Reducing to bulk data")
             df = df[df.pbc]

@@ -1019,7 +1030,8 @@ def generate_weights(self, df):
             if col_to_drop in df.columns:
                 df.drop(columns=col_to_drop, inplace=True)

-        mdf = pd.merge(df, self.weights_df[[WEIGHTS_ENERGY_COLUMN, WEIGHTS_FORCES_COLUMN]], left_index=True, right_index=True)
+        mdf = pd.merge(df, self.weights_df[[WEIGHTS_ENERGY_COLUMN, WEIGHTS_FORCES_COLUMN]], left_index=True,
+                       right_index=True)
         if not (mdf[FORCES_COLUMN].map(len) == mdf[WEIGHTS_FORCES_COLUMN].map(len)).all():
             error_msg = ("Shape of the `{}` column doesn't correspond to the shape of "
                          "`forces` column in original dataframe").format(WEIGHTS_FORCES_COLUMN)

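The `nfit` default in `EnergyBasedWeightingPolicy` changes from 20000 to `None`, which `generate_weights` now interprets as "use the whole dataframe". A minimal stand-in for that fallback (the `resolve_nfit` helper is hypothetical; the real code mutates `self.nfit` in place):

```python
def resolve_nfit(nfit, df):
    """Fall back to the dataset size when no explicit nfit is given (sketch)."""
    if nfit is None:
        nfit = len(df)
    return nfit

print(resolve_nfit(None, range(1234)))   # 1234 (dataset size)
print(resolve_nfit(20000, range(1234)))  # 20000 (explicit value wins)
```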