Policies
A policy tells atomref how to answer the question “what value should I use for
this element?”
That may sound simple, but in practice scientific datasets are often incomplete. A policy makes the decision process explicit instead of hiding it in algorithm code.
Terms used in the policy layer
A few terms appear repeatedly in the API and docs:
- quantity: the operational property family being requested.
- domain: the lookup key space. In the current runtime that means element, so lookups are keyed by element symbol.
- dataset: a curated named table inside one quantity.
- policy: the ordered rule set used to resolve missing values.
The quantity and dataset live in the curated registry. The policy is the selection logic that sits on top of them.
Resolution order
In the current implementation every lookup follows the same ordered path:
- Blocked key (optional)
- Override
- Base dataset
- Transfer models, in the order you listed them
- Fallback
- Missing
Each step has a specific meaning.
Blocked key
Some quantity wrappers need to declare that certain domain keys should never be
resolved, even if a transfer model could otherwise invent a number. The current
X–H helper uses this for H, because xh_bond_length is keyed by the parent
atom X in X–H, not by hydrogen itself.
Override
An override is a value you provide directly for a specific element. It wins over everything else and is useful when you want to pin one or two elements without changing the whole dataset.
Base dataset
The base dataset is the preferred source. For example, the default covalent
policy starts from the Cordero covalent radii (cordero2008), and the
default vdW policy starts from the Alvarez van der Waals radii
(alvarez2013).
Transfer
A transfer model is used only when the base dataset has no value for the requested element.
Built-in transfer models are:
- SubstitutionTransfer: take a value directly from another dataset or policy,
- LinearTransfer: infer a target-equivalent value from another dataset or policy through a fitted linear model.
LinearTransfer already accepts a tuple of predictors in the API, but the
current runtime intentionally supports exactly one predictor source. That keeps
the implementation simple now while leaving room for later multi-predictor
linear models.
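The single-predictor linear transfer described above can be sketched in plain Python. This is an illustrative sketch only, not atomref's implementation: `fit_line` and `toy_linear_transfer` are hypothetical helpers, the keys and numbers are toy values rather than real radii, and treating `min_points` as the minimum number of keys shared by target and predictor is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def toy_linear_transfer(
    target: Dict[str, float],
    predictor: Dict[str, float],
    key: str,
    min_points: int = 3,  # assumed meaning: minimum number of shared keys
) -> Optional[float]:
    """Estimate target[key] from predictor[key] through a fitted line."""
    if key not in predictor:
        return None  # the predictor has nothing to say about this key
    # Train only on keys that both tables know directly.
    shared = sorted(set(target) & set(predictor))
    xs = [predictor[k] for k in shared]
    ys = [target[k] for k in shared]
    if len(shared) < min_points or len(set(xs)) < 2:
        return None  # too few points, or a degenerate fit
    slope, intercept = fit_line(xs, ys)
    return slope * predictor[key] + intercept


# Toy tables: the target is sparse, the predictor is broader.
target = {"A": 3.0, "B": 5.0, "C": 7.0}
predictor = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
estimate_d = toy_linear_transfer(target, predictor, "D")
```

With these toy tables the target follows the predictor exactly as `2 * x + 1`, so the estimate for the missing key `D` lies on that line.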
Transfer sources can be:
- a packaged dataset reference (DatasetRef),
- a custom ElementScalarSet,
- a generic ValuePolicy,
- a wrapper policy such as RadiiPolicy or XHPolicy.
When a transfer source is itself a policy, atomref uses the values selected by
that policy. This lets higher-level workflows express things like “infer X–H
lengths from my chosen covalent-radii policy” instead of hard-coding a specific
support dataset.
Nested policy safeguards for LinearTransfer
When a predictor source is itself a policy, two different questions matter:
- Which nested predictor values are trustworthy enough to train the linear fit?
- Which nested predictor value is acceptable for the final requested element?
atomref keeps those two decisions separate. By default:
- fit_sources=("base", "override") and fit_max_depth=0,
- prediction_sources=("base", "override", "transfer_substitution", "transfer_linear") and prediction_max_depth=1.
That means the fitted relationship is trained only on direct predictor values by default, while one additional nested completion step is still allowed at prediction time.
This is a good default for workflows such as:
- sparse target X–H data from csd_legacy_xh_cno,
- a partial covalent-radii predictor policy with direct values for s- and p-block elements,
- one inner transfer from a broader support set such as cordero2008 to make the predictor usable for d- or f-block elements.
In that setup, the outer X–H fit still uses direct predictor anchors, while the
final requested element may use one nested predictor transfer. If you really do
want fit training to use nested predictor values as well, you can opt in
explicitly by widening fit_sources and/or increasing fit_max_depth.
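The separation between fit-time and prediction-time filtering can be illustrated with a small stand-alone sketch. The four setting names and their default values come from the text above; everything else is hypothetical, including the `PredictorValue` record, the `usable` helper, and the assumption that a nested value must pass both the source check and the depth check.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PredictorValue:
    """Hypothetical record of one value selected by a nested predictor policy."""
    value: float
    source: str  # how the nested policy obtained the value
    depth: int   # how many transfer steps the nested policy needed


def usable(pv: PredictorValue, allowed_sources: Tuple[str, ...], max_depth: int) -> bool:
    # Assumption: both the source filter and the depth limit must pass.
    return pv.source in allowed_sources and pv.depth <= max_depth


# Defaults as documented above.
FIT_SOURCES = ("base", "override")
FIT_MAX_DEPTH = 0
PREDICTION_SOURCES = ("base", "override", "transfer_substitution", "transfer_linear")
PREDICTION_MAX_DEPTH = 1

# Toy nested predictor values (keys and numbers are illustrative only).
nested: Dict[str, PredictorValue] = {
    "A": PredictorValue(1.0, "base", 0),
    "B": PredictorValue(2.0, "override", 0),
    "C": PredictorValue(3.0, "transfer_substitution", 1),
    "D": PredictorValue(4.0, "transfer_linear", 2),
}

# Keys allowed to train the fit: direct values only.
fit_keys = sorted(
    k for k, pv in nested.items() if usable(pv, FIT_SOURCES, FIT_MAX_DEPTH)
)
# Keys allowed as the predictor for the final requested element.
prediction_keys = sorted(
    k for k, pv in nested.items() if usable(pv, PREDICTION_SOURCES, PREDICTION_MAX_DEPTH)
)
```

In this toy setup only `A` and `B` may train the fit, `C` is additionally acceptable at prediction time, and `D` is rejected in both roles because it sits two transfer steps deep.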
Fallback
A fallback is a constant last-resort value. It is useful when an algorithm must receive some number even if both the base dataset and transfer sources are missing a value.
Missing
If nothing above can produce a value and no fallback was configured, the result
is simply missing. In that case get_* returns None, while lookup_*
returns a LookupResult with source="missing" and explanatory notes.
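The ordered path walked through in the sections above can be summarized as a toy resolver. This is a sketch of the ordering only and does not use atomref's API: `ToyPolicy` and `ToyResult` are hypothetical, transfer models are reduced to plain callables, and the `"blocked"` and `"transfer"` source labels, as well as the choice that blocked keys skip the fallback, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# A toy transfer model: maps a key to a value, or None if it cannot help.
Transfer = Callable[[str], Optional[float]]


@dataclass(frozen=True)
class ToyResult:
    value: Optional[float]
    source: str


@dataclass
class ToyPolicy:
    base: Dict[str, float]
    overrides: Dict[str, float] = field(default_factory=dict)
    transfers: Tuple[Transfer, ...] = ()
    fallback: Optional[float] = None
    blocked: FrozenSet[str] = frozenset()

    def lookup(self, key: str) -> ToyResult:
        # 1. Blocked keys are never resolved (assumed: not even by fallback).
        if key in self.blocked:
            return ToyResult(None, "blocked")
        # 2. An explicit override wins over everything else.
        if key in self.overrides:
            return ToyResult(self.overrides[key], "override")
        # 3. The base dataset is the preferred source.
        if key in self.base:
            return ToyResult(self.base[key], "base")
        # 4. Transfer models are tried in the order they were listed.
        for transfer in self.transfers:
            value = transfer(key)
            if value is not None:
                return ToyResult(value, "transfer")
        # 5. A constant fallback is the last resort.
        if self.fallback is not None:
            return ToyResult(self.fallback, "fallback")
        # 6. Nothing produced a value.
        return ToyResult(None, "missing")


# Toy data: a broad support table acts as a substitution-style transfer.
support = {"A": 1.0, "B": 2.0, "C": 3.0}
policy = ToyPolicy(
    base={"A": 1.5},
    overrides={"B": 9.0},
    transfers=(support.get,),
    blocked=frozenset({"H"}),
)
sources = {key: policy.lookup(key).source for key in ("H", "B", "A", "C", "Z")}
```

Here `sources` records which step answered each key, which is the same kind of provenance a LookupResult exposes through its source field.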
Placeholder values and is_placeholder
Some support datasets use placeholder numbers to stand in for “unknown but keep this legacy table dense enough for downstream heuristics”.
LookupResult.is_placeholder answers one narrow question:
Is the returned numeric value itself marked as a placeholder by the source that supplied it?
It does not mean “a transfer happened”. Examples:
- a base lookup can have is_placeholder=True if the base dataset contains a placeholder value,
- a substitution transfer can also have is_placeholder=True if it copied a placeholder from the transfer source,
- a linear transfer is computed, not copied, so is_placeholder is normally False.
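The three cases above can be mirrored in a small sketch. The `Entry` and `ToyLookup` records and both helper functions are hypothetical and only model how a placeholder flag could travel with a value; they are not atomref's data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical dataset entry: a number plus a placeholder marker."""
    value: float
    is_placeholder: bool = False


@dataclass(frozen=True)
class ToyLookup:
    value: float
    source: str
    is_placeholder: bool


def copied_lookup(dataset: Dict[str, Entry], key: str, source: str) -> Optional[ToyLookup]:
    """Base lookups and substitution transfers copy the value and its flag."""
    entry = dataset.get(key)
    if entry is None:
        return None
    return ToyLookup(entry.value, source, entry.is_placeholder)


def linear_lookup(
    slope: float, intercept: float, predictor: Dict[str, Entry], key: str
) -> Optional[ToyLookup]:
    """A linear transfer computes a new number, so the flag is not inherited."""
    entry = predictor.get(key)
    if entry is None:
        return None
    return ToyLookup(slope * entry.value + intercept, "transfer_linear", False)


# Toy table in which "B" holds a placeholder number.
table = {"A": Entry(1.0), "B": Entry(2.0, is_placeholder=True)}

from_base = copied_lookup(table, "B", "base")
from_substitution = copied_lookup(table, "B", "transfer_substitution")
from_linear = linear_lookup(2.0, 1.0, table, "B")
```

The flag follows the copied number in the first two lookups and is cleared in the third, independent of whether a transfer was involved.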
Transfer depth and cycle detection
LookupResult.transfer_depth counts how many transfer steps were needed to
produce the returned value:
- direct base and override values have depth 0,
- one substitution or linear restoration has depth 1,
- nested transfer chains increase the depth further.
This makes nested-policy behavior inspectable without trying to infer it from notes alone.
Because policies may now depend on other policies, the resolver also performs
cycle detection. A cyclic reference such as policy A depending on policy B while
policy B depends back on policy A raises PolicyError instead of recursing
indefinitely. The same protection applies when recursion goes through wrapper
policies such as RadiiPolicy or XHPolicy.
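Depth counting and cycle detection for nested policies can be sketched together. The `ChainPolicy` class and `ToyPolicyError` below are hypothetical stand-ins, not atomref's resolver or its PolicyError, and the sketch assumes the cycle check happens during resolution, only once the chain of policies is actually followed.

```python
from typing import Dict, Optional, Tuple


class ToyPolicyError(Exception):
    """Stand-in for the error raised on a cyclic policy reference."""


class ChainPolicy:
    """Toy policy: a base table plus at most one nested source policy."""

    def __init__(self, base: Dict[str, float], source: Optional["ChainPolicy"] = None):
        self.base = dict(base)
        self.source = source

    def resolve(self, key: str, _active: Tuple[int, ...] = ()) -> Optional[Tuple[float, int]]:
        """Return (value, transfer_depth), or None if the key is missing."""
        # A policy that is already being resolved further up the call chain
        # means the references form a cycle.
        if id(self) in _active:
            raise ToyPolicyError("cyclic policy reference")
        if key in self.base:
            return self.base[key], 0  # direct values have depth 0
        if self.source is None:
            return None
        nested = self.source.resolve(key, _active + (id(self),))
        if nested is None:
            return None
        value, depth = nested
        return value, depth + 1  # each transfer step adds one level


# A three-level chain: outer -> middle -> inner (toy keys and values).
inner = ChainPolicy({"C": 1.0})
middle = ChainPolicy({"B": 2.0}, source=inner)
outer = ChainPolicy({"A": 3.0}, source=middle)
depths = {key: outer.resolve(key)[1] for key in ("A", "B", "C")}

# Two policies that depend on each other.
first = ChainPolicy({})
second = ChainPolicy({}, source=first)
first.source = second
try:
    first.resolve("X")
    cycle_detected = False
except ToyPolicyError:
    cycle_detected = True
```

Passing the set of active policies down the call chain keeps the check local to one lookup, so the same policy can still be reused safely by several unrelated outer policies.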
Target datasets and support datasets
atomref separates what a dataset is used for from what it scientifically
represents.
That is why the package stores:
- the operational quantity,
- the lookup domain,
- the scientific semantic class,
- the package-level usage role.
This distinction matters for datasets such as Rahm isodensity atomic radii
(rahm2016). They are useful support data for restoring missing van der Waals
radii, but they are not the same thing as a condensed-phase structural vdW
radius set. In atomref, that difference is recorded in the metadata instead of
being hidden.
Examples
A standard dataset-backed transfer:
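    import atomref as ar

    # van der Waals policy: Alvarez radii as the base set, with missing
    # elements inferred through a linear fit on the Rahm radii.
    policy = ar.RadiiPolicy(
        kind="van_der_waals",
        base_set="alvarez2013",
        transfers=(
            ar.LinearTransfer(
                predictors=(ar.DatasetRef("atomic_radius", "rahm2016"),),
            ),
        ),
        overrides={"Xe": 2.10},  # pin one element without changing the dataset
    )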
A policy-backed transfer source:
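    import atomref as ar

    # X–H policy: curated C, N, O values as the base set, with the default
    # covalent-radii policy (not a fixed dataset) as the linear predictor.
    xh_policy = ar.XHPolicy(
        base_set="csd_legacy_xh_cno",
        transfers=(
            ar.LinearTransfer(
                predictors=(ar.DEFAULT_COVALENT_POLICY,),
                min_points=3,
            ),
        ),
    )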
With that X–H policy:
- C, N, and O use the curated ConQuest defaults,
- missing parent elements may be inferred from the selected covalent-radii policy, not just from one hard-coded support dataset,
- if the predictor policy itself needed a transfer to produce a covalent radius, the resulting LookupResult still records that provenance in resolved_from, notes, and transfer_depth.