Random Sampler
A constraint-based random sampler for generating valid samples.
Overview
- Supports continuous, integer, and categorical features
- Allows custom constraints
- Supports parallel sampling
API Reference
- class mlsampler.engine.random.RandomSampler(config: SamplerConfig)
Bases: BaseSampler
Random constraint-based sampler.
This sampler generates samples based on feature metadata inferred from training data. Users can register constraints that are applied during sample generation.
- Parameters:
config (SamplerConfig) – Configuration object containing feature metadata and sampling settings.
Notes
Two types of constraints are supported:
- Validation constraints:
Return a boolean. If False, the sample is rejected.
- Constructive constraints:
Return a modified numpy array.
Internally, constraints are unified to return either:
- np.ndarray (valid sample)
- None (invalid sample)
Sampling is performed with retry logic up to max_retries. Parallel generation is supported via joblib.
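The unification and retry behaviour described in these notes can be sketched as follows. This is a minimal illustration, not the library's implementation: the names `unify` and `draw_one` are hypothetical, candidates are drawn uniformly from [0, 1) purely for demonstration, and returning None once the retries are exhausted is an assumption.

```python
from typing import Callable, List, Optional, Union

import numpy as np

Row = np.ndarray
Constraint = Callable[[Row], Union[bool, Row, None]]


def unify(constraint: Constraint) -> Callable[[Row], Optional[Row]]:
    """Wrap a constraint so it returns a row (valid) or None (invalid)."""

    def wrapped(row: Row) -> Optional[Row]:
        result = constraint(row)
        if result is None:
            return None
        if isinstance(result, (bool, np.bool_)):
            # Validation constraint: keep the row unchanged or reject it.
            return row if result else None
        # Constructive constraint: the returned array replaces the row.
        return np.asarray(result)

    return wrapped


def draw_one(
    rng: np.random.Generator,
    n_features: int,
    constraints: List[Callable[[Row], Optional[Row]]],
    max_retries: int = 100,
) -> Optional[Row]:
    """Draw candidates until one passes every constraint, or give up."""
    for _ in range(max_retries):
        row = rng.uniform(0.0, 1.0, size=n_features)  # assumed candidate distribution
        for constraint in constraints:
            row = constraint(row)
            if row is None:
                break  # rejected: retry with a fresh candidate
        else:
            return row  # no constraint rejected the row
    return None  # assumption: signal failure after max_retries attempts


rng = np.random.default_rng(0)
constraints = [
    unify(lambda r: r / r.sum()),   # constructive: rescale so the row sums to 1
    unify(lambda r: r.max() < 0.9), # validation: reject rows dominated by one feature
]
row = draw_one(rng, 3, constraints)
```

Wrapping both kinds of constraint behind one return convention lets the retry loop treat them identically: it only needs to check for None.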
- property constraints
Get a registered constraint class
- reset_constraints()
Clear all registered constraints.
- sample(n_samples: int) → ndarray
Generate samples satisfying all registered constraints.
- Parameters:
n_samples (int) – Total number of samples to generate.
- Returns:
Array of shape (n_samples, n_features).
- Return type:
np.ndarray
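One way to assemble an (n_samples, n_features) array from parallel workers is to split the request into batches, give each batch its own random stream, and stack the results. The sketch below uses the standard library's `concurrent.futures` rather than joblib to stay dependency-free. The function names and the uniform draw are illustrative assumptions, not part of mlsampler.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def sample_batch(seed_seq, n_rows, n_features):
    """Generate one batch with its own independent random stream."""
    rng = np.random.default_rng(seed_seq)
    return rng.uniform(0.0, 1.0, size=(n_rows, n_features))


def sample_parallel(n_samples, n_features, n_workers=4, seed=0):
    """Split n_samples across workers and stack the batches in order."""
    # Near-equal batch sizes that add up to n_samples.
    sizes = [len(chunk) for chunk in np.array_split(np.arange(n_samples), n_workers)]
    # Spawned seed sequences give statistically independent streams per batch.
    seeds = np.random.SeedSequence(seed).spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        batches = list(pool.map(sample_batch, seeds, sizes, [n_features] * n_workers))
    return np.vstack(batches)


X = sample_parallel(n_samples=10, n_features=3)
```

Because `pool.map` returns results in submission order and each batch has a fixed seed, the output is reproducible for a given `seed` and `n_workers`.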
- set_constraints(constraint_fn: str | Callable[[ndarray], bool], reset=False, **kwargs)
Add a constraint to the sampler. constraint_fn selects the type of constraint. Supported types:
callable: user-defined function that takes a row and returns a boolean or a new row. cols must be provided in kwargs.
“sum”: constraint based on the sum of selected columns. sum_value and cols must be provided in kwargs.
“sumint”: similar to “sum” but ensures the sum is an integer. sum_value and cols must be provided in kwargs.
“multihot”: constraint ensuring a specified number of columns in a set are active. n_hot and cols must be provided in kwargs.
“random”: constraint selecting a random subset of columns. cols, min_used, and max_used can be provided in kwargs.
“range”: constraint setting values within a specified range. cols, low, and high must be provided in kwargs.
“categories”: constraint selecting from a list of categorical values. cols and values must be provided in kwargs.
“step”: constraint selecting values that are multiples of a step. cols, step, low, and high must be provided in kwargs.
“stepsum”: constraint ensuring the sum of values is a multiple of a step.
- Parameters:
reset (bool, default=False) – If True, clears existing constraints before adding the new one.
**kwargs – Additional parameters specific to the constraint type.
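To make the row-level contract concrete, here is one possible reading of the “sum” and “step” descriptions, written as plain NumPy functions. It is not the library's implementation. The function names, the choice to treat both as constructive constraints, and the handling of an all-zero selection are assumptions made for illustration.

```python
import numpy as np


def sum_constraint(row, cols, sum_value):
    """Rescale the selected columns so they add up to sum_value."""
    out = np.array(row, dtype=float)
    total = out[cols].sum()
    if total == 0:
        return None  # assumption: an all-zero selection cannot be rescaled
    out[cols] = out[cols] * (sum_value / total)
    return out


def step_constraint(row, cols, step, low, high, rng):
    """Replace the selected columns with multiples of step inside [low, high]."""
    out = np.array(row, dtype=float)
    k_min = int(np.ceil(low / step))    # smallest multiple index within range
    k_max = int(np.floor(high / step))  # largest multiple index within range
    out[cols] = step * rng.integers(k_min, k_max + 1, size=len(cols))
    return out


rng = np.random.default_rng(0)
row = rng.uniform(0.1, 1.0, size=5)
row = sum_constraint(row, cols=[2, 3, 4], sum_value=1)
row = step_constraint(row, cols=[0, 1], step=0.25, low=0.0, high=1.0, rng=rng)
```

Both functions return a new array and leave the input untouched, which matches the constructive contract of returning a modified row.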
Examples
>>> from mlsampler import RandomSampler
>>> sampler = RandomSampler.setup(X_train)
>>> # Custom function constraint
>>> sampler.set_constraints(lambda x: (0 < x[0] < 1) and (0 < x[1] < 1))
>>> # Sum constraint
>>> sampler.set_constraints("sum", sum_value=1, cols=[2, 3, 4], max_used=2)