Random Sampler

A constraint-based random sampler for generating valid samples.

Overview

  • Supports continuous, integer, and categorical features

  • Allows custom constraints

  • Supports parallel sampling

API Reference

class mlsampler.engine.random.RandomSampler(config: SamplerConfig)

Bases: BaseSampler

Random constraint-based sampler.

This sampler generates samples based on feature metadata inferred from training data. Users can register constraints that are applied during sample generation.

Parameters:

config (SamplerConfig) – Configuration object containing feature metadata and sampling settings.

Notes

Two types of constraints are supported:

  • Validation constraints:

    Return a boolean. If False, the sample is rejected.

  • Constructive constraints:

    Return a modified numpy array.

Internally, constraints are unified to return either:

  • np.ndarray (valid sample)

  • None (invalid sample)
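The unification described above can be pictured with a small wrapper. This is only a minimal sketch under assumptions: the helper name unify and its dispatch on the return type are hypothetical and are not taken from the mlsampler source.

```python
import numpy as np

def unify(constraint_fn):
    """Wrap a constraint so it always returns np.ndarray or None.

    Hypothetical helper for illustration; not part of mlsampler.
    """
    def wrapped(row):
        result = constraint_fn(row)
        if result is None:
            return None
        # Validation constraint: a boolean decides accept/reject.
        if isinstance(result, (bool, np.bool_)):
            return row if result else None
        # Constructive constraint: the returned array is the new row.
        return np.asarray(result)
    return wrapped

# A validation constraint and a constructive constraint, both unified.
in_unit_square = unify(lambda x: (0 < x[0] < 1) and (0 < x[1] < 1))
normalise = unify(lambda x: x / x.sum())

row = np.array([0.2, 0.6])
accepted = in_unit_square(row)                   # row is returned unchanged
rejected = in_unit_square(np.array([1.5, 0.6]))  # None: sample is rejected
rescaled = normalise(row)                        # modified row
```

With this shape, downstream code only has to test for None, regardless of which kind of constraint the user registered.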

Rejected draws are retried up to max_retries times. Parallel generation is supported via joblib.
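A retry loop of this kind might look like the sketch below. The function name draw_one, the uniform proposal distribution, and the RuntimeError raised on exhaustion are all assumptions made for illustration; the library's actual behaviour when max_retries is exceeded is not stated here.

```python
import numpy as np

def draw_one(rng, n_features, constraints, max_retries=100):
    """Draw one row, retrying until every constraint accepts it.

    Hypothetical helper for illustration; not part of mlsampler.
    """
    for _ in range(max_retries):
        row = rng.uniform(0.0, 1.0, size=n_features)
        for constraint in constraints:
            row = constraint(row)
            if row is None:  # rejected: discard and draw again
                break
        else:
            return row  # every constraint returned an array
    raise RuntimeError(f"no valid sample after {max_retries} retries")

# Unified constraint: ndarray for a valid row, None for an invalid one.
def first_below_half(row):
    return row if row[0] < 0.5 else None

rng = np.random.default_rng(0)
samples = np.stack([draw_one(rng, 3, [first_below_half]) for _ in range(5)])
```

Because each row is drawn independently, calls like draw_one could be distributed across workers, for example with joblib's Parallel and delayed, provided each worker uses its own random generator.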

property constraints

Get a registered constraint class.

reset_constraints()

Clear all registered constraints.

sample(n_samples: int) → ndarray

Generate samples satisfying all registered constraints.

Parameters:

n_samples (int) – Total number of samples to generate.

Returns:

Array of shape (n_samples, n_features).

Return type:

np.ndarray

set_constraints(constraint_fn: str | Callable[[ndarray], bool], reset=False, **kwargs)

Add a constraint to the sampler.

constraint_fn (str | Callable) – Type of constraint. Supported types:

  • callable: user-defined function that takes a row and returns a boolean or a new row. cols must be provided in kwargs.

  • “sum”: constraint based on the sum of selected columns. sum_value and cols must be provided in kwargs.

  • “sumint”: similar to “sum” but ensures the sum is an integer. sum_value and cols must be provided in kwargs.

  • “multihot”: constraint ensuring a specified number of columns in a set are active. n_hot and cols must be provided in kwargs.

  • “random”: constraint selecting a random subset of columns. cols, min_used, and max_used can be provided in kwargs.

  • “range”: constraint setting values within a specified range. cols, low, and high must be provided in kwargs.

  • “categories”: constraint selecting from a list of categorical values. cols and values must be provided in kwargs.

  • “step”: constraint selecting values that are multiples of a step. cols, step, low, and high must be provided in kwargs.

  • “stepsum”: constraint ensuring the sum of values is a multiple of a step.

Parameters:
  • reset (bool, default=False) – If True, clears existing constraints before adding the new one.

  • **kwargs – Additional parameters specific to the constraint type.
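To make the built-in types more concrete, the sketch below shows one plausible reading of “sum” and “step” as plain NumPy functions acting on a single row. The function names, the rescaling strategy, and the round-then-clip behaviour are assumptions for illustration only; the library's own implementation may differ.

```python
import numpy as np

def sum_constraint(row, cols, sum_value):
    """Rescale the selected columns so that they add up to sum_value.

    Hypothetical illustration; not the mlsampler implementation.
    """
    out = row.astype(float)  # astype returns a copy
    total = out[cols].sum()
    if total == 0:
        return None  # an all-zero selection cannot be rescaled
    out[cols] = out[cols] * (sum_value / total)
    return out

def step_constraint(row, cols, step, low, high):
    """Snap the selected columns to multiples of step, then clip to [low, high].

    Hypothetical illustration; assumes low and high are multiples of step.
    """
    out = row.astype(float)
    snapped = np.round(out[cols] / step) * step
    out[cols] = np.clip(snapped, low, high)
    return out

row = np.array([5.0, 1.0, 3.0, 0.26])
summed = sum_constraint(row, cols=[1, 2], sum_value=1)
stepped = step_constraint(row, cols=[3], step=0.1, low=0.0, high=1.0)
```

Both functions return a new array and leave the input row untouched, which matches the constructive-constraint convention of returning a modified numpy array.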

Examples

>>> from mlsampler import RandomSampler
>>> sampler = RandomSampler.setup(X_train)
>>> # Custom function constraint
>>> sampler.set_constraints(lambda x: (0 < x[0] < 1) and (0 < x[1] < 1))
>>> # Sum constraint
>>> sampler.set_constraints("sum", sum_value=1, cols=[2, 3, 4], max_used=2)