How to do Monte Carlo testing with Parameter Sweep

Overview

This guide introduces two important features of the parameter sweep tool: (1) the ability to prescribe random model values and (2) the ability to take advantage of parallel computing resources for large runs. It is recommended that new users familiarize themselves with the beginner parameter sweep guide before proceeding.

How To

As before, we begin by importing or explicitly programming any functions relating to flowsheet building/specification, simulation, and optimization setup steps. We will use the same RO with energy recovery flowsheet for this example.

# replace this with your own flowsheet module, e.g.
# import my_flowsheet_module as mfm
import watertap.flowsheets.RO_with_energy_recovery.RO_with_energy_recovery as RO_flowsheet

Once this is done, we import the parameter sweep tool and two different random sampling classes

from parameter_sweep import parameter_sweep, UniformSample, NormalSample

The parameter sweep tool currently offers 6 classes that can be broadly categorized under the following 3 categroies:

FIXED: Final result is a mesh-grid matrix of parameter sweep values from the individual variable vectors.
- LinearSample: Draw samples that are evenly spaced within a specified interval.
- GeomSample: Draw samples that are evenly spaced within a specified interval on a negative log scale.
- ReverseGeomSample: Draw samples that are evenly spaced within a specified interval on a log scale
- PredeterminedFixedSample: Samples are predetermined by an external tool and supplied as arrays or list. This class wrangles data into a format compatible with parameter sweep that uses fixed sampling.
RANDOM: Final result is an array that combines individual variable vectors into a matrix. The individual sweep variable vectors must be of the same length.
- UniformSample: Draw samples uniformly from a given upper and lower range.
- NormalSample: Draw samples from a normal distribution given a mean and standard deviation.
- PredeterminedRandomSample: Samples are predetermined by an external tool and supplied as arrays or list.. This class wrangles data into a format compatible with parameter sweep that uses random sampling.
RANDOM_LHS: Final result is similar to the RANDOM category.
- LatinHypercubeSample: Draw samples using a Latin Hypercube algorithm which may yield a more complete exploration of high-dimensional parameter spaces. Note that currently this sample type may not be combined with other sampling types.

Note that sampling types within FIXED can be specified with a different number of samples. However, the number of samples must be the same for all sweep variables within RANDOM and RANDOM_LHS.

We will use the same setup steps as before to set up the generating functions for our model, sweep params, and outputs:

def build_model(**kwargs):

    # replace these function calls with
    # those in your own flowsheet module

    # set up system
    m = RO_flowsheet.build()
    RO_flowsheet.set_operating_conditions(m)
    RO_flowsheet.initialize_system(m)

    # simulate
    RO_flowsheet.solve(m)

    # set up the model for optimization
    RO_flowsheet.optimize_set_up(m)

    return m

Once the model has been setup, we specify the variables to randomly sample using a dictionary

def build_sweep_params(model, num_samples=1):
    sweep_params = dict()
    sweep_params["Spacer_porosity"] = UniformSample(
        model.fs.RO.feed_side.spacer_porosity, 0.95, 0.99, num_samples
    )
    sweep_params["A_comp"] = NormalSample(
        model.fs.RO.A_comp, 4.0e-12, 0.5e-12, num_samples
    )
    sweep_params["B_comp"] = NormalSample(
        model.fs.RO.B_comp, 3.5e-8, 0.5e-8, num_samples
    )
    return sweep_params

where the spacer_porosity attribute will be randomly selected from a uniform distribution of values in the range \([0.95, 0.99]\) and model values A_comp and B_comp will be drawn from normal distributions centered at \(4.0\times10^{-12}\) and \(3.5\times10^{-8}\) with standard deviations of \(12-14\%\), respectively. For this example, we’ll extract flowsheet outputs associated with cost, the levelized cost of water (LCOW) and energy consumption (EC), defined via another dictionary

def build_outputs(model,  **kwargs):
    outputs = dict()
    outputs['EC'] = model.fs.costing.specific_energy_consumption
    outputs['LCOW'] = model.fs.costing.LCOW
    return outputs

With the generating functions defined and suitably initialized, we can call the parameter_sweep function as before, where we exercise five new keyword arguments: (1) the ability to pass in custom optimization routines to be executed for each sample, (2) the ability to save per-process results for parallel debugging, (3) the specification of the number of samples to draw, (4) the ability to set a seed for the randomly-generated values which allows consistency to be enforced between runs, and (5) the ability to pass a keyword arg into the build_sweep_params function. The function passed in to optimize_function should return a Pyomo results object (i.e., the return value from calling the solve method).

# Define the local results directory, num_samples, and seed (if desired)
debugging_data_dir = "local_results"
num_samples = 25
seed = None

# Run the parameter sweep
global_results = parameter_sweep(
    build_model,
    build_sweep_params,
    build_outputs,
    csv_results_file_name="monte_carlo_results.csv",
    optimize_function=RO_flowsheet.optimize,
    debugging_data_dir=debugging_data_dir,
    num_samples=num_samples,
    seed=seed,
    build_sweep_params_kwargs=dict(num_samples=num_samples),
)

Note that num_samples must be provided for any of the random sample classes. For the very small problem size and simple model used here, parallel hardware is almost certainly not necessary. However, for larger total numbers of samples or more computationally demanding models, a significant speedup may be attained on a multi-core workstation or high performance computing (HPC) cluster. To distribute the workload between more than one worker, simply call the scipt using the mpirun command from the command line

mpirun -n 4 python mc_sweep.py

which will parallelize the requested parameter sweep between 4 computational units, where mc_sweep.py contains the collection of code snippets shown above ending with the call to parameter_sweep. Note that there is no requirement that the number of samples be evenly divisible by the number of workers. In the example shown here with 25 samples and 4 workers, worker 0 processes 7 samples while workers 1-3 process 6 each (you can verify this by examining the four output files in the local_results directory). In most cases, evenly distributing the workload in this way ensures that each worker finishes at roughly the same time. When each worker has finished, their inidividual results are aggregated into a single result file, monte_carlo_results.csv.

Module Documentation

parameter_sweep