Monte Carlo testing with the Parameter Sweep

Overview

This guide introduces two important features of the parameter sweep tool: (1) the ability to prescribe random model values and (2) the ability to take advantage of parallel computing resources for large runs. It is recommended that new users familiarize themselves with the beginner parameter sweep guide before proceeding.

How To

As before, we begin by importing or explicitly programming any functions relating to flowsheet building/specification, simulation, and optimization setup steps. We will use the same RO with energy recovery flowsheet for this example.

# replace this with your own flowsheet module, e.g.
# import my_flowsheet_module as mfm
import watertap.examples.flowsheets.RO_with_energy_recovery.RO_with_energy_recovery as RO_flowsheet

Once this is done, we import the parameter sweep tool and two of its random sampling classes:

from watertap.tools.parameter_sweep import parameter_sweep, UniformSample, NormalSample

The parameter sweep tool currently offers three random sampling classes:

UniformSample: Draw samples uniformly from a given upper and lower range.
NormalSample: Draw samples from a normal distribution given a mean and standard deviation.
LatinHypercubeSample: Draw samples using a Latin Hypercube algorithm which may yield a more complete exploration of high-dimensional parameter spaces. Note that currently this sample type may not be combined with other sampling types.
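To make the three sampling strategies concrete, the sketch below draws samples using only Python's standard random module. It illustrates the ideas and is not the tool's implementation: the helper names (uniform_samples, normal_samples, latin_hypercube_samples), the fixed seed, and the second pair of bounds passed to the Latin hypercube helper are all assumptions introduced here.

```python
import random


def uniform_samples(rng, lower, upper, n):
    """Draw n values uniformly between lower and upper."""
    return [rng.uniform(lower, upper) for _ in range(n)]


def normal_samples(rng, mean, std, n):
    """Draw n values from a normal distribution with the given mean and std."""
    return [rng.gauss(mean, std) for _ in range(n)]


def latin_hypercube_samples(rng, bounds, n):
    """Draw n points such that, in every dimension, the range is split into
    n equal-width bins and each bin contains exactly one sample."""
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n
        # One random value inside each of the n bins of this dimension
        column = [lower + (i + rng.random()) * width for i in range(n)]
        # Shuffle so that bin order is independent between dimensions
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into a list of n points
    return [tuple(point) for point in zip(*columns)]


rng = random.Random(0)  # fixed seed (an assumption) so the sketch is repeatable
porosity = uniform_samples(rng, 0.95, 0.99, 5)
a_comp = normal_samples(rng, 4.0e-12, 0.5e-12, 5)
# The second set of bounds is illustrative only
lhs_points = latin_hypercube_samples(rng, [(0.95, 0.99), (3.0e-12, 5.0e-12)], 5)
```

Because the Latin hypercube helper places exactly one sample in each bin of every dimension, it covers each parameter's range more evenly than independent uniform draws with the same number of samples.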

We will use the same setup steps as before, which return a flowsheet model m and perform some initialization:

# replace these function calls with
# those in your own flowsheet module

# set up system
m = RO_flowsheet.build()
RO_flowsheet.set_operating_conditions(m)
RO_flowsheet.initialize_system(m)

# simulate
RO_flowsheet.solve(m)

# set up the model for optimization
RO_flowsheet.optimize_set_up(m)

...

Once the model has been set up, we specify the variables to randomly sample using a dictionary

sweep_params = dict()
sweep_params['Spacer_porosity'] = UniformSample(m.fs.RO.spacer_porosity, 0.95, 0.99)
sweep_params['A_comp'] = NormalSample(m.fs.RO.A_comp, 4.0e-12, 0.5e-12)
sweep_params['B_comp'] = NormalSample(m.fs.RO.B_comp, 3.5e-8, 0.5e-8)

where the spacer_porosity attribute will be randomly selected from a uniform distribution over the range \([0.95, 0.99]\), and the model values A_comp and B_comp will be drawn from normal distributions centered at \(4.0\times10^{-12}\) and \(3.5\times10^{-8}\) with standard deviations of \(0.5\times10^{-12}\) and \(0.5\times10^{-8}\) (roughly \(12.5\%\) and \(14\%\) of the mean), respectively. For this example, we’ll extract two cost-related flowsheet outputs, the levelized cost of water (LCOW) and the energy consumption (EC), defined via another dictionary

outputs = dict()
outputs['EC'] = m.fs.costing.specific_energy_consumption
outputs['LCOW'] = m.fs.costing.LCOW
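As a quick check on the normal distributions specified in sweep_params, the relative standard deviation of each parameter is its standard deviation divided by its mean. The snippet below is a plain-Python calculation using the values passed to NormalSample; the variable names are introduced here for illustration only.

```python
# (mean, standard deviation) pairs as passed to NormalSample above
normal_params = {
    "A_comp": (4.0e-12, 0.5e-12),
    "B_comp": (3.5e-8, 0.5e-8),
}

# Relative standard deviation = standard deviation / mean
relative_std = {name: std / mean for name, (mean, std) in normal_params.items()}

for name, value in relative_std.items():
    print(f"{name}: {value:.1%}")
```

This gives 12.5% for A_comp and about 14.3% for B_comp.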

With the flowsheet defined and suitably initialized, and with the definitions for sweep_params and outputs on hand, we can call the parameter_sweep function as before. Here we exercise four new keyword arguments: (1) optimize_function, which passes in a custom optimization routine to be executed for each sample; (2) debugging_data_dir, which saves per-process results for parallel debugging; (3) num_samples, which specifies the number of samples to draw; and (4) seed, which sets a seed for the randomly generated values so that runs can be made reproducible. The function passed to optimize_function should return a Pyomo results object (i.e., the return value from calling the solve method).

# Define the local results directory, num_samples, and seed (if desired)
debugging_data_dir = 'local_results'
num_samples = 25
seed = None

# Run the parameter sweep
global_results = parameter_sweep(m, sweep_params, outputs, csv_results_file='monte_carlo_results.csv',
    optimize_function=RO_flowsheet.optimize, debugging_data_dir=debugging_data_dir, num_samples=num_samples, seed=seed)
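A custom routine passed as optimize_function only needs to accept the model as its first argument and return the results object from the solve. The sketch below shows that shape under stated assumptions: my_optimize and its solver argument are hypothetical names, and the solver is assumed to be any object exposing a Pyomo-style solve(model, tee=...) method.

```python
def my_optimize(model, solver=None, tee=False):
    """Hypothetical custom optimization routine for the parameter sweep.

    `solver` is assumed to be a Pyomo-style solver object supplied by the
    caller; it is not created here.
    """
    if solver is None:
        raise ValueError("a solver object must be provided")
    # Solve the model and hand the results object back to the caller
    results = solver.solve(model, tee=tee)
    return results
```

Such a function could be supplied as optimize_function=my_optimize, with the extra argument passed through optimize_kwargs (for example, optimize_kwargs={'solver': solver}), following the optimize_function(model, **optimize_kwargs) convention given in the function documentation.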

Note that num_samples must be provided for any of the random sample classes. For the very small problem size and simple model used here, parallel hardware is almost certainly not necessary. However, for larger total numbers of samples or more computationally demanding models, a significant speedup may be attained on a multi-core workstation or high-performance computing (HPC) cluster. To distribute the workload between more than one worker, simply call the script using the mpirun command from the command line

mpirun -n 4 python mc_sweep.py

which will parallelize the requested parameter sweep between 4 computational units, where mc_sweep.py contains the collection of code snippets shown above, ending with the call to parameter_sweep. Note that there is no requirement that the number of samples be evenly divisible by the number of workers. In the example shown here with 25 samples and 4 workers, worker 0 processes 7 samples while workers 1-3 process 6 each (you can verify this by examining the four output files in the local_results directory). In most cases, evenly distributing the workload in this way ensures that each worker finishes at roughly the same time. When every worker has finished, the individual results are aggregated into a single result file, monte_carlo_results.csv.
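The 7/6/6/6 split quoted above can be reproduced with a few lines of arithmetic. The function below is a minimal sketch of one scheme consistent with that split, in which any remainder goes to the lowest-numbered workers; split_counts is a hypothetical helper, and the tool's internal partitioning may differ in detail.

```python
def split_counts(num_samples, num_workers):
    """Number of samples per worker when the remainder is handed to the
    lowest-numbered workers, one extra sample each."""
    base, extra = divmod(num_samples, num_workers)
    return [base + 1 if rank < extra else base for rank in range(num_workers)]


counts = split_counts(25, 4)
print(counts)  # [7, 6, 6, 6]
```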

For more information, consult the technical reference for the parameter sweep tool.

Function Documentation

class watertap.tools.parameter_sweep.FixedSample(pyomo_object, *args, **kwargs)

class watertap.tools.parameter_sweep.LatinHypercubeSample(pyomo_object, *args, **kwargs)

class watertap.tools.parameter_sweep.LinearSample(pyomo_object, *args, **kwargs)

class watertap.tools.parameter_sweep.NormalSample(pyomo_object, *args, **kwargs)

class watertap.tools.parameter_sweep.RandomSample(pyomo_object, *args, **kwargs)

class watertap.tools.parameter_sweep.SamplingType(value): An enumeration.

class watertap.tools.parameter_sweep.UniformSample(pyomo_object, *args, **kwargs)

watertap.tools.parameter_sweep.parameter_sweep(model, sweep_params, outputs=None, csv_results_file=None, h5_results_file=None, optimize_function=<function _default_optimize>, optimize_kwargs=None, reinitialize_function=None, reinitialize_kwargs=None, reinitialize_before_sweep=False, mpi_comm=None, debugging_data_dir=None, interpolate_nan_outputs=False, num_samples=None, seed=None)

This function offers a general way to perform repeated optimizations of a model for the purposes of exploring a parameter space while monitoring multiple outputs. If csv_results_file is provided, a single CSV file containing all inputs and resulting outputs is written to that location.

Parameters

model – A Pyomo ConcreteModel containing a watertap flowsheet; for best results, it should be initialized before being passed to this function.
sweep_params – A dictionary containing the values to vary with the format sweep_params['Short/Pretty-print Name'] = (model.fs.variable_or_param[index], lower_limit, upper_limit, num_samples). A uniform number of samples num_samples will be taken between the lower_limit and upper_limit.
outputs – An optional dictionary with “short names” as keys and, as values, the Pyomo objects on model whose values are to be reported. E.g., outputs['Short/Pretty-print Name'] = model.fs.variable_or_expression_to_report. If not provided, i.e., outputs = None, the default behavior is to save all model variables, parameters, and expressions, which provides very thorough results at the cost of large file sizes.
csv_results_file (optional) – The path and file name where the results are to be saved; subdirectories will be created as needed.
h5_results_file (optional) – The file name without the extension where the results are to be saved; the path is identified from the arguments of csv_results_file. This filename is used when creating the H5 file and the companion text file, which lists the variable names contained within the H5 file.
optimize_function (optional) – A user-defined function that performs the optimization of the flowsheet model and loads the results back into model. The first argument of this function is model. The default uses the default IDAES solver, raising an exception if the termination condition is not optimal.
optimize_kwargs (optional) – Dictionary of kwargs to pass into every call to optimize_function. The first arg will always be model, e.g., optimize_function(model, **optimize_kwargs). The default uses no kwargs.
reinitialize_function (optional) – A user-defined function to re-initialize the flowsheet model if the first call to optimize_function fails for any reason. After reinitialize_function, the parameter sweep tool will immediately call optimize_function again.
reinitialize_kwargs (optional) – Dictionary of kwargs to pass into every call to reinitialize_function. The first arg will always be model, e.g., reinitialize_function(model, **reinitialize_kwargs). The default uses no kwargs.
reinitialize_before_sweep (optional) – Boolean option to reinitialize the flowsheet model before every parameter sweep realization. The default is False. Note that the parameter sweep tool will try to reinitialize the model regardless of this option if the solve fails.
mpi_comm (optional) – User-provided MPI communicator for parallel parameter sweeps. If None, COMM_WORLD will be used. The default is sufficient for most users.
debugging_data_dir (optional) – Save results on a per-process basis for parallel debugging purposes. If None, no debugging data will be saved.
interpolate_nan_outputs (optional) – When the parameter sweep has finished, interior values of np.nan will be replaced with a value obtained via a linear interpolation of their surrounding valid neighbors. If true, a second output file with the extension “_clean” will be saved alongside the raw (un-interpolated) values.
num_samples (optional) – If the user is using sampling techniques rather than a linear grid of values, they need to set the number of samples.
seed (optional) – If the user is using a random sampling technique, this sets the seed.
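The interplay between optimize_function, reinitialize_function, their keyword-argument dictionaries, and reinitialize_before_sweep described in the list above can be sketched as a small control flow. This is an illustration rather than the tool's source code: solve_with_retry is a hypothetical helper, and treating a failed solve as a raised exception (and re-raising when no reinitialize_function is given) is an assumption made for the sketch.

```python
def solve_with_retry(model, optimize_function, optimize_kwargs=None,
                     reinitialize_function=None, reinitialize_kwargs=None,
                     reinitialize_before_sweep=False):
    """Illustrative control flow for one sample of a sweep."""
    optimize_kwargs = optimize_kwargs or {}
    reinitialize_kwargs = reinitialize_kwargs or {}

    # Optionally re-initialize before every realization
    if reinitialize_before_sweep and reinitialize_function is not None:
        reinitialize_function(model, **reinitialize_kwargs)

    try:
        # The model is always the first argument
        return optimize_function(model, **optimize_kwargs)
    except Exception:
        if reinitialize_function is None:
            raise
        # First attempt failed: re-initialize, then immediately try again
        reinitialize_function(model, **reinitialize_kwargs)
        return optimize_function(model, **optimize_kwargs)
```

What happens when the second attempt also fails is not covered by this sketch.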

Returns

A list where the first N columns are the values of the parameters passed by sweep_params and the remaining columns are the values of the simulation identified by the outputs argument.

Return type

save_data