zoo.common package

Submodules

zoo.common.nncontext module

class zoo.common.nncontext.ZooContext[source]

Bases: object

Holder for global Analytics Zoo settings. Its class-level properties, such as log_output, are defined on the metaclass ZooContextMeta and are set directly on the class, for example ZooContext.log_output = True.

class zoo.common.nncontext.ZooContextMeta[source]

Bases: type

log_output

Whether to redirect the stdout and stderr of the Spark driver JVM to the current Python process. This is useful when running Analytics Zoo in a Jupyter notebook. Defaults to False. It must be set before the SparkContext is initialized.
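
Because log_output is declared on a metaclass, it acts as a property of the class that uses that metaclass rather than of its instances. The pattern can be sketched in plain Python as follows. This is a minimal illustration and not the Analytics Zoo source: DemoContextMeta and DemoContext are hypothetical stand-ins, and the boolean check in the setter is an assumption.

```python
class DemoContextMeta(type):
    """Hypothetical metaclass that exposes a class-level setting."""

    # Default value, chosen to match the documented default of False.
    _log_output = False

    @property
    def log_output(cls):
        return cls._log_output

    @log_output.setter
    def log_output(cls, value):
        # Assumption: only booleans are meaningful for this flag.
        if not isinstance(value, bool):
            raise TypeError("log_output must be a bool")
        cls._log_output = value


class DemoContext(metaclass=DemoContextMeta):
    """Hypothetical stand-in for a settings holder such as ZooContext."""


DemoContext.log_output = True  # routed through the metaclass setter
print(DemoContext.log_output)  # prints True
```

The setting is read and written on the class itself; instances of DemoContext do not see log_output, because attributes of a metaclass are not part of instance lookup.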

zoo.common.nncontext.check_version()[source]
zoo.common.nncontext.getOrCreateSparkContext(conf=None, appName=None)[source]

Get the current active SparkContext or create a new SparkContext.

Parameters:
  • conf – An instance of SparkConf. If not specified, a new SparkConf with Analytics Zoo and BigDL configurations is created and used.
  • appName – The name of the application, if any.
Returns:
The current active SparkContext, or a newly created one.
zoo.common.nncontext.get_analytics_zoo_conf()[source]
zoo.common.nncontext.get_optimizer_version(bigdl_type='float')[source]

Get the DistriOptimizer version.

Returns:optimizerVersion, the current DistriOptimizer version.

zoo.common.nncontext.init_env(conf)[source]
zoo.common.nncontext.init_nncontext(conf=None, spark_log_level='WARN', redirect_spark_log=True)[source]
zoo.common.nncontext.init_spark_conf(conf=None)[source]
zoo.common.nncontext.init_spark_on_k8s(master, container_image, num_executors, executor_cores, executor_memory='2g', driver_memory='1g', driver_cores=4, extra_executor_memory_for_ray=None, extra_python_lib=None, spark_log_level='WARN', redirect_spark_log=True, jars=None, conf=None, python_location=None)[source]

Create a SparkContext with Analytics Zoo configurations on a Kubernetes cluster.

Returns:An instance of SparkContext.
zoo.common.nncontext.init_spark_on_local(cores=2, conf=None, python_location=None, spark_log_level='WARN', redirect_spark_log=True)[source]

Create a SparkContext with Analytics Zoo configurations on the local machine.

Returns:An instance of SparkContext.
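
A call to init_spark_on_local might be wrapped as in the sketch below. The keyword names mirror the signature above; the helper names local_context_kwargs and create_local_context, the chosen values, and the example Spark property are illustrative assumptions, as is passing conf as a dict of Spark properties. The zoo import sits inside the function so the snippet loads without Analytics Zoo installed.

```python
def local_context_kwargs(cores=2, log_level="WARN"):
    """Build keyword arguments for init_spark_on_local (hypothetical helper)."""
    return {
        "cores": cores,
        # Assumption: conf accepts extra Spark properties as key-value pairs.
        "conf": {"spark.ui.showConsoleProgress": "false"},
        "spark_log_level": log_level,
        "redirect_spark_log": True,
    }


def create_local_context(**overrides):
    """Call init_spark_on_local with the defaults above; needs analytics-zoo."""
    # Imported lazily so that defining this function has no dependencies.
    from zoo.common.nncontext import init_spark_on_local

    kwargs = local_context_kwargs()
    kwargs.update(overrides)
    return init_spark_on_local(**kwargs)


# Inspect the arguments without starting Spark.
print(local_context_kwargs(cores=4))
```

In an environment with Analytics Zoo installed, create_local_context(cores=4) would forward these arguments to init_spark_on_local.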
zoo.common.nncontext.init_spark_on_yarn(hadoop_conf, conda_name, num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', extra_executor_memory_for_ray=None, extra_python_lib=None, penv_archive=None, additional_archive=None, hadoop_user_name='root', spark_yarn_archive=None, spark_log_level='WARN', redirect_spark_log=True, jars=None, conf=None)[source]

Create a SparkContext with Analytics Zoo configurations on a YARN cluster in yarn-client mode.

Returns:An instance of SparkContext.
zoo.common.nncontext.init_spark_standalone(num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', master=None, extra_executor_memory_for_ray=None, extra_python_lib=None, spark_log_level='WARN', redirect_spark_log=True, conf=None, jars=None, python_location=None, enable_numa_binding=False)[source]

Create a SparkContext with Analytics Zoo configurations on a Spark standalone cluster. If master is not specified, a new standalone cluster is started first; call stop_spark_standalone to shut it down when the program finishes.

Returns:An instance of SparkContext.
zoo.common.nncontext.load_conf(conf_str, split_char=None)[source]

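Parse the configuration string conf_str into key-value pairs, using split_char as the separator between each key and its value.

Parameters:
  • conf_str (str) – The configuration text to parse.
  • split_char – The separator between a key and its value. Defaults to None.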
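
The signature of load_conf suggests line-oriented parsing of configuration text into key-value pairs. The sketch below shows one way such parsing can work. It is not the library's implementation: parse_conf is a hypothetical stand-in, and the rules it applies (one setting per line, lines starting with # ignored, whitespace as the separator when split_char is None) are assumptions.

```python
def parse_conf(conf_str, split_char=None):
    """Hypothetical stand-in: turn 'key value' lines into a dict."""
    result = {}
    for line in conf_str.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no setting.
        if not line or line.startswith("#"):
            continue
        # Split once so that a value may itself contain the separator.
        key, value = line.split(split_char, 1)
        result[key.strip()] = value.strip()
    return result


example = """
# Spark settings
spark.shuffle.reduceLocality.enabled false
spark.scheduler.minRegisteredResourcesRatio 1.0
"""
print(parse_conf(example))
```

With split_char="=", the same function would handle lines of the form key=value. A line that lacks the separator raises ValueError in this sketch.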
zoo.common.nncontext.set_optimizer_version(optimizerVersion, bigdl_type='float')[source]

Set the DistriOptimizer version.

Parameters:optimizerVersion – Should be “OptimizerV1” or “OptimizerV2”.
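
Since only two values are documented for optimizerVersion, a caller may want to validate the string before passing it on. The sketch below is one way to do that. The helper names checked_optimizer_version and use_optimizer_version are hypothetical, and it is an assumption that a SparkContext must already be initialized (for example via init_nncontext) before the setter and getter are called.

```python
VALID_OPTIMIZER_VERSIONS = ("OptimizerV1", "OptimizerV2")


def checked_optimizer_version(version):
    """Return version unchanged if it is one of the documented values."""
    if version not in VALID_OPTIMIZER_VERSIONS:
        raise ValueError(
            "optimizerVersion should be one of %s, got %r"
            % (VALID_OPTIMIZER_VERSIONS, version)
        )
    return version


def use_optimizer_version(version):
    """Set the version, then read it back; needs analytics-zoo."""
    # Imported lazily so that defining this function has no dependencies.
    from zoo.common.nncontext import get_optimizer_version, set_optimizer_version

    set_optimizer_version(checked_optimizer_version(version))
    return get_optimizer_version()


print(checked_optimizer_version("OptimizerV2"))  # prints OptimizerV2
```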

zoo.common.nncontext.stop_spark_standalone()[source]

Stop the Spark standalone cluster that init_spark_standalone started, which is the case when master was not specified.
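
Pairing init_spark_standalone with stop_spark_standalone in a try/finally block keeps the cluster from being left running if the job fails. The sketch below illustrates that pairing. The helper name run_on_standalone and the argument values are hypothetical, and stopping the SparkContext before the cluster is an assumed ordering rather than a documented requirement.

```python
def run_on_standalone(job, num_executors=2, executor_cores=4):
    """Run job(sc) on a temporary standalone cluster; needs analytics-zoo."""
    # Imported lazily so that defining this function has no dependencies.
    from zoo.common.nncontext import init_spark_standalone, stop_spark_standalone

    # master is left unspecified, so a new standalone cluster is started.
    sc = init_spark_standalone(
        num_executors=num_executors, executor_cores=executor_cores
    )
    try:
        return job(sc)
    finally:
        # Assumed ordering: stop the context first, then the cluster.
        sc.stop()
        stop_spark_standalone()
```

A call such as run_on_standalone(lambda sc: sc.parallelize(range(10)).sum()) would then start the cluster, run the job, and shut everything down.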

zoo.common.utils module

class zoo.common.utils.JTensor(storage, shape, bigdl_type='float', indices=None)[source]

Bases: bigdl.util.common.JTensor

classmethod from_ndarray(a_ndarray, bigdl_type='float')[source]

Convert an ndarray to a DenseTensor for use on the Java side.

class zoo.common.utils.Sample(features, labels, bigdl_type='float')[source]

Bases: bigdl.util.common.Sample

classmethod from_ndarray(features, labels, bigdl_type='float')[source]

Convert ndarrays of features and labels to a Sample for use on the Java side.

Parameters:
  • features – An ndarray or a list of ndarrays.
  • labels – An ndarray, a list of ndarrays, or a scalar.
  • bigdl_type – “double” or “float”.

>>> import numpy as np
>>> from bigdl.util.common import callBigDlFunc
>>> from numpy.testing import assert_allclose
>>> np.random.seed(123)
>>> sample = Sample.from_ndarray(np.random.random((2,3)), np.random.random((2,3)))
>>> sample_back = callBigDlFunc("float", "testSample", sample)
>>> assert_allclose(sample.features[0].to_ndarray(), sample_back.features[0].to_ndarray())
>>> assert_allclose(sample.label.to_ndarray(), sample_back.label.to_ndarray())
>>> expected_feature_storage = np.array(([[0.69646919, 0.28613934, 0.22685145], [0.55131477, 0.71946895, 0.42310646]]))
>>> expected_feature_shape = np.array([2, 3])
>>> expected_label_storage = np.array(([[0.98076421, 0.68482971, 0.48093191], [0.39211753, 0.343178, 0.72904968]]))
>>> expected_label_shape = np.array([2, 3])
>>> assert_allclose(sample.features[0].storage, expected_feature_storage, rtol=1e-6, atol=1e-6)
>>> assert_allclose(sample.features[0].shape, expected_feature_shape)
>>> assert_allclose(sample.labels[0].storage, expected_label_storage, rtol=1e-6, atol=1e-6)
>>> assert_allclose(sample.labels[0].shape, expected_label_shape)
zoo.common.utils.append_suffix(prefix, path)[source]
zoo.common.utils.callZooFunc(bigdl_type, name, *args)[source]

Call an API in PythonBigDL.

zoo.common.utils.convert_to_safe_path(input_path, follow_symlinks=True)[source]
zoo.common.utils.get_file_list(path, recursive=False)[source]
zoo.common.utils.get_remote_file_to_local(remote_path, local_path, over_write=False)[source]
zoo.common.utils.is_local_path(path)[source]
zoo.common.utils.load_from_file(load_func, path)[source]
zoo.common.utils.put_local_file_to_remote(local_path, remote_path, over_write=False)[source]
zoo.common.utils.save_file(save_func, path, **kwargs)[source]
zoo.common.utils.set_core_number(num)[source]
zoo.common.utils.to_list_of_numpy(elements)[source]

Module contents