zoo.common package
Submodules
zoo.common.nncontext module
-
class
zoo.common.nncontext.ZooContext Bases: object
-
class
zoo.common.nncontext.ZooContextMeta Bases: type
-
log_output Whether to redirect the Spark driver JVM's stdout and stderr to the current Python process. This is useful when running Analytics Zoo in a Jupyter notebook. Defaults to False. Must be set before the SparkContext is initialized.
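ZooContextMeta derives from type, so it is a metaclass, and log_output is documented on it rather than on ZooContext. The sketch below illustrates how a property defined on a metaclass becomes a class-level setting. It is a minimal, hypothetical mirror: the names ContextMeta and Context, and the assumption that ZooContext uses ZooContextMeta as its metaclass, are illustrative and not taken from the Analytics Zoo source.

```python
# Hypothetical mirror of a metaclass-level setting; ContextMeta and Context are
# illustrative names, not the Analytics Zoo implementation.


class ContextMeta(type):
    """Metaclass whose property is read and written on the class itself."""

    _log_output = False  # default, matching the documented default of False

    @property
    def log_output(cls):
        # Looked up on the class (e.g. Context.log_output), not on instances.
        return cls._log_output

    @log_output.setter
    def log_output(cls, value):
        # Stores the value in the class's own __dict__.
        cls._log_output = value


class Context(metaclass=ContextMeta):
    """Plain class; its settings live on the metaclass."""


# Set the flag before anything that reads it (here, before a SparkContext would be created).
Context.log_output = True
print(Context.log_output)  # True
```

Under this assumption, the setting is read and assigned on the class itself (for example `ZooContext.log_output = True`). Instances do not see it, because attribute lookup on an instance does not consult the metaclass.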
-
-
zoo.common.nncontext.getOrCreateSparkContext(conf=None, appName=None) Get the current active SparkContext, or create a new SparkContext if none is active.
Parameters: - conf – An instance of SparkConf. If not specified, a new SparkConf with Analytics Zoo and BigDL configurations is created and used.
- appName – The name of the application, if any.
Returns: An instance of SparkContext.
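The get-or-create behavior described above can be sketched without Spark. In the sketch, FakeContext, default_conf, and get_or_create_context are hypothetical stand-ins, and the default configuration is a placeholder. It also assumes that when a context is already active, conf and appName are ignored. That matches PySpark's own SparkContext.getOrCreate, but the entry above does not state it for this function.

```python
from typing import Dict, Optional


class FakeContext:
    """Stand-in for a SparkContext: just a configuration and an app name."""

    def __init__(self, conf: Dict[str, str], app_name: Optional[str]) -> None:
        self.conf = conf
        self.app_name = app_name


_active: Optional[FakeContext] = None  # the process-wide active context


def default_conf() -> Dict[str, str]:
    # Placeholder for "a new SparkConf with Analytics Zoo and BigDL configurations".
    return {"example.setting": "default"}


def get_or_create_context(conf: Optional[Dict[str, str]] = None,
                          app_name: Optional[str] = None) -> FakeContext:
    global _active
    if _active is not None:
        # Reuse the active context; conf and app_name are ignored in this case.
        return _active
    _active = FakeContext(conf if conf is not None else default_conf(), app_name)
    return _active


first = get_or_create_context(app_name="demo")
second = get_or_create_context({"other": "conf"})
print(first is second)  # True: the second call reuses the active context
```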
-
zoo.common.nncontext.get_optimizer_version(bigdl_type='float') Get the DistriOptimizer version.
Returns: The DistriOptimizer version (optimizerVersion).
-
zoo.common.nncontext.init_nncontext(conf=None, spark_log_level='WARN', redirect_spark_log=True)
-
zoo.common.nncontext.init_spark_on_k8s(master, container_image, num_executors, executor_cores, executor_memory='2g', driver_memory='1g', driver_cores=4, extra_executor_memory_for_ray=None, extra_python_lib=None, spark_log_level='WARN', redirect_spark_log=True, jars=None, conf=None, python_location=None) Create a SparkContext with Analytics Zoo configurations on a Kubernetes cluster.
-
zoo.common.nncontext.init_spark_on_local(cores=2, conf=None, python_location=None, spark_log_level='WARN', redirect_spark_log=True) Create a SparkContext with Analytics Zoo configurations on the local machine.
-
zoo.common.nncontext.init_spark_on_yarn(hadoop_conf, conda_name, num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', extra_executor_memory_for_ray=None, extra_python_lib=None, penv_archive=None, additional_archive=None, hadoop_user_name='root', spark_yarn_archive=None, spark_log_level='WARN', redirect_spark_log=True, jars=None, conf=None) Create a SparkContext with Analytics Zoo configurations on a YARN cluster.
-
zoo.common.nncontext.init_spark_standalone(num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', master=None, extra_executor_memory_for_ray=None, extra_python_lib=None, spark_log_level='WARN', redirect_spark_log=True, conf=None, jars=None, python_location=None, enable_numa_binding=False) Create a SparkContext with Analytics Zoo configurations on a Spark standalone cluster.
-
zoo.common.nncontext.load_conf(conf_str, split_char=None)
zoo.common.utils module
-
class
zoo.common.utils.JTensor(storage, shape, bigdl_type='float', indices=None) Bases: bigdl.util.common.JTensor
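The JTensor constructor takes the tensor values (storage) and the dimensions (shape) as separate arguments. The NumPy sketch below only illustrates that split: a dense tensor is held as a flat buffer plus a shape and is recovered by reshaping. The class name DenseTensorSketch, the flat row-major layout, and the dtype default are assumptions rather than BigDL's implementation, and the optional indices argument is not modeled.

```python
import numpy as np


class DenseTensorSketch:
    """Hypothetical storage-plus-shape container; not BigDL's JTensor."""

    def __init__(self, storage, shape, dtype=np.float32):
        # Keep the values as a flat 1-D buffer and the dimensions separately.
        self.storage = np.asarray(storage, dtype=dtype).ravel()
        self.shape = np.asarray(shape, dtype=np.int32)
        if self.storage.size != int(np.prod(self.shape)):
            raise ValueError("storage size does not match shape")

    @classmethod
    def from_ndarray(cls, array, dtype=np.float32):
        array = np.asarray(array)
        return cls(array, array.shape, dtype)

    def to_ndarray(self):
        # A row-major (C-order) reshape restores the original array.
        return self.storage.reshape(tuple(self.shape))


a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = DenseTensorSketch.from_ndarray(a)
print(t.storage.shape, t.shape)  # (6,) [2 3]
```

Keeping the shape separate from a flat buffer means the same values can be passed across a language boundary as two simple arrays and reassembled on the other side.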
-
class
zoo.common.utils.Sample(features, labels, bigdl_type='float') Bases: bigdl.util.common.Sample
-
classmethod
from_ndarray(features, labels, bigdl_type='float') Convert an ndarray of features and labels to a Sample, which is used on the Java side.
Parameters: - features – An ndarray or a list of ndarrays.
- labels – An ndarray, a list of ndarrays, or a scalar.
- bigdl_type – "double" or "float".
>>> import numpy as np
>>> from bigdl.util.common import callBigDlFunc
>>> from numpy.testing import assert_allclose
>>> np.random.seed(123)
>>> sample = Sample.from_ndarray(np.random.random((2,3)), np.random.random((2,3)))
>>> sample_back = callBigDlFunc("float", "testSample", sample)
>>> assert_allclose(sample.features[0].to_ndarray(), sample_back.features[0].to_ndarray())
>>> assert_allclose(sample.label.to_ndarray(), sample_back.label.to_ndarray())
>>> expected_feature_storage = np.array(([[0.69646919, 0.28613934, 0.22685145], [0.55131477, 0.71946895, 0.42310646]]))
>>> expected_feature_shape = np.array([2, 3])
>>> expected_label_storage = np.array(([[0.98076421, 0.68482971, 0.48093191], [0.39211753, 0.343178, 0.72904968]]))
>>> expected_label_shape = np.array([2, 3])
>>> assert_allclose(sample.features[0].storage, expected_feature_storage, rtol=1e-6, atol=1e-6)
>>> assert_allclose(sample.features[0].shape, expected_feature_shape)
>>> assert_allclose(sample.labels[0].storage, expected_label_storage, rtol=1e-6, atol=1e-6)
>>> assert_allclose(sample.labels[0].shape, expected_label_shape)
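According to its parameters, from_ndarray accepts either a single ndarray or a list of them for features, and additionally accepts a scalar for labels. The pure-NumPy sketch below shows one way to normalize those inputs into lists of arrays before conversion. The helpers normalize_features and normalize_labels, and the mapping from bigdl_type strings to NumPy dtypes, are hypothetical and not part of zoo.common.utils or BigDL. The real method goes on to build tensors for the Java side, which is not reproduced here.

```python
from typing import List

import numpy as np

# Assumed mapping from the bigdl_type strings to NumPy dtypes.
DTYPES = {"float": np.float32, "double": np.float64}


def normalize_features(features, bigdl_type: str = "float") -> List[np.ndarray]:
    """Hypothetical helper: return features as a list of arrays."""
    dtype = DTYPES[bigdl_type]
    if isinstance(features, np.ndarray):
        features = [features]  # a single ndarray becomes a one-element list
    return [np.asarray(feature, dtype=dtype) for feature in features]


def normalize_labels(labels, bigdl_type: str = "float") -> List[np.ndarray]:
    """Hypothetical helper: return labels as a list of arrays."""
    dtype = DTYPES[bigdl_type]
    if np.isscalar(labels):
        labels = [np.array(labels)]  # a scalar label becomes a 0-d array
    elif isinstance(labels, np.ndarray):
        labels = [labels]
    return [np.asarray(label, dtype=dtype) for label in labels]


np.random.seed(123)  # same seed as the doctest above
feats = normalize_features(np.random.random((2, 3)))
labs = normalize_labels(np.random.random((2, 3)))
print(len(feats), feats[0].shape, feats[0].dtype)  # 1 (2, 3) float32
```

With the same seed, the normalized arrays hold the values that the doctest lists as the expected feature and label storage.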
-
classmethod