zoo.util package¶

Submodules¶

zoo.util.engine module¶

zoo.util.engine.check_spark_source_conflict(spark_home, pyspark_path)[source]¶

zoo.util.engine.compare_version(version1, version2)[source]¶

Compare two version strings. Returns 1 if version1 is later than version2, -1 if version1 is earlier than version2, and 0 if the two versions are the same.
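The return convention above can be illustrated with a minimal sketch. It is not the library's implementation: the name `compare_version_sketch` is hypothetical, and the code assumes versions are dot-separated non-negative integers and that a missing component counts as zero (so "2.2" equals "2.2.0").

```python
def compare_version_sketch(version1, version2):
    """Return 1, -1 or 0 using the convention described above.

    Assumption: both arguments look like "2.4.3" (dot-separated integers).
    """
    parts1 = [int(p) for p in version1.split(".")]
    parts2 = [int(p) for p in version2.split(".")]
    # Assumption: pad the shorter version with zeros so "2.2" == "2.2.0".
    length = max(len(parts1), len(parts2))
    parts1 += [0] * (length - len(parts1))
    parts2 += [0] * (length - len(parts2))
    # Lists of integers compare component by component, left to right.
    if parts1 > parts2:
        return 1
    if parts1 < parts2:
        return -1
    return 0


print(compare_version_sketch("2.4.3", "2.2"))   # 1
print(compare_version_sketch("2.1.0", "2.2"))   # -1
print(compare_version_sketch("2.2", "2.2.0"))   # 0
```

A check such as "is the Spark version below 2.2" could then be written as `compare_version_sketch(spark_version, "2.2") == -1`, where `spark_version` is whatever version string the caller has obtained.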

zoo.util.engine.exist_pyspark()[source]¶

zoo.util.engine.get_analytics_zoo_classpath()[source]¶

Return the path of the analytics-zoo jar, if it exists.

zoo.util.engine.is_spark_below_2_2()[source]¶

Check whether the Spark version is below 2.2.

zoo.util.engine.prepare_env()[source]¶

zoo.util.nest module¶

zoo.util.nest.flatten(seq)[source]¶

zoo.util.nest.is_sequence(s)[source]¶

zoo.util.nest.pack_sequence_as(structure, flat_sequence)[source]¶

zoo.util.nest.ptensor_to_numpy(seq)[source]¶

zoo.util.spark module¶

class zoo.util.spark.SparkRunner(spark_log_level='WARN', redirect_spark_log=True)[source]¶

Bases: object

create_sc(submit_args, conf)[source]¶

init_spark_on_k8s(master, container_image, num_executors, executor_cores, executor_memory='2g', driver_memory='1g', driver_cores=4, extra_executor_memory_for_ray=None, extra_python_lib=None, conf=None, jars=None, python_location=None)[source]¶

init_spark_on_local(cores, conf=None, python_location=None)[source]¶

init_spark_on_yarn(hadoop_conf, conda_name, num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', extra_executor_memory_for_ray=None, extra_python_lib=None, penv_archive=None, additional_archive=None, hadoop_user_name='root', spark_yarn_archive=None, conf=None, jars=None)[source]¶

init_spark_standalone(num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', master=None, extra_executor_memory_for_ray=None, extra_python_lib=None, conf=None, jars=None, python_location=None, enable_numa_binding=False)[source]¶

standalone_env = None¶

static stop_spark_standalone()[source]¶

zoo.util.spark.enrich_conf_for_spark(conf, driver_cores, driver_memory, num_executors, executor_cores, executor_memory, extra_executor_memory_for_ray=None)[source]¶

zoo.util.spark.gen_submit_args(driver_cores, driver_memory, num_executors, executor_cores, executor_memory, extra_python_lib=None, jars=None)[source]¶

zoo.util.tf module¶

zoo.util.tf.export_tf(sess, folder, inputs, outputs, generate_backward=False, allow_non_differentiable_input=True)[source]¶

Export the frozen TensorFlow graph, together with the inputs/outputs information, to the folder for inference.

This function will:

1. freeze the graph (replace all variables with constants)
2. strip all unused nodes, as determined by inputs and outputs
3. add placeholder nodes as needed
4. write the frozen graph and the inputs/outputs names to the folder

Note: There should not be any queuing operation between inputs and outputs.

Parameters:
	sess – TensorFlow session holding the variables to be saved
	folder – the folder where the graph file and the inputs/outputs information are saved
	inputs – a list of TensorFlow tensors that will be fed during inference
	outputs – a list of TensorFlow tensors that will be fetched during inference
Returns:

zoo.util.tf.process_grad(grad)[source]¶

zoo.util.tf.strip_unused(input_graph_def, input_tensor_names, output_tensor_names, placeholder_type_enum)[source]¶

Removes unused nodes from a GraphDef.

Parameters:
	input_graph_def – A graph with nodes we want to prune.
	input_tensor_names – A list of the nodes we use as inputs.
	output_tensor_names – A list of the output nodes.
	placeholder_type_enum – The AttrValue enum for the placeholder data type, or a list that specifies one value per input node name.
Returns:	A GraphDef with all unnecessary ops removed, and a map from the old input names to the new input names.
Raises:
	`ValueError` – If any element in input_node_names refers to a tensor instead of an operation.
	`KeyError` – If any element in input_node_names is not found in the graph.
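The pruning idea can be shown on a toy graph. The sketch below is not the TensorFlow or zoo implementation and does not use GraphDef: it assumes a graph represented as a plain dict that maps each node name to the names of the nodes it consumes, and the name `prune_graph_sketch` is hypothetical. Nodes are kept only if an output depends on them, and input nodes lose their upstream dependencies, much as a placeholder would.

```python
def prune_graph_sketch(graph, input_names, output_names):
    """Keep only the nodes needed to compute output_names.

    Assumption: graph is a dict {node_name: [names of consumed nodes]}.
    """
    for name in list(input_names) + list(output_names):
        if name not in graph:
            raise KeyError("node %r is not in the graph" % name)
    inputs = set(input_names)
    kept = {}
    stack = list(output_names)
    # Walk backwards from the outputs through their dependencies.
    while stack:
        name = stack.pop()
        if name in kept:
            continue
        if name in inputs:
            # Inputs act like placeholders: nothing upstream is needed.
            kept[name] = []
            continue
        kept[name] = list(graph[name])
        stack.extend(graph[name])
    return kept


toy_graph = {
    "reader": [],
    "decode": ["reader"],
    "weights": [],
    "matmul": ["decode", "weights"],
    "softmax": ["matmul"],
    "summary": ["matmul"],  # not needed to compute "softmax"
}
pruned = prune_graph_sketch(toy_graph, ["decode"], ["softmax"])
print(sorted(pruned))  # ['decode', 'matmul', 'softmax', 'weights']
```

Here "reader" is dropped because "decode" is treated as a fed input, and "summary" is dropped because no requested output depends on it.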

zoo.util.tf_graph_util module¶

zoo.util.utils module¶

zoo.util.utils.detect_python_location()[source]¶

zoo.util.utils.get_conda_python_path()[source]¶

zoo.util.utils.get_executor_conda_zoo_classpath(conda_path)[source]¶

zoo.util.utils.get_node_ip()[source]¶

Get the IP address of the current node. This function is ported from Ray: in settings where Ray is not otherwise involved, calling ray.services.get_node_ip_address would introduce Ray overhead.
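For illustration, one common way to find the local IP address using only the standard library is sketched below. It is not necessarily how this function is implemented: the name `get_node_ip_sketch`, the `probe_address` parameter, and the fallback to 127.0.0.1 are all assumptions made for the example.

```python
import socket


def get_node_ip_sketch(probe_address="8.8.8.8:53"):
    """Return the local IP address the OS would use to reach probe_address.

    Assumption: probe_address is an IPv4 "host:port" string.
    """
    host, port = probe_address.split(":")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface, whose address we then read back.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]
    except OSError:
        # Assumption: fall back to loopback when no route is available.
        return "127.0.0.1"
    finally:
        sock.close()


print(get_node_ip_sketch())
```

The printed address depends on the machine and its network configuration.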

zoo.util.utils.get_zoo_bigdl_classpath_on_driver()[source]¶

zoo.util.utils.pack_conda_main(conda_name, tmp_path)[source]¶

zoo.util.utils.pack_penv(conda_name, output_name)[source]¶

zoo.util.utils.set_python_home()[source]¶

zoo.util.utils.to_sample_rdd(x, y, sc, num_slices=None)[source]¶

Convert x and y into an RDD[Sample].

Parameters:
	x – ndarray whose first dimension is the batch dimension
	y – ndarray whose first dimension is the batch dimension
	sc – SparkContext
	num_slices – The number of partitions for x and y.
Returns: