zoo.orca.data.pandas package

Submodules

zoo.orca.data.pandas.preprocessing module

zoo.orca.data.pandas.preprocessing.read_csv(file_path, **kwargs)

Read csv files to SparkXShards of pandas DataFrames.

Parameters:
file_path – A csv file path, a list of multiple csv file paths, or a directory containing csv files. Local file system, HDFS, and AWS S3 are supported.
kwargs – You can specify read_csv options supported by pandas.

Returns: An instance of SparkXShards.
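
As a rough illustration of the call shape described above, the sketch below writes a small CSV file with the standard library and wraps `read_csv` in a helper. The names `write_sample_csv` and `read_csv_as_shards` are hypothetical, the `usecols` option is just one example of a pandas `read_csv` keyword, and the commented call assumes Analytics Zoo is installed and an Orca context has already been initialized.

```python
import csv
import os
import tempfile


def write_sample_csv(directory):
    """Write a tiny CSV file and return its path (standard library only)."""
    path = os.path.join(directory, "sample.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "score"])
        writer.writerows([[1, "a", 0.5], [2, "b", 0.7]])
    return path


def read_csv_as_shards(file_path, **kwargs):
    """Hypothetical wrapper: read CSV data into SparkXShards.

    Requires Analytics Zoo; the import is lazy so that the helper above
    can be used without it.
    """
    from zoo.orca.data.pandas.preprocessing import read_csv
    return read_csv(file_path, **kwargs)


sample_dir = tempfile.mkdtemp()
sample_path = write_sample_csv(sample_dir)

# Assuming an Orca context is already running, a call might look like:
#   shards = read_csv_as_shards(sample_path, usecols=["id", "score"])
#   frames = shards.collect()  # a list of pandas DataFrames
```

Which pandas options are honored may depend on the configured backend, so it is worth checking the result on a small file first.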

zoo.orca.data.pandas.preprocessing.read_file_spark(file_path, file_type, **kwargs)

zoo.orca.data.pandas.preprocessing.read_json(file_path, **kwargs)

Read json files to SparkXShards of pandas DataFrames.

Parameters:
file_path – A json file path, a list of multiple json file paths, or a directory containing json files. Local file system, HDFS, and AWS S3 are supported.
kwargs – You can specify read_json options supported by pandas.

Returns: An instance of SparkXShards.
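
The parameter description above allows a single path, a list of paths, or a directory. The following sketch prepares two small JSON files so that both the list form and the directory form can be tried. `write_sample_json` and `read_json_as_shards` are hypothetical helper names, and `orient="records"` is only an example of a pandas `read_json` option; the commented calls assume Analytics Zoo and a running Orca context.

```python
import json
import os
import tempfile


def write_sample_json(directory, name, records):
    """Write a list of records to a JSON file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        json.dump(records, f)
    return path


def read_json_as_shards(file_path, **kwargs):
    """Hypothetical wrapper: read JSON data into SparkXShards.

    Requires Analytics Zoo; imported lazily so the file-writing helper
    works without it.
    """
    from zoo.orca.data.pandas.preprocessing import read_json
    return read_json(file_path, **kwargs)


json_dir = tempfile.mkdtemp()
json_paths = [
    write_sample_json(json_dir, "part-0.json", [{"id": 1, "score": 0.5}]),
    write_sample_json(json_dir, "part-1.json", [{"id": 2, "score": 0.7}]),
]

# Assuming an Orca context is already running, either input form could be used:
#   shards = read_json_as_shards(json_paths, orient="records")  # list of files
#   shards = read_json_as_shards(json_dir, orient="records")    # directory
```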

zoo.orca.data.pandas.preprocessing.read_parquet(file_path, columns=None, **kwargs)

Read parquet files to SparkXShards of pandas DataFrames.

Parameters:
file_path – A parquet file path, a list of multiple parquet file paths, or a directory containing parquet files. Local file system, HDFS, and AWS S3 are supported.
columns – A list of column names, default=None. If not None, only these columns will be read from the file.
kwargs – Any additional kwargs.

Returns: An instance of SparkXShards.
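
Since `columns` is documented as a list of column names, one easy mistake is passing a bare string such as `"id"` instead of `["id"]`. The sketch below is a hypothetical wrapper (`read_parquet_columns` is not part of the library) that checks the argument before delegating to `read_parquet`. The check is an illustration only and says nothing about how the library itself validates input.

```python
def read_parquet_columns(file_path, columns=None, **kwargs):
    """Hypothetical wrapper: validate `columns`, then read into SparkXShards."""
    if columns is not None:
        # A bare string is iterable, so reject it explicitly before the
        # per-element check.
        if isinstance(columns, str) or not all(
            isinstance(c, str) for c in columns
        ):
            raise TypeError("columns must be a list of column names or None")
        columns = list(columns)

    # Requires Analytics Zoo; imported lazily so validation errors surface
    # even when the library is not installed.
    from zoo.orca.data.pandas.preprocessing import read_parquet
    return read_parquet(file_path, columns=columns, **kwargs)


# Assuming an Orca context is already running and the path exists:
#   shards = read_parquet_columns("/path/to/data.parquet", columns=["id", "score"])
```

Reading only the needed columns generally reduces I/O for a columnar format like Parquet, which is the usual reason to set `columns`.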

Module contents