zoo.orca.data.pandas package
Submodules
zoo.orca.data.pandas.preprocessing module
zoo.orca.data.pandas.preprocessing.read_csv(file_path, **kwargs)
Read CSV files into a SparkXShards of pandas DataFrames.
Parameters:
file_path – A CSV file path, a list of CSV file paths, or a directory containing CSV files. The local file system, HDFS, and AWS S3 are supported.
kwargs – Any read_csv options supported by pandas.
Returns: An instance of SparkXShards.
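Because kwargs takes read_csv options supported by pandas, one way to settle on a set of options is to try them with plain pandas on a small local file first. The sketch below does that and then shows how the same options might be handed to Orca. The file contents, the column names, and the helper read_with_orca are hypothetical, and the helper assumes Analytics Zoo is installed and an Orca context has already been initialised.

```python
import os
import tempfile

import pandas as pd

# Options understood by pandas.read_csv; the column names are hypothetical.
CSV_OPTIONS = {
    "usecols": ["id", "price"],
    "dtype": {"id": "int64", "price": "float64"},
}

# Write a tiny CSV file so the options can be checked locally with plain pandas.
csv_dir = tempfile.mkdtemp()
csv_path = os.path.join(csv_dir, "part-0.csv")
with open(csv_path, "w") as f:
    f.write("id,name,price\n1,apple,0.5\n2,pear,0.75\n")

local_csv_df = pd.read_csv(csv_path, **CSV_OPTIONS)
print(local_csv_df)


def read_with_orca(file_path):
    """Hypothetical helper: pass the same options to Orca's read_csv.

    Assumes Analytics Zoo is installed and an Orca context is already running.
    The import is done lazily so the local check above works without it.
    """
    from zoo.orca.data.pandas.preprocessing import read_csv
    return read_csv(file_path, **CSV_OPTIONS)
```

The local check only confirms that pandas accepts the options; read_with_orca returns a SparkXShards rather than a single DataFrame.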
zoo.orca.data.pandas.preprocessing.read_json(file_path, **kwargs)
Read JSON files into a SparkXShards of pandas DataFrames.
Parameters:
file_path – A JSON file path, a list of JSON file paths, or a directory containing JSON files. The local file system, HDFS, and AWS S3 are supported.
kwargs – Any read_json options supported by pandas.
Returns: An instance of SparkXShards.
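The same approach applies to read_json options. As a minimal sketch, the example below checks the pandas option lines=True (one JSON object per line) against a small local file, then defines a hypothetical helper that passes it to Orca. The records and the helper name read_json_with_orca are illustrative only, and the helper again assumes an installed Analytics Zoo and a running Orca context.

```python
import json
import os
import tempfile

import pandas as pd

# lines=True tells pandas.read_json to expect one JSON object per line.
JSON_OPTIONS = {"lines": True}

# Write a small line-delimited JSON file (hypothetical records).
json_dir = tempfile.mkdtemp()
json_path = os.path.join(json_dir, "part-0.json")
records = [{"id": 1, "name": "apple"}, {"id": 2, "name": "pear"}]
with open(json_path, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

local_json_df = pd.read_json(json_path, **JSON_OPTIONS)
print(local_json_df)


def read_json_with_orca(file_path):
    """Hypothetical helper: pass the same options to Orca's read_json.

    Assumes Analytics Zoo is installed and an Orca context is already running.
    """
    from zoo.orca.data.pandas.preprocessing import read_json
    return read_json(file_path, **JSON_OPTIONS)
```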
zoo.orca.data.pandas.preprocessing.read_parquet(file_path, columns=None, **kwargs)
Read Parquet files into a SparkXShards of pandas DataFrames.
Parameters:
file_path – A Parquet file path, a list of Parquet file paths, or a directory containing Parquet files. The local file system, HDFS, and AWS S3 are supported.
columns – A list of column names, default=None. If not None, only these columns are read from the file.
kwargs – Any additional keyword arguments.
Returns: An instance of SparkXShards.