zoo.pipeline.api.keras.datasets package

Submodules

zoo.pipeline.api.keras.datasets.boston_housing module

zoo.pipeline.api.keras.datasets.boston_housing.load_data(path='boston_housing.npz', dest_dir='/tmp/.zoo/dataset', test_split=0.2)

Loads the Boston Housing dataset. The download URL is the same one used by keras.datasets.

# Arguments

path: file name under which the downloaded dataset is stored in dest_dir.
dest_dir: directory in which to cache the data.
test_split: fraction of the dataset to hold out as test data; the remaining data is used as training data.

# Returns

Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
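The test_split behaviour can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation. shuffle_and_split is a hypothetical helper. It assumes, by analogy with the module's shuffle_by_seed(arr_list, seed=0) helper, that features and targets are shuffled together with a fixed seed before the last test_split fraction is held out.

```python
import numpy as np

def shuffle_and_split(x, y, test_split=0.2, seed=0):
    """Hypothetical helper (not zoo's code): shuffle x and y with one shared,
    seeded permutation, then hold out the last test_split fraction as test data."""
    rng = np.random.RandomState(seed)              # fixed seed -> reproducible order
    perm = rng.permutation(len(x))                 # same permutation keeps (x, y) pairs aligned
    x, y = x[perm], y[perm]
    split_index = int(len(x) * (1 - test_split))   # rows before this index are training data
    return (x[:split_index], y[:split_index]), (x[split_index:], y[split_index:])

x = np.arange(20).reshape(10, 2)   # row i is [2i, 2i + 1]
y = np.arange(10)                  # target i belongs to row i
(x_train, y_train), (x_test, y_test) = shuffle_and_split(x, y, test_split=0.2)
print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)
```

Sharing one permutation between the arrays is the design point here. Shuffling x and y independently would break the row-to-target correspondence.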

zoo.pipeline.api.keras.datasets.boston_housing.shuffle_by_seed(arr_list, seed=0)

zoo.pipeline.api.keras.datasets.imdb module

zoo.pipeline.api.keras.datasets.imdb.download_imdb(dest_dir)

Download pre-processed IMDB movie review data

:argument: dest_dir: destination directory to store the data
:return: The absolute path of the stored data

zoo.pipeline.api.keras.datasets.imdb.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='imdb_word_index.pkl')

Retrieves the dictionary mapping word indices back to words.

# Arguments:
dest_dir: directory in which to cache the data.
filename: name of the word index file.
# Returns: The word index dictionary.

zoo.pipeline.api.keras.datasets.imdb.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2)

Load IMDB dataset.

:argument

dest_dir: directory in which to cache the data.
nb_words: number of words to keep. Word indices are assigned by frequency, so only the most frequent words are kept and the less frequent words are dropped.
oov_char: index used to replace the dropped words. If None, the dropped words are removed instead, so each sequence becomes one element shorter for every dropped word.

:return

The IMDB dataset, split into training and test data.
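The nb_words and oov_char rules can be illustrated on plain lists of word indices. This is a sketch, not the library's code. limit_vocabulary is a hypothetical helper, and it assumes a word is kept exactly when its index is below nb_words. Because indices are assigned by frequency, that threshold keeps the most frequent words.

```python
def limit_vocabulary(sequences, nb_words, oov_char=2):
    """Hypothetical helper: keep word indices below nb_words (the most frequent
    words). Rarer words become oov_char, or are removed when oov_char is None."""
    if oov_char is not None:
        # Replace each rare word in place; sequence lengths are unchanged.
        return [[w if w < nb_words else oov_char for w in seq] for seq in sequences]
    # oov_char is None: drop rare words, so each sequence shrinks by one per dropped word.
    return [[w for w in seq if w < nb_words] for seq in sequences]

seqs = [[1, 5, 12, 3], [20, 4]]
print(limit_vocabulary(seqs, nb_words=10))                 # [[1, 5, 2, 3], [2, 4]]
print(limit_vocabulary(seqs, nb_words=10, oov_char=None))  # [[1, 5, 3], [4]]
```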

zoo.pipeline.api.keras.datasets.imdb.shuffle_by_seed(arr_list, seed=0)

zoo.pipeline.api.keras.datasets.mnist module

zoo.pipeline.api.keras.datasets.mnist.extract_images(f)

Extract the images into a 4D uint8 numpy array [index, y, x, depth].

Param: f: A file object that can be passed into a gzip reader.
Returns: data: A 4D uint8 numpy array [index, y, x, depth].
Raises: ValueError: If the bytestream does not start with 2051.
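The 2051 check comes from the MNIST IDX image format. A gzipped file starts with four big-endian 32-bit integers: the magic number 2051, the image count, the row count, and the column count. These are followed by one unsigned byte per pixel. The sketch below parses that layout. It is not the library's extract_images. read_idx_images and make_idx_images are hypothetical helpers, and the second one only builds an in-memory test file.

```python
import gzip
import io

import numpy as np

def read_idx_images(f):
    """Hypothetical sketch: parse a gzipped IDX image file object into a
    4D uint8 array [index, y, x, depth] with depth 1."""
    with gzip.GzipFile(fileobj=f) as stream:
        header = np.frombuffer(stream.read(16), dtype='>u4')  # four big-endian uint32 values
        magic, num_images, rows, cols = (int(v) for v in header)
        if magic != 2051:
            raise ValueError('Invalid magic number %d in MNIST image file' % magic)
        buf = stream.read(num_images * rows * cols)           # one byte per pixel
        data = np.frombuffer(buf, dtype=np.uint8)
        return data.reshape(num_images, rows, cols, 1)

def make_idx_images(images, magic=2051):
    """Hypothetical helper: build an in-memory gzipped IDX file from a
    uint8 array of shape (n, rows, cols)."""
    n, rows, cols = images.shape
    header = np.array([magic, n, rows, cols], dtype='>u4').tobytes()
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='wb') as gz:
        gz.write(header + images.astype(np.uint8).tobytes())
    out.seek(0)  # rewind so the reader starts at the gzip header
    return out

images = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
data = read_idx_images(make_idx_images(images))
print(data.shape, data.dtype)  # (2, 3, 4, 1) uint8
```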

zoo.pipeline.api.keras.datasets.mnist.extract_labels(f)

zoo.pipeline.api.keras.datasets.mnist.load_data(location='/tmp/.zoo/dataset/mnist')

zoo.pipeline.api.keras.datasets.mnist.read_data_sets(train_dir, data_type='train')

Parses the MNIST data in train_dir, downloading it first if train_dir is empty.

Param: train_dir: The directory storing the MNIST data.
Param: data_type: Whether to read the training set or the test set; either “train” or “test”.
Returns

(ndarray, ndarray) representing (features, labels). features is a 4D uint8 numpy array [index, y, x, depth] in which each pixel takes a value from 0 to 255. labels is a 1D uint8 numpy array of label values from 0 to 9.
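The documented return format can be written down as a small set of checks. This sketch is meant for validating arrays before training. check_mnist_arrays is a hypothetical helper and not part of zoo. The synthetic arrays below stand in for real read_data_sets output.

```python
import numpy as np

def check_mnist_arrays(features, labels):
    """Hypothetical helper: validate a (features, labels) pair against the
    documented format and return the number of examples."""
    # uint8 already restricts pixel values to the range 0..255.
    if features.dtype != np.uint8 or features.ndim != 4:
        raise ValueError("features must be a 4D uint8 array [index, y, x, depth]")
    if labels.dtype != np.uint8 or labels.ndim != 1:
        raise ValueError("labels must be a 1D uint8 array")
    if features.shape[0] != labels.shape[0]:
        raise ValueError("expected exactly one label per image")
    if labels.size and labels.max() > 9:
        raise ValueError("labels must be in the range 0 to 9")
    return features.shape[0]

# Synthetic stand-ins for the arrays read_data_sets returns.
features = np.zeros((5, 28, 28, 1), dtype=np.uint8)
labels = np.array([0, 1, 2, 3, 9], dtype=np.uint8)
print(check_mnist_arrays(features, labels))  # 5
```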

zoo.pipeline.api.keras.datasets.reuters module

zoo.pipeline.api.keras.datasets.reuters.download_reuters(dest_dir)

Download pre-processed Reuters newswire data

:argument: dest_dir: destination directory to store the data
:return: The absolute path of the stored data

zoo.pipeline.api.keras.datasets.reuters.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='reuters_word_index.pkl')

Retrieves the dictionary mapping word indices back to words.

# Arguments:
dest_dir: directory in which to cache the data.
filename: name of the word index file.
# Returns: The word index dictionary.

zoo.pipeline.api.keras.datasets.reuters.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2, test_split=0.2)

Load reuters dataset.

:argument

dest_dir: directory in which to cache the data.
nb_words: number of words to keep. Word indices are assigned by frequency, so only the most frequent words are kept and the less frequent words are dropped.
oov_char: index used to replace the dropped words. If None, the dropped words are removed instead, so each sequence becomes one element shorter for every dropped word.
test_split: fraction of the dataset to hold out as test data; the remaining data is used as training data.

:return

The Reuters dataset, split into training and test data.

zoo.pipeline.api.keras.datasets.reuters.shuffle_by_seed(arr_list, seed=0)

analytics-zoo 0.9.0.dev0 documentation
