zoo.pipeline.api.keras.datasets package

Submodules

zoo.pipeline.api.keras.datasets.boston_housing module

zoo.pipeline.api.keras.datasets.boston_housing.load_data(path='boston_housing.npz', dest_dir='/tmp/.zoo/dataset', test_split=0.2)
Loads the Boston Housing dataset. The download URL is the same one used by keras.datasets.
# Arguments
path: name of the dataset file (default 'boston_housing.npz').
dest_dir: directory in which to cache the data (default '/tmp/.zoo/dataset').
test_split: fraction of the dataset to hold out as test data; the remainder is used as training data.
# Returns
Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
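
To make the meaning of test_split concrete, here is a minimal sketch of a ratio-based split. The helper name split_train_test is hypothetical and is not part of the zoo API. It assumes the test portion is taken from the end of the data and that no shuffling happens inside the split; the library may order these steps differently.

```python
def split_train_test(x, y, test_split=0.2):
    """Hold out a `test_split` fraction of the samples as test data.

    Hypothetical helper for illustration only; assumes the held-out
    samples are taken from the end of the sequences.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_test = int(len(x) * test_split)
    n_train = len(x) - n_test
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])


# Ten toy samples with one feature each.
x = [[float(i)] for i in range(10)]
y = [float(i) * 2 for i in range(10)]
(x_train, y_train), (x_test, y_test) = split_train_test(x, y, test_split=0.2)
```

The return shape mirrors the documented one: a pair of (features, targets) tuples, training data first.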
zoo.pipeline.api.keras.datasets.boston_housing.shuffle_by_seed(arr_list, seed=0)
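
shuffle_by_seed is listed without a description. Judging only by its name and signature, it plausibly shuffles every array in arr_list with the same seed so that features and labels stay aligned. The sketch below illustrates that idea with the standard library; the function shuffle_aligned is hypothetical and is an assumption about the behavior, not the library's implementation.

```python
import random


def shuffle_aligned(arr_list, seed=0):
    """Shuffle each sequence in arr_list in place with the same permutation.

    Re-seeding a fresh generator for every sequence yields an identical
    permutation as long as all sequences have the same length.
    """
    for arr in arr_list:
        random.Random(seed).shuffle(arr)


features = list(range(10))
labels = [value * 10 for value in features]
shuffle_aligned([features, labels], seed=0)
```

After the call, labels[i] still corresponds to features[i] for every i, which is the property a seeded multi-array shuffle needs to preserve.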

zoo.pipeline.api.keras.datasets.imdb module

zoo.pipeline.api.keras.datasets.imdb.download_imdb(dest_dir)

Download pre-processed IMDB movie review data

:argument
dest_dir: destination directory to store the data
:return
The absolute path of the stored data
zoo.pipeline.api.keras.datasets.imdb.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='imdb_word_index.pkl')

Retrieves the dictionary mapping word indices back to words.

# Arguments
dest_dir: directory in which to cache the data (default '/tmp/.zoo/dataset').
filename: name of the word index file (default 'imdb_word_index.pkl').
# Returns
The word index dictionary.
zoo.pipeline.api.keras.datasets.imdb.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2)

Load IMDB dataset.

:argument
dest_dir: directory in which to cache the data (default '/tmp/.zoo/dataset').
nb_words: number of words to keep. Words are indexed by frequency, so the less frequent words are dropped.
oov_char: index that replaces each dropped word. If None, dropped words are removed and the sequence becomes one element shorter for each.
:return
The IMDB dataset, separated into train and test sets.
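
The interaction between nb_words and oov_char can be sketched as follows. The helper limit_vocabulary is hypothetical and not part of the zoo API. It assumes that indices greater than or equal to nb_words count as "less frequent"; the exact cutoff used by the library may differ.

```python
def limit_vocabulary(sequences, nb_words=None, oov_char=2):
    """Restrict word indices to the nb_words most frequent ones.

    Hypothetical helper for illustration only. Assumes smaller indices
    mean more frequent words and that the cutoff is `index < nb_words`.
    """
    if nb_words is None:
        # No limit requested: return copies of the sequences unchanged.
        return [list(seq) for seq in sequences]
    if oov_char is None:
        # Drop out-of-vocabulary words, so sequences get shorter.
        return [[w for w in seq if w < nb_words] for seq in sequences]
    # Replace out-of-vocabulary words, so sequence length is preserved.
    return [[w if w < nb_words else oov_char for w in seq] for seq in sequences]


review = [[1, 5, 12, 3]]
replaced = limit_vocabulary(review, nb_words=10, oov_char=2)
dropped = limit_vocabulary(review, nb_words=10, oov_char=None)
```

With oov_char set, every review keeps its original length; with oov_char=None, rare words vanish and positions shift.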
zoo.pipeline.api.keras.datasets.imdb.shuffle_by_seed(arr_list, seed=0)

zoo.pipeline.api.keras.datasets.mnist module

zoo.pipeline.api.keras.datasets.mnist.extract_images(f)

Extract the images into a 4D uint8 numpy array [index, y, x, depth].

Param:f: A file object that can be passed into a gzip reader.
Returns:data: A 4D uint8 numpy array [index, y, x, depth].
Raise:ValueError: If the bytestream does not start with 2051.
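
The value 2051 is the magic number that opens an MNIST image file in the IDX format: four big-endian 32-bit integers (magic, image count, rows, columns) followed by the raw pixel bytes. The sketch below shows that header check using only the standard library. The names read_idx_images and make_fake_archive are hypothetical; this is not the library's implementation, and it returns raw bytes rather than a numpy array.

```python
import gzip
import io
import struct


def read_idx_images(f):
    """Read the IDX image header from a gzipped file object and return
    (count, rows, cols, pixel_bytes). Raises ValueError on a bad magic number."""
    with gzip.GzipFile(fileobj=f, mode="rb") as stream:
        magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
        if magic != 2051:
            raise ValueError("Invalid magic number %d in MNIST image file" % magic)
        pixels = stream.read(count * rows * cols)
    return count, rows, cols, pixels


def make_fake_archive(magic=2051):
    """Build a tiny in-memory gzip archive holding two 2x2 'images'."""
    payload = struct.pack(">IIII", magic, 2, 2, 2) + bytes(range(8))
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(payload)
    buffer.seek(0)
    return buffer


count, rows, cols, pixels = read_idx_images(make_fake_archive())
```

Reshaping the pixel bytes to [count, rows, cols, 1] would give the 4D layout described above.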
zoo.pipeline.api.keras.datasets.mnist.extract_labels(f)
zoo.pipeline.api.keras.datasets.mnist.load_data(location='/tmp/.zoo/dataset/mnist')
zoo.pipeline.api.keras.datasets.mnist.read_data_sets(train_dir, data_type='train')

Parse the MNIST data in train_dir, downloading it first if train_dir is empty.

Param:train_dir: The directory storing the mnist data
Param:data_type: Whether to read the training set or the test set. It can be either “train” or “test”.
Returns:(ndarray, ndarray) representing (features, labels). features is a 4D uint8 numpy array [index, y, x, depth] with pixel values from 0 to 255. labels is a 1D uint8 numpy array with label values from 0 to 9.

zoo.pipeline.api.keras.datasets.reuters module

zoo.pipeline.api.keras.datasets.reuters.download_reuters(dest_dir)

Download pre-processed reuters newswire data

:argument
dest_dir: destination directory to store the data
:return
The absolute path of the stored data
zoo.pipeline.api.keras.datasets.reuters.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='reuters_word_index.pkl')

Retrieves the dictionary mapping word indices back to words.

# Arguments
dest_dir: directory in which to cache the data (default '/tmp/.zoo/dataset').
filename: name of the word index file (default 'reuters_word_index.pkl').
# Returns
The word index dictionary.
zoo.pipeline.api.keras.datasets.reuters.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2, test_split=0.2)

Load reuters dataset.

:argument
dest_dir: directory in which to cache the data (default '/tmp/.zoo/dataset').
nb_words: number of words to keep. Words are indexed by frequency, so the less frequent words are dropped.
oov_char: index that replaces each dropped word. If None, dropped words are removed and the sequence becomes one element shorter for each.
test_split: fraction of the dataset to hold out as test data; the remainder is used as training data.
:return
The reuters dataset, separated into train and test sets.
zoo.pipeline.api.keras.datasets.reuters.shuffle_by_seed(arr_list, seed=0)

Module contents