zoo.pipeline.api.keras.datasets package

Submodules

zoo.pipeline.api.keras.datasets.boston_housing module

zoo.pipeline.api.keras.datasets.boston_housing.load_data(path='boston_housing.npz', dest_dir='/tmp/.zoo/dataset', test_split=0.2)

Loads the Boston Housing dataset. The download URL is copied from keras.datasets.

# Arguments

dest_dir: where to cache the data (relative to ~/.zoo/dataset).

test_split: the fraction of the dataset to hold out as test data; the remaining data is used as training data.

# Returns

Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
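The effect of test_split can be illustrated without the library. The following is a minimal sketch, assuming the test portion is taken from the end of the samples, as in the Keras loader this module is modeled on. `split_train_test` is a hypothetical helper and is not part of zoo.

```python
def split_train_test(x, y, test_split=0.2):
    """Hypothetical helper: hold out the last `test_split` fraction as test data."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    if not 0.0 <= test_split <= 1.0:
        raise ValueError("test_split must be between 0 and 1")
    # Index of the first test sample (assumption: test data comes last).
    split_at = int(len(x) * (1 - test_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# Ten made-up samples; with test_split=0.2 the last two become test data.
x = list(range(10))
y = [value * 2 for value in x]
(x_train, y_train), (x_test, y_test) = split_train_test(x, y, test_split=0.2)
print(len(x_train), len(x_test))
```

The return shape mirrors the documented `(x_train, y_train), (x_test, y_test)` tuple, so the sketch can stand in for the real loader in quick experiments.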

zoo.pipeline.api.keras.datasets.boston_housing.shuffle_by_seed(arr_list, seed=0)
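shuffle_by_seed has no docstring. Judging by its name and signature alone, it presumably shuffles every array in arr_list with the same seed so that features and labels stay aligned. The sketch below shows that general technique with the standard library only; `shuffle_together` is a hypothetical stand-in, and the actual zoo implementation may differ.

```python
import random


def shuffle_together(arr_list, seed=0):
    """Hypothetical stand-in: shuffle each list in place using the same seed."""
    for arr in arr_list:
        # A fresh generator with the same seed applies the same permutation
        # to every list of the same length.
        random.Random(seed).shuffle(arr)


features = ["a", "b", "c", "d", "e"]
labels = [0, 1, 2, 3, 4]
pairs_before = set(zip(features, labels))

shuffle_together([features, labels], seed=0)

# Each feature is still paired with its original label after shuffling.
pairs_after = set(zip(features, labels))
```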

zoo.pipeline.api.keras.datasets.imdb module

zoo.pipeline.api.keras.datasets.imdb.download_imdb(dest_dir)

Downloads the pre-processed IMDB movie review data.

:argument

dest_dir: destination directory to store the data

:return

The absolute path of the stored data

zoo.pipeline.api.keras.datasets.imdb.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='imdb_word_index.pkl')

Retrieves the dictionary mapping word indices back to words.

# Arguments

dest_dir: where to cache the data (relative to ~/.zoo/dataset).

filename: dataset file name.

# Returns

The word index dictionary.
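A word index is typically used to turn encoded reviews back into text. The sketch below assumes, as in the Keras loader, that the dictionary maps each word to its integer index. The tiny `word_index` is made up for illustration, and `decode` is a hypothetical helper that does not account for reserved indices such as oov_char.

```python
# Made-up stand-in for the dictionary returned by get_word_index().
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Invert the mapping so that indices can be looked up.
index_to_word = {index: word for word, index in word_index.items()}


def decode(sequence, unknown="?"):
    """Hypothetical helper: map a list of indices back to a string."""
    return " ".join(index_to_word.get(index, unknown) for index in sequence)


# 99 is not in the made-up vocabulary, so it is rendered as "?".
decoded = decode([1, 2, 3, 4, 99])
print(decoded)
```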

zoo.pipeline.api.keras.datasets.imdb.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2)

Load IMDB dataset.

:argument

dest_dir: where to cache the data (relative to ~/.zoo/dataset).

nb_words: number of words to keep. Words are already indexed by frequency, so the less frequent words are dropped.

oov_char: index that replaces each dropped word. If None, each dropped word is removed instead, so the sequence closes up and its length shrinks by one.

:return

The IMDB dataset, separated into train and test sets.
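The interaction between nb_words and oov_char can be sketched as follows. This is an illustration of the behavior described in the arguments, assuming a word is kept when its index is below nb_words. `limit_vocabulary` is a hypothetical helper, not the library's implementation.

```python
def limit_vocabulary(sequences, nb_words=None, oov_char=2):
    """Hypothetical helper: keep only the `nb_words` most frequent words."""
    if nb_words is None:
        # No limit requested: return copies of the sequences unchanged.
        return [list(seq) for seq in sequences]
    if oov_char is not None:
        # Replace every dropped word with the out-of-vocabulary index.
        return [[w if w < nb_words else oov_char for w in seq]
                for seq in sequences]
    # oov_char is None: remove dropped words, so sequences get shorter.
    return [[w for w in seq if w < nb_words] for seq in sequences]


# Made-up encoded reviews; 5000 and 9000 stand for rare words.
reviews = [[1, 14, 22, 5000], [1, 9000, 7]]
replaced = limit_vocabulary(reviews, nb_words=100, oov_char=2)
dropped = limit_vocabulary(reviews, nb_words=100, oov_char=None)
print(replaced)
print(dropped)
```

Replacing rather than removing rare words keeps every sequence at its original length, which matters when positions in the sequence carry meaning.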

zoo.pipeline.api.keras.datasets.imdb.shuffle_by_seed(arr_list, seed=0)

zoo.pipeline.api.keras.datasets.mnist module

zoo.pipeline.api.keras.datasets.mnist.extract_images(f)

Extract the images into a 4D uint8 numpy array [index, y, x, depth].

Param

f: A file object that can be passed into a gzip reader.

Returns

data: A 4D uint8 numpy array [index, y, x, depth].

Raise

ValueError: If the bytestream does not start with 2051.
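The value 2051 is the magic number that opens an MNIST image file in the IDX format, followed by the image count, row count, and column count as big-endian 32-bit integers. The sketch below shows that header check using only the standard library. `read_idx_images` is a hypothetical reader that returns nested lists rather than a numpy array and omits the depth axis; it is not the library's implementation.

```python
import gzip
import io
import struct

IMAGE_MAGIC = 2051  # magic number expected at the start of an image file


def read_idx_images(f):
    """Hypothetical reader: return images[index][y][x] as plain ints."""
    with gzip.GzipFile(fileobj=f, mode="rb") as stream:
        # Header: magic number, image count, rows, columns (big-endian).
        magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
        if magic != IMAGE_MAGIC:
            raise ValueError("Invalid magic number %d in image file" % magic)
        pixels = stream.read(count * rows * cols)
    images = []
    for i in range(count):
        start = i * rows * cols
        images.append([list(pixels[start + y * cols:start + (y + 1) * cols])
                       for y in range(rows)])
    return images


# Build a tiny synthetic gzip file holding two 2x2 images.
raw = struct.pack(">IIII", IMAGE_MAGIC, 2, 2, 2)
raw += bytes([0, 255, 10, 20, 1, 2, 3, 4])
images = read_idx_images(io.BytesIO(gzip.compress(raw)))
print(images)
```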

zoo.pipeline.api.keras.datasets.mnist.extract_labels(f)

zoo.pipeline.api.keras.datasets.mnist.load_data(location='/tmp/.zoo/dataset/mnist')

zoo.pipeline.api.keras.datasets.mnist.read_data_sets(train_dir, data_type='train')

Parses the MNIST data, downloading it first if train_dir is empty.

Param

train_dir: The directory storing the MNIST data.

Param

data_type: Whether to read the training set or the test set. It can be either “train” or “test”.

Returns

(ndarray, ndarray) representing (features, labels). features is a 4D uint8 numpy array [index, y, x, depth] in which each pixel is valued from 0 to 255. labels is a 1D uint8 numpy array in which each label is valued from 0 to 9.

zoo.pipeline.api.keras.datasets.reuters module

zoo.pipeline.api.keras.datasets.reuters.download_reuters(dest_dir)

Downloads the pre-processed Reuters newswire data.

:argument

dest_dir: destination directory to store the data

:return

The absolute path of the stored data

zoo.pipeline.api.keras.datasets.reuters.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='reuters_word_index.pkl')

Retrieves the dictionary mapping word indices back to words.

# Arguments

dest_dir: where to cache the data (relative to ~/.zoo/dataset).

filename: dataset file name.

# Returns

The word index dictionary.

zoo.pipeline.api.keras.datasets.reuters.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2, test_split=0.2)

Load Reuters dataset.

:argument

dest_dir: where to cache the data (relative to ~/.zoo/dataset).

nb_words: number of words to keep. Words are already indexed by frequency, so the less frequent words are dropped.

oov_char: index that replaces each dropped word. If None, each dropped word is removed instead, so the sequence closes up and its length shrinks by one.

test_split: the fraction of the dataset to hold out as test data; the remaining data is used as training data.

:return

The Reuters dataset, separated into train and test sets.

zoo.pipeline.api.keras.datasets.reuters.shuffle_by_seed(arr_list, seed=0)

Module contents