zoo.pipeline.api.keras.datasets package¶
Submodules¶
zoo.pipeline.api.keras.datasets.boston_housing module¶
-
zoo.pipeline.api.keras.datasets.boston_housing.load_data(path='boston_housing.npz', dest_dir='/tmp/.zoo/dataset', test_split=0.2)[source]¶ Loads the Boston Housing dataset. The download URL is the one used by keras.datasets.
- # Arguments
path: name of the dataset file (default 'boston_housing.npz').
dest_dir: directory where the data is cached (default '/tmp/.zoo/dataset').
test_split: fraction of the dataset held out as test data; the remaining data is used for training.
- # Returns
Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
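To make the test_split argument concrete, here is a minimal sketch of a ratio-based split. The helper split_by_ratio is hypothetical and is not part of zoo. It assumes the test portion is taken from the end of the data, which may differ from what the library does internally (for example, it may shuffle the samples first).

```python
def split_by_ratio(samples, test_split=0.2):
    """Split a sequence into (train, test) parts by a test ratio.

    Hypothetical helper for illustration only; not part of zoo.
    Assumes the last `test_split` fraction becomes the test set.
    """
    if not 0.0 <= test_split <= 1.0:
        raise ValueError("test_split must be between 0 and 1")
    # Number of samples kept for training; the rest go to the test set.
    n_train = int(len(samples) * (1 - test_split))
    return samples[:n_train], samples[n_train:]


# With 10 samples and test_split=0.2, 8 samples train and 2 samples test.
train, test = split_by_ratio(list(range(10)), test_split=0.2)
```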
zoo.pipeline.api.keras.datasets.imdb module¶
-
zoo.pipeline.api.keras.datasets.imdb.download_imdb(dest_dir)[source]¶ Download pre-processed IMDB movie review data
- :argument
dest_dir: destination directory to store the data
- :return
The absolute path of the stored data
-
zoo.pipeline.api.keras.datasets.imdb.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='imdb_word_index.pkl')[source]¶ Retrieves the dictionary mapping word indices back to words.
- # Arguments
dest_dir: directory where the data is cached (default '/tmp/.zoo/dataset').
filename: name of the dataset file.
- # Returns
The word index dictionary.
-
zoo.pipeline.api.keras.datasets.imdb.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2)[source]¶ Load IMDB dataset.
- :argument
dest_dir: directory where the data is cached (default '/tmp/.zoo/dataset').
nb_words: number of words to keep. Words are indexed by frequency, so the less frequent words are dropped.
oov_char: index that replaces each dropped word. If None, dropped words are removed instead, so each removal shortens the sequence by one.
- :return
The IMDB dataset, split into train and test sets.
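The interaction between nb_words and oov_char can be sketched as follows. The helper limit_vocabulary is hypothetical and only mirrors the behaviour described above; it assumes that a word counts as dropped when its index is greater than or equal to nb_words, which is an assumption about the cutoff rather than a statement about the library's code.

```python
def limit_vocabulary(sequences, nb_words=None, oov_char=2):
    """Apply a vocabulary limit to lists of word indices.

    Hypothetical helper for illustration only; not part of zoo.
    Assumes indices >= nb_words are the dropped (less frequent) words.
    """
    if nb_words is None:
        # No limit requested: return copies of the sequences unchanged.
        return [list(seq) for seq in sequences]
    if oov_char is None:
        # Remove dropped words entirely; sequences get shorter.
        return [[w for w in seq if w < nb_words] for seq in sequences]
    # Replace each dropped word with the out-of-vocabulary index.
    return [[w if w < nb_words else oov_char for w in seq] for seq in sequences]


reviews = [[1, 14, 22, 5000], [1, 9999, 7]]
padded = limit_vocabulary(reviews, nb_words=100, oov_char=2)
dropped = limit_vocabulary(reviews, nb_words=100, oov_char=None)
```

With oov_char set, every sequence keeps its original length; with oov_char=None, the sequences shrink by the number of dropped words.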
zoo.pipeline.api.keras.datasets.mnist module¶
-
zoo.pipeline.api.keras.datasets.mnist.extract_images(f)[source]¶ Extract the images into a 4D uint8 numpy array [index, y, x, depth].
- Param
f: A file object that can be passed into a gzip reader.
- Returns
data: A 4D uint8 numpy array [index, y, x, depth].
- Raise
ValueError: If the bytestream does not start with 2051.
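The 2051 check corresponds to the header of the MNIST image files in IDX format: four big-endian unsigned 32-bit integers (magic number, image count, rows, columns) followed by the raw pixel bytes. The sketch below shows that header check in isolation using only the standard library. The helpers read_image_header and make_fake_archive are hypothetical and are not the library's implementation; the fake archive exists only so the example runs without downloading anything.

```python
import gzip
import io
import struct

MNIST_IMAGE_MAGIC = 2051  # magic number of an IDX image file


def read_image_header(f):
    """Read the IDX image header from a gzip-compressed file object.

    Hypothetical helper for illustration only; not part of zoo.
    Returns (num_images, rows, cols).
    """
    with gzip.GzipFile(fileobj=f) as stream:
        # Four big-endian unsigned 32-bit integers: magic, count, rows, cols.
        magic, num_images, rows, cols = struct.unpack(">IIII", stream.read(16))
    if magic != MNIST_IMAGE_MAGIC:
        raise ValueError("Invalid magic number %d in MNIST image file" % magic)
    return num_images, rows, cols


def make_fake_archive(magic, num_images=1, rows=2, cols=2):
    """Build a tiny in-memory gzip stream with an IDX-style header (test data)."""
    payload = struct.pack(">IIII", magic, num_images, rows, cols)
    payload += bytes(num_images * rows * cols)  # all-zero pixels
    return io.BytesIO(gzip.compress(payload))


header = read_image_header(make_fake_archive(MNIST_IMAGE_MAGIC))
```

Passing a stream whose first four bytes decode to anything other than 2051 raises ValueError, matching the behaviour documented for extract_images.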
-
zoo.pipeline.api.keras.datasets.mnist.read_data_sets(train_dir, data_type='train')[source]¶ Parse the MNIST data in train_dir, downloading it first if train_dir is empty.
- Param
train_dir: The directory storing the mnist data
- Param
data_type: Whether to read the training set or the testing set. It can be either “train” or “test”.
- Returns
(ndarray, ndarray) representing (features, labels). features is a 4D uint8 numpy array [index, y, x, depth] in which each pixel is valued from 0 to 255. labels is a 1D uint8 numpy array in which each label is valued from 0 to 9.
zoo.pipeline.api.keras.datasets.reuters module¶
-
zoo.pipeline.api.keras.datasets.reuters.download_reuters(dest_dir)[source]¶ Download pre-processed Reuters newswire data
- :argument
dest_dir: destination directory to store the data
- :return
The absolute path of the stored data
-
zoo.pipeline.api.keras.datasets.reuters.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='reuters_word_index.pkl')[source]¶ Retrieves the dictionary mapping word indices back to words.
- # Arguments
dest_dir: directory where the data is cached (default '/tmp/.zoo/dataset').
filename: name of the dataset file.
- # Returns
The word index dictionary.
-
zoo.pipeline.api.keras.datasets.reuters.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2, test_split=0.2)[source]¶ Load reuters dataset.
- :argument
dest_dir: directory where the data is cached (default '/tmp/.zoo/dataset').
nb_words: number of words to keep. Words are indexed by frequency, so the less frequent words are dropped.
oov_char: index that replaces each dropped word. If None, dropped words are removed instead, so each removal shortens the sequence by one.
test_split: fraction of the dataset held out as test data; the remaining data is used for training.
- :return
The Reuters dataset, split into train and test sets.