zoo.pipeline.api.keras.datasets package
Submodules
zoo.pipeline.api.keras.datasets.boston_housing module
zoo.pipeline.api.keras.datasets.boston_housing.load_data(path='boston_housing.npz', dest_dir='/tmp/.zoo/dataset', test_split=0.2)
    Loads the Boston Housing dataset. The download source URL is copied from keras.datasets.
    # Arguments
        dest_dir: where to cache the data (relative to ~/.zoo/dataset).
        test_split: the fraction of the dataset to split off as test data; the remaining data is used as train data.
    # Returns
        Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
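To make the test_split argument concrete, here is a minimal sketch of a ratio-based split on toy data. The helper name split_train_test is hypothetical and is not part of the zoo API; the sketch also assumes the test portion is taken from the end of the data without shuffling, which the actual loader may or may not do.

```python
def split_train_test(samples, labels, test_split=0.2):
    """Hypothetical helper: hold out the last `test_split` fraction as test data."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not 0.0 <= test_split < 1.0:
        raise ValueError("test_split must be in [0, 1)")
    # Index of the first test sample; everything before it is train data.
    boundary = int(len(samples) * (1 - test_split))
    train = (samples[:boundary], labels[:boundary])
    test = (samples[boundary:], labels[boundary:])
    return train, test


# Toy data standing in for feature rows and target values.
x = [[float(i), float(i) * 2.0] for i in range(10)]
y = [float(i) for i in range(10)]
(x_train, y_train), (x_test, y_test) = split_train_test(x, y, test_split=0.2)
```

With ten samples and test_split=0.2, eight samples end up in the train tuple and two in the test tuple, matching the (x_train, y_train), (x_test, y_test) return shape described above.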
zoo.pipeline.api.keras.datasets.imdb module
zoo.pipeline.api.keras.datasets.imdb.download_imdb(dest_dir)
    Download pre-processed IMDB movie review data.
    :argument
        dest_dir: destination directory to store the data
    :return
        The absolute path of the stored data
zoo.pipeline.api.keras.datasets.imdb.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='imdb_word_index.pkl')
    Retrieves the dictionary mapping word indices back to words.
    # Arguments
        dest_dir: where to cache the data (relative to ~/.zoo/dataset).
        filename: dataset file name
    # Returns
        The word index dictionary.
zoo.pipeline.api.keras.datasets.imdb.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2)
    Load IMDB dataset.
    :argument
        dest_dir: where to cache the data (relative to ~/.zoo/dataset).
        nb_words: number of words to keep. The words are already indexed by frequency, so the less frequent words are abandoned.
        oov_char: index used to replace the abandoned words. If None, each abandoned word is dropped, so the next word takes its place and the total length decreases by 1.
    :return
        The IMDB dataset, separated into train and test sets.
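The interaction between nb_words and oov_char can be sketched on toy sequences as follows. The helper apply_vocab_limit is hypothetical and only mirrors the description above. In particular, treating every index greater than or equal to nb_words as "abandoned" is an assumption about the cutoff, not a statement about the library's code.

```python
def apply_vocab_limit(sequences, nb_words=None, oov_char=2):
    """Hypothetical helper mirroring the nb_words / oov_char description."""
    if nb_words is None:
        # No limit requested: keep every word index unchanged.
        return [list(seq) for seq in sequences]
    if oov_char is not None:
        # Replace each abandoned (too infrequent) word with the oov index.
        return [[w if w < nb_words else oov_char for w in seq]
                for seq in sequences]
    # oov_char is None: drop abandoned words, so sequences get shorter.
    return [[w for w in seq if w < nb_words] for seq in sequences]


# Two toy "reviews" encoded as frequency-ranked word indices.
reviews = [[1, 14, 22, 5000, 9], [1, 7, 12000, 30000]]
padded = apply_vocab_limit(reviews, nb_words=1000, oov_char=2)
dropped = apply_vocab_limit(reviews, nb_words=1000, oov_char=None)
```

In the padded result each sequence keeps its length and rare words become 2; in the dropped result each removed word shortens its sequence by one.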
zoo.pipeline.api.keras.datasets.mnist module
zoo.pipeline.api.keras.datasets.mnist.extract_images(f)
    Extract the images into a 4D uint8 numpy array [index, y, x, depth].
    Param:
        f: A file object that can be passed into a gzip reader.
    Returns:
        data: A 4D uint8 numpy array [index, y, x, depth].
    Raise:
        ValueError: If the bytestream does not start with 2051.
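The magic-number check and the [index, y, x, depth] layout can be illustrated with a small standard-library sketch. It assumes the usual MNIST IDX image layout (four big-endian 32-bit header fields followed by one byte per pixel) and returns nested lists rather than a numpy array. The names parse_idx_images and make_idx_gzip are hypothetical and are not the library's implementation.

```python
import gzip
import io
import struct

IMAGE_MAGIC = 2051  # magic number named in the docstring above


def parse_idx_images(f):
    """Hypothetical parser: return images as nested lists [index][y][x][depth]."""
    with gzip.GzipFile(fileobj=f) as stream:
        # Header: magic, image count, rows, cols as big-endian unsigned 32-bit ints.
        magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
        if magic != IMAGE_MAGIC:
            raise ValueError("Invalid magic number %d in image file" % magic)
        pixels = stream.read(count * rows * cols)
    images = []
    for i in range(count):
        image = []
        for y in range(rows):
            start = (i * rows + y) * cols
            # Each pixel is one byte (0 to 255); wrap it in a list for depth 1.
            image.append([[p] for p in pixels[start:start + cols]])
        images.append(image)
    return images


def make_idx_gzip(magic, images):
    """Build a tiny gzip-compressed IDX image file in memory for the demo."""
    rows, cols = len(images[0]), len(images[0][0])
    body = bytes(p for image in images for row in image for p in row)
    header = struct.pack(">IIII", magic, len(images), rows, cols)
    return io.BytesIO(gzip.compress(header + body))


demo = [[[0, 255], [128, 64]], [[1, 2], [3, 4]]]  # two 2x2 grayscale images
parsed = parse_idx_images(make_idx_gzip(IMAGE_MAGIC, demo))
```

Passing a stream whose header starts with any other number, for example make_idx_gzip(2049, demo), makes the sketch raise ValueError, which is the behaviour the docstring describes.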
zoo.pipeline.api.keras.datasets.mnist.read_data_sets(train_dir, data_type='train')
    Parse the mnist data, downloading it first if train_dir is empty.
    Param:
        train_dir: The directory storing the mnist data.
        data_type: Whether to read the training set or the testing set. It can be either “train” or “test”.
    Returns:
        (ndarray, ndarray) representing (features, labels).
        features is a 4D uint8 numpy array [index, y, x, depth] with each pixel valued from 0 to 255.
        labels is a 1D uint8 numpy array with each label valued from 0 to 9.
zoo.pipeline.api.keras.datasets.reuters module
zoo.pipeline.api.keras.datasets.reuters.download_reuters(dest_dir)
    Download pre-processed reuters newswire data.
    :argument
        dest_dir: destination directory to store the data
    :return
        The absolute path of the stored data
zoo.pipeline.api.keras.datasets.reuters.get_word_index(dest_dir='/tmp/.zoo/dataset', filename='reuters_word_index.pkl')
    Retrieves the dictionary mapping word indices back to words.
    # Arguments
        dest_dir: where to cache the data (relative to ~/.zoo/dataset).
        filename: dataset file name
    # Returns
        The word index dictionary.
zoo.pipeline.api.keras.datasets.reuters.load_data(dest_dir='/tmp/.zoo/dataset', nb_words=None, oov_char=2, test_split=0.2)
    Load reuters dataset.
    :argument
        dest_dir: where to cache the data (relative to ~/.zoo/dataset).
        nb_words: number of words to keep. The words are already indexed by frequency, so the less frequent words are abandoned.
        oov_char: index used to replace the abandoned words. If None, each abandoned word is dropped, so the next word takes its place and the total length decreases by 1.
        test_split: the fraction of the dataset to split off as test data; the remaining data is used as train data.
    :return
        The reuters dataset, separated into train and test sets.