Probably the easiest way to make your data work with the CNN example code is to make a modified version of read_cifar10()
and use it instead:
-
Write out a binary file containing the contents of your NumPy array:

    import numpy as np

    images_and_labels_array = np.array([[...], ...],  # [[1,12,34,24,53,...,102],
                                                      #  [12,112,43,24,52,...,98],
                                                      #  ...]
                                       dtype=np.uint8)

    images_and_labels_array.tofile("/tmp/images.bin")

This file is similar to the format used in the CIFAR10 data files. You might want to generate multiple files in order to get read parallelism. Note that ndarray.tofile() writes binary data in row-major order with no other metadata; pickling the array would add Python-specific metadata that TensorFlow’s parsing routines do not understand.
-
Write a modified version of read_cifar10() that handles your record format:

    def read_my_data(filename_queue):

        class ImageRecord(object):
            pass
        result = ImageRecord()

        # Dimensions of the images in the dataset.
        label_bytes = 1
        # Set the following constants as appropriate.
        result.height = IMAGE_HEIGHT
        result.width = IMAGE_WIDTH
        result.depth = IMAGE_DEPTH
        image_bytes = result.height * result.width * result.depth
        # Every record consists of a label followed by the image, with a
        # fixed number of bytes for each.
        record_bytes = label_bytes + image_bytes

        assert record_bytes == 22501  # Based on your question.

        # Read a record, getting filenames from the filename_queue. No
        # header or footer in the binary, so we leave header_bytes
        # and footer_bytes at their default of 0.
        reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
        result.key, value = reader.read(filename_queue)

        # Convert from a string to a vector of uint8 that is record_bytes long.
        record_bytes = tf.decode_raw(value, tf.uint8)

        # The first bytes represent the label, which we convert from uint8->int32.
        result.label = tf.cast(
            tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

        # The remaining bytes after the label represent the image, which we reshape
        # from [depth * height * width] to [depth, height, width].
        depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                                 [result.depth, result.height, result.width])
        # Convert from [depth, height, width] to [height, width, depth].
        result.uint8image = tf.transpose(depth_major, [1, 2, 0])

        return result

-
Modify distorted_inputs() to use your new dataset:

    def distorted_inputs(data_dir, batch_size):
        """[...]"""
        filenames = ["/tmp/images.bin"]  # Or a list of filenames if you
                                         # generated multiple files in step 1.
        for f in filenames:
            if not gfile.Exists(f):
                raise ValueError('Failed to find file: ' + f)

        # Create a queue that produces the filenames to read.
        filename_queue = tf.train.string_input_producer(filenames)

        # Read examples from files in the filename queue.
        read_input = read_my_data(filename_queue)
        reshaped_image = tf.cast(read_input.uint8image, tf.float32)

        # [...] (Maybe modify other parameters in here depending on your problem.)

This is intended to be a minimal set of steps, given your starting point. It may be more efficient to do the PNG decoding using TensorFlow ops, but that would be a larger change.
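
If your images and labels start out as separate arrays, the packing in step 1 could look roughly like the sketch below. It is only an illustration under some assumptions: images are uint8 with shape [N, height, width, depth], labels fit in one byte, and the helpers pack_records(), write_shards() and read_records() (as well as the images_%d.bin file names) are hypothetical names of mine, not part of the CIFAR10 example code. Because read_my_data() reshapes each record to [depth, height, width], the pixels are stored depth-major. read_records() is just a NumPy mirror of the slicing, reshape and transpose in read_my_data(), so you can check the files before involving TensorFlow.

```python
import os
import tempfile

import numpy as np


def pack_records(images, labels):
    """Pack images [N, H, W, D] and labels [N] into rows of
    one label byte followed by depth-major pixel bytes."""
    images = np.asarray(images, dtype=np.uint8)
    labels = np.asarray(labels, dtype=np.uint8)
    n = images.shape[0]
    # The reader reshapes to [depth, height, width], so store pixels depth-major.
    depth_major = images.transpose(0, 3, 1, 2).reshape(n, -1)
    return np.concatenate([labels.reshape(n, 1), depth_major], axis=1)


def write_shards(records, directory, num_shards):
    """Write whole records to num_shards files (hypothetical file names)."""
    paths = []
    for i, shard in enumerate(np.array_split(records, num_shards, axis=0)):
        path = os.path.join(directory, "images_%d.bin" % i)
        shard.tofile(path)  # Raw row-major bytes, no header or footer.
        paths.append(path)
    return paths


def read_records(path, height, width, depth):
    """NumPy mirror of the slice, reshape and transpose in read_my_data()."""
    record_bytes = 1 + height * width * depth
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, record_bytes)
    labels = raw[:, 0].astype(np.int32)
    # [depth, height, width] per record, converted to [height, width, depth].
    images = raw[:, 1:].reshape(-1, depth, height, width).transpose(0, 2, 3, 1)
    return images, labels


# Small made-up data: 5 images of 4x3 pixels with 2 channels.
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(5, 4, 3, 2)).astype(np.uint8)
labels = np.array([0, 1, 2, 1, 0], dtype=np.uint8)

records = pack_records(images, labels)
tmp_dir = tempfile.mkdtemp()
paths = write_shards(records, tmp_dir, num_shards=2)

# Round trip: what comes back should match what went in.
restored = [read_records(p, height=4, width=3, depth=2) for p in paths]
restored_images = np.concatenate([r[0] for r in restored])
restored_labels = np.concatenate([r[1] for r in restored])
```

Splitting along the first axis keeps every record whole within a single file, which is what a fixed-length record reader needs. The resulting list of paths is the kind of list you would pass as filenames in distorted_inputs().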