
tf.data with multiple inputs / outputs in Keras

For applications such as pair text similarity, the input data comes as pairs: pair_1, pair_2. In these problems we usually have multiple inputs. Previously, I trained my models successfully with:

model.fit([pair_1, pair_2], labels, epochs=50)

I decided to replace my input pipeline with the tf.data API. To this end, I created a Dataset similar to:

dataset = tf.data.Dataset.from_tensor_slices((pair_1, pair_2, labels))

It compiles successfully, but when training starts it throws the following exception:

AttributeError: 'tuple' object has no attribute 'ndim'

My Keras and TensorFlow versions are 2.1.6 and 1.11.0, respectively. I found a similar issue in the TensorFlow repository: tf.keras multi-input models don't work when using tf.data.Dataset.

Does anyone know how to fix the issue?

Here is the main part of the code:

(q1_test, q2_test, label_test) = test
(q1_train, q2_train, label_train) = train

def tfdata_generator(sent1, sent2, labels, is_training, batch_size):
    '''Construct a data pipeline using tf.data.Dataset'''
    dataset = tf.data.Dataset.from_tensor_slices((sent1, sent2, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # buffer size depends on sample size

    dataset = dataset.repeat()
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

    return dataset

train_dataset = tfdata_generator(q1_train, q2_train, label_train, is_training=True, batch_size=_BATCH_SIZE)
test_dataset = tfdata_generator(q1_test, q2_test, label_test, is_training=False, batch_size=_BATCH_SIZE)


inps1 = keras.layers.Input(shape=(50,))
inps2 = keras.layers.Input(shape=(50,))

embed = keras.layers.Embedding(input_dim=nb_vocab, output_dim=300, weights=[embedding], trainable=False)
embed1 = embed(inps1)
embed2 = embed(inps2)

gru = keras.layers.CuDNNGRU(256)
gru1 = gru(embed1)
gru2 = gru(embed2)

concat = keras.layers.concatenate([gru1, gru2])

preds = keras.layers.Dense(1, activation='sigmoid')(concat)

model = keras.models.Model(inputs=[inps1, inps2], outputs=preds)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

model.fit(
    train_dataset.make_one_shot_iterator(),
    steps_per_epoch=len(q1_train) // _BATCH_SIZE,
    epochs=50,
    validation_data=test_dataset.make_one_shot_iterator(),
    validation_steps=len(q1_test) // _BATCH_SIZE,
    verbose=1)
Maybe the error is related to nesting a tuple inside another tuple, so the inner tuple is not recognized as a Tensor object? Can you try feeding it something like (pair_1, pair_2, labels) and then feeding the pairs to fit yourself, to see if that works?
I modified my example code, which should work now. Instead of tuples, you can pass a dictionary with the keys "input_1" and "input_2".
@lhlmgr Can I do the same thing with from_tensor_slices()?
@AmirHadifar yes, see my edit
Try dataset = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels))
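For reference, a minimal sketch of that last suggestion (my own illustration, not tested against the exact versions above; it assumes pair_1, pair_2, and labels are equal-length NumPy arrays and model is the two-input model from the question):

import tensorflow as tf

# Nest the two inputs in an inner tuple so each dataset element has the
# (inputs, labels) structure that tf.keras expects for multi-input models.
dataset = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels))
dataset = dataset.shuffle(1000).repeat().batch(_BATCH_SIZE)

model.fit(dataset,
          steps_per_epoch=len(pair_1) // _BATCH_SIZE,
          epochs=50)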

lhlmgr

I'm not using Keras, but I would go with tf.data.Dataset.from_generator(), like:

def _input_fn():
  # Toy data: 8 examples, each input reshaped to a (1, 1) time step.
  sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
  sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
  sent1 = np.reshape(sent1, (8, 1, 1))
  sent2 = np.reshape(sent2, (8, 1, 1))

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
  labels = np.reshape(labels, (8, 1))

  def generator():
    # Yield one example per call: a dict of named inputs, plus the label.
    for s1, s2, l in zip(sent1, sent2, labels):
      yield {"input_1": s1, "input_2": s2}, l

  dataset = tf.data.Dataset.from_generator(
      generator,
      output_types=({"input_1": tf.int64, "input_2": tf.int64}, tf.int64))
  dataset = dataset.batch(2)
  return dataset

...

model.fit(_input_fn(), epochs=10, steps_per_epoch=4)

The generator can iterate over, e.g., your text files or NumPy arrays and yield one example per call. In this example, I assume the words of the sentences have already been converted to indices in the vocabulary.
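One caveat worth making explicit: the dictionary keys "input_1" and "input_2" have to match the names of the model's Input layers. Those happen to be the default names Keras gives the first two Input layers, but naming them explicitly is safer. A minimal matching model, assuming the (1, 1) per-example shape and int64 dtype used above (my own sketch, not the answerer's code):

import tensorflow as tf
from tensorflow import keras

# Input layer names must match the dict keys yielded by the generator.
inp1 = keras.layers.Input(shape=(1, 1), dtype='int64', name='input_1')
inp2 = keras.layers.Input(shape=(1, 1), dtype='int64', name='input_2')

# Cast the integer toy inputs to float before the dense layer.
cast = keras.layers.Lambda(lambda t: tf.cast(t, tf.float32))
merged = keras.layers.concatenate([keras.layers.Flatten()(cast(inp1)),
                                   keras.layers.Flatten()(cast(inp2))])
preds = keras.layers.Dense(1)(merged)

model = keras.models.Model(inputs=[inp1, inp2], outputs=preds)
model.compile(optimizer='adam', loss='mse')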

Edit: Since OP asked, it should also be possible with Dataset.from_tensor_slices():

def _input_fn():
  sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
  sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
  sent1 = np.reshape(sent1, (8, 1))
  sent2 = np.reshape(sent2, (8, 1))

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
  labels = np.reshape(labels, (8,))

  dataset = tf.data.Dataset.from_tensor_slices(({"input_1": sent1, "input_2": sent2}, labels))
  dataset = dataset.batch(2, drop_remainder=True)
  return dataset
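The same kind of minimal model consumes this dataset directly; only the input shapes change to (1,) to match the reshape above. A hedged end-to-end sketch (again my own code, not the answerer's):

import tensorflow as tf
from tensorflow import keras

inp1 = keras.layers.Input(shape=(1,), dtype='int64', name='input_1')
inp2 = keras.layers.Input(shape=(1,), dtype='int64', name='input_2')

cast = keras.layers.Lambda(lambda t: tf.cast(t, tf.float32))
preds = keras.layers.Dense(1)(keras.layers.concatenate([cast(inp1), cast(inp2)]))

model = keras.models.Model(inputs=[inp1, inp2], outputs=preds)
model.compile(optimizer='adam', loss='mse')

# 8 examples, batch size 2 with drop_remainder=True -> 4 steps per epoch.
model.fit(_input_fn(), epochs=10, steps_per_epoch=4)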

Thanks for your response. My dataset is relatively small and I prefer to keep all of it in memory. Do you have any suggestion to fix the issue with from_tensor_slices?
Hi Amir. Two questions, and sorry if they are kind of stupid: one of the guys on the GitHub issue mentioned, 'So the new feature of feeding the iterator directly to model.fit() is valid only when you are using tf.keras, not the standalone Keras.' (He had the same error as you and fixed it by importing the "correct" Keras.) The other question: you posted from_tensor_slices() twice, once with a tuple and once with a triplet; which one is the line you use?
I used the tf.keras API. You are right, but neither variant, tuple nor triplet, worked.
Thanks this saved me a lot of effort
Thanks for your response, but something is not working for me on TensorFlow 2.3.2. If it is not too much to ask, could you please update your answer to include the model so that it is copy/paste testable? Thanks again.
pfm

One way to solve your issue is to use Dataset.zip to combine your various inputs:

sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.float32)
sent1 = np.reshape(sent1, (8, 1, 1))
sent2 = np.reshape(sent2, (8, 1, 1))

labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.float32)
labels = np.reshape(labels, (8, 1))

dataset_12 = tf.data.Dataset.from_tensor_slices((sent1, sent2))
dataset_label = tf.data.Dataset.from_tensor_slices(labels)

dataset = tf.data.Dataset.zip((dataset_12, dataset_label)).batch(2).repeat()
model.fit(dataset, epochs=10, steps_per_epoch=4)

This will print: Epoch 1/10 4/4 [==============================] - 2s 503ms/step...
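If you want to double-check the element structure the zipped dataset yields, a quick inspection (assuming TF 2.x eager execution; under TF 1.x you would pull elements through an iterator instead):

# Each element is ((sent1_batch, sent2_batch), labels_batch).
for (s1, s2), l in dataset.take(1):
    print(s1.shape, s2.shape, l.shape)  # (2, 1, 1) (2, 1, 1) (2, 1)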


Thank you @pfm. It sounds like a good idea. I'll accept it if nobody gives a more elegant way to solve the issue.
@pfm I have a similar issue; could you help me here?
