
How to stack multiple LSTMs in Keras?

I am using the deep learning library Keras and trying to stack multiple LSTM layers, with no luck. Below is my code:

model = Sequential()
model.add(LSTM(100,input_shape =(time_steps,vector_size)))
model.add(LSTM(100))

The above code returns an error on the third line: Exception: Input 0 is incompatible with layer lstm_28: expected ndim=3, found ndim=2

The input X is a tensor of shape (100, 250, 50). I am running Keras on the TensorFlow backend.


Amir

You need to add return_sequences=True to the first layer so that its output tensor has ndim=3 (i.e. batch size, timesteps, hidden state).

Please see the following example:

from keras.models import Sequential
from keras.layers import LSTM, Dense

data_dim = 16   # features per timestep (example values from the Keras guide)
timesteps = 8   # sequence length

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

From: https://keras.io/getting-started/sequential-model-guide/ (search for "stacked lstm")
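
Applied directly to the code from the question, a minimal sketch (assuming time_steps and vector_size come from the stated input shape (100, 250, 50)) would be:

from keras.models import Sequential
from keras.layers import LSTM

time_steps, vector_size = 250, 50   # from the question's input shape (100, 250, 50)

model = Sequential()
model.add(LSTM(100, return_sequences=True,
               input_shape=(time_steps, vector_size)))  # 3D output: (batch, 250, 100)
model.add(LSTM(100))                                    # 2D output: (batch, 100)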


Is there any best practice when it comes to choosing the number of neurons in the LSTM? I'm trying to maximize model performance! :)
Should we set return_state=True as well? What is its role?
With LSTMs, if you choose too many neurons you will overfit; if you choose too few you will underfit. The right number depends on the patterns in your data and the size of your dataset (and probably numerous other factors). Start with something small, perhaps in the 32-128 range, to keep training time fast during debugging, then test larger values until your results start to worsen.
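
A rough sketch of that search (illustrative only; it assumes timesteps, data_dim and the train/validation arrays are already prepared):

from keras.models import Sequential
from keras.layers import LSTM, Dense

for units in (32, 64, 128):
    model = Sequential()
    model.add(LSTM(units, return_sequences=True, input_shape=(timesteps, data_dim)))
    model.add(LSTM(units))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    # keep an eye on validation loss for each candidate size
    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        epochs=5, batch_size=64, verbose=0)
    print(units, min(history.history['val_loss']))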
Thank you a ton. Been stuck on this issue since last night. Finally got it resolved because of your answer.
return_state makes the LSTM layer return its internal states (the final hidden state and cell state) in addition to its output. The default is False and I keep it that way; I have yet to find a reason to set it to True (as opposed to frequently using return_sequences=True).
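
For illustration, a minimal sketch (with made-up shapes) of what return_state=True actually returns when used with the functional API:

from keras.models import Model
from keras.layers import Input, LSTM

inputs = Input(shape=(10, 8))   # (timesteps, features) -- example values
outputs, state_h, state_c = LSTM(32, return_state=True)(inputs)
# outputs: last output, shape (batch, 32)
# state_h: final hidden state, shape (batch, 32) (same as outputs here)
# state_c: final cell state, shape (batch, 32)
model = Model(inputs, [outputs, state_h, state_c])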
shantanu pathak

A detailed explanation of @DanielAdiwardana's answer: we need to add return_sequences=True for all LSTM layers except the last one.

Setting this flag to True lets Keras know that the LSTM output should contain the outputs generated at every timestep (a 3D tensor), so the next LSTM layer can work further on the data.

If this flag is False, then the LSTM only returns the last output (a 2D tensor). Such an output is not good enough for another LSTM layer.
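
A quick way to see the difference, using illustrative shapes:

from keras.models import Sequential
from keras.layers import LSTM

with_seq = Sequential()
with_seq.add(LSTM(32, return_sequences=True, input_shape=(10, 8)))
print(with_seq.output_shape)   # (None, 10, 32) -- 3D, can feed another LSTM

last_only = Sequential()
last_only.add(LSTM(32, input_shape=(10, 8)))
print(last_only.output_shape)  # (None, 32) -- 2D, only the last timestep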

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

On a side note: the last Dense layer is added to get the output in the format the user needs. Here Dense(10) means a one-hot encoded output for a classification task with 10 classes. It can be generalised to 'n' neurons for a classification task with 'n' classes.

In case you are using the LSTM for regression (or time series forecasting), you may use Dense(1), so that only one numeric output is given.


The size of the last Dense layer is not a function of whether time series (sequence) data are used; the size of the output layer is determined by the output you want. For prediction models this may indeed be just a scalar, but for classification you obviously want to output a vector equal in size to the one-hot vector the user created for the targets, or that TensorFlow creates when using sparse categorical crossentropy.
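
To illustrate that point, a minimal sketch (with assumed input shapes and a hypothetical n_classes) of how the output layer differs between classification and regression:

from keras.models import Sequential
from keras.layers import LSTM, Dense

n_classes = 10                     # assumption: number of target classes

clf = Sequential()
clf.add(LSTM(32, input_shape=(10, 8)))
clf.add(Dense(n_classes, activation='softmax'))
# integer labels work with sparse categorical crossentropy,
# one-hot labels with plain categorical crossentropy
clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

reg = Sequential()
reg.add(LSTM(32, input_shape=(10, 8)))
reg.add(Dense(1))                  # single numeric output for regression
reg.compile(optimizer='adam', loss='mean_squared_error')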
Elvin Aghammadzada

Example code like this should work:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

regressor = Sequential()

regressor.add(LSTM(units=50, return_sequences=True, input_shape=(33, 1)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

regressor.add(Dense(units=1))

regressor.compile(optimizer='adam', loss='mean_squared_error')

regressor.fit(X_train, y_train, epochs=10, batch_size=4096)
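
Note that X_train must already be shaped (samples, timesteps, features) to match input_shape=(33, 1). A placeholder sketch of that shape (random data, purely for illustration):

import numpy as np

n_samples = 1000                             # placeholder value
X_train = np.random.rand(n_samples, 33, 1)   # (samples, timesteps, features)
y_train = np.random.rand(n_samples, 1)       # one target value per sample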
