One important note is that we got rid of the Lambda layer that reshaped the input for us to work with the LSTMs, so we're now specifying an input shape on the Conv1D here. This requires us to update the windowed_dataset helper function that we've been working with all along. We'll simply use tf.expand_dims in the helper function to expand the dimensions of the series before we process it. Also, similar to last week, the code will attempt lots of different learning rates, changing them epoch by epoch and plotting the results. With this data and the convolutional and LSTM-based network, we'll get a plot like this. It clearly bottoms out at around 10 to the minus 5, after which it looks a bit unstable, so we'll take that to be our desired learning rate. Thus, when we define the optimizer, we'll set the learning rate to be 1e-5 as shown here.

When we train for 500 epochs, we'll get this curve. It's a huge improvement over earlier. The peak has lost its plateau, but it's still not quite right: it's not getting high enough relative to the data. Now of course noise is a factor, and we can see crazy fluctuations in the peak caused by the noise, but I think our model could possibly do a bit better than this. Our MAE is below 5, but I would bet that outside of that first peak it's probably a lot lower than that. One solution might be to train a little bit longer. Even though our MAE and loss curves look flat at 500 epochs, we can see when we zoom in that they're slowly diminishing. The network is still learning, albeit slowly.

Another method would be to make the LSTMs bidirectional, like this. When training, this looks really good, giving very low loss and MAE values, sometimes even less than 1. But unfortunately it's overfitting: when we plot the predictions against the validation set, we don't see much improvement, and in fact our MAE on the validation set has actually gone up. It's still a step in the right direction, though, so do consider an architecture like this one as you go forward; you might just need to tweak some of the parameters to avoid overfitting.

Some of the problems are clearly visible when we plot the loss against the MAE: there's a lot of noise and instability in there. One common cause for small spikes like that is a small batch size introducing further random noise. I won't go into the details here, but if you check out Andrew's videos in his course on optimizing for gradient descent, there's some really great stuff in there. One hint was to explore the batch size and to make sure it's appropriate for your data. So in this case, it's worth experimenting with different batch sizes. For example, I experimented with batch sizes both larger and smaller than the original 32, and when I tried 16, you can see the impact here on the validation set, and here on the training loss and MAE data.

So by combining CNNs and LSTMs we've been able to build our best model yet, despite some rough edges that could be refined. In the next video, we'll step through a notebook that trains with this model so that you can see it for yourself, and we'll also maybe learn how to do some fine-tuning to improve the model even further. After that, we'll take the model architecture and apply it to some real-world data instead of the synthetic data that you've been using all along.
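To make the first step concrete, here's a rough sketch of the updated windowed_dataset helper and the Conv1D plus LSTM model it feeds. The tf.expand_dims call and the idea of declaring input_shape on the Conv1D come from the lesson; the filter count, kernel size, and layer widths are illustrative assumptions rather than confirmed values.

```python
import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Add a channel dimension so each timestep becomes a 1-element vector,
    # matching the (timesteps, channels) input layout Conv1D expects.
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[-1]))  # inputs are the window, label is the next value
    return ds.batch(batch_size).prefetch(1)

model = tf.keras.models.Sequential([
    # The input shape is declared here, so no Lambda reshape layer is needed.
    tf.keras.layers.Conv1D(filters=32, kernel_size=5, strides=1,
                           padding="causal", activation="relu",
                           input_shape=[None, 1]),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
```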
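The epoch-by-epoch learning-rate sweep could be sketched like this, assuming dataset was built with the helper above. The starting rate of 1e-8, the growth factor, the Huber loss, and the momentum value are assumptions consistent with earlier lessons, not details stated in this section.

```python
import matplotlib.pyplot as plt

# Grow the learning rate slightly each epoch, starting from 1e-8.
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10 ** (epoch / 20))
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(), optimizer=optimizer)
history = model.fit(dataset, epochs=100, callbacks=[lr_schedule])

# Plot loss against learning rate; the curve should bottom out near 1e-5.
plt.semilogx(history.history["lr"], history.history["loss"])
plt.show()
```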
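Once the sweep points to 1e-5, the 500-epoch training run might look like the following; again, the Huber loss and the momentum value are assumptions carried over from the sweep sketch.

```python
# Train for real with the learning rate the sweep suggested.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(), optimizer=optimizer,
              metrics=["mae"])
history = model.fit(dataset, epochs=500)
```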
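Making the LSTMs bidirectional is just a matter of wrapping each one in tf.keras.layers.Bidirectional. A minimal sketch, with the same assumed layer sizes as before:

```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=32, kernel_size=5, strides=1,
                           padding="causal", activation="relu",
                           input_shape=[None, 1]),
    # Each Bidirectional wrapper runs its LSTM over the window in both
    # directions and concatenates the two outputs.
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
])
```

Note that each Bidirectional wrapper doubles the layer's output width (32 becomes 64 here), which increases capacity and is one reason this variant can overfit, matching what the lesson observes on the validation set.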
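Finally, since windowed_dataset takes the batch size as a parameter, trying 16 instead of the original 32 only requires rebuilding the dataset and retraining. The series name x_train, the window size, and the shuffle buffer below are placeholder assumptions:

```python
# Rebuild the training set with a smaller batch size and retrain;
# the model and optimizer definitions can stay the same.
dataset = windowed_dataset(x_train, window_size=20, batch_size=16,
                           shuffle_buffer=1000)
history = model.fit(dataset, epochs=500)
```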