I have recently been revisiting my study of Deep Learning, and I thought I'd experiment with wave prediction using LSTMs. This is nothing new; it's just a log of some tinkering done using TensorFlow.

**The Problem**

The basic input to the model is a 2-D vector, each number corresponding to the value attained by the corresponding wave. Each wave in turn is (a constant + a sine term + a cosine term). The waves themselves have different magnitudes, initial phases and frequencies. The goal is to predict the values that will be attained a certain number of steps (I chose 23) ahead on the curve.
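Written out explicitly (reading the constants off the generation code that follows), the two waves are coupled: each one mixes both frequencies. With random initial phases φ₁, φ₂:

```latex
\theta_i(t) = \varphi_i + \frac{2\pi t}{f_i} \bmod 2\pi,
\qquad f_1 = 300, \quad f_2 = 200

w_1(t) = 5 + 5\sin\theta_1(t) + 10\cos\theta_2(t)

w_2(t) = 7 + 7\sin\theta_2(t) + 14\cos\theta_1(t)
```

The target is simply this same pair of values, a fixed number of steps further along the curves.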

So first off, here's the **wave-generation code**:

```python
##Producing Training/Testing inputs+output
from numpy import array, sin, cos, pi
from random import random

#Random initial angles
angle1 = random()
angle2 = random()

#The total 2*pi cycle would be divided into 'frequency'
#number of steps
frequency1 = 300
frequency2 = 200

#This defines how many steps ahead we are trying to predict
lag = 23


def get_sample():
    """
    Returns a [[sin value, cos value]] input.
    """
    global angle1, angle2
    angle1 += 2*pi/float(frequency1)
    angle2 += 2*pi/float(frequency2)
    angle1 %= 2*pi
    angle2 %= 2*pi
    return array([array([
        5 + 5*sin(angle1) + 10*cos(angle2),
        7 + 7*sin(angle2) + 14*cos(angle1)])])


sliding_window = []

for i in range(lag - 1):
    sliding_window.append(get_sample())


def get_pair():
    """
    Returns a (current, later) pair, where 'later' is 'lag'
    steps ahead of the 'current' on the wave(s) as defined by
    the frequency.
    """
    global sliding_window
    sliding_window.append(get_sample())
    input_value = sliding_window[0]
    output_value = sliding_window[-1]
    sliding_window = sliding_window[1:]
    return input_value, output_value
```

Essentially, you just need to call `get_pair` to get an ‘input, output’ pair, the output being 23 time intervals ahead on the curve. Each has a NumPy dimensionality of [1, 2]. The first value ‘1’ means that the batch size is 1; we will feed one input at a time while training/testing.
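As a quick sanity check of those dimensions, here is a condensed, self-contained toy version of the generator (with fixed zero initial phases instead of random ones, purely for illustration) that verifies each pair has shape [1, 2]:

```python
from numpy import array, sin, cos, pi

frequency1, frequency2 = 300, 200
lag = 23
angle1 = angle2 = 0.0  # toy fixed phases (the real code uses random ones)


def get_sample():
    """One [1, 2] sample of the two coupled waves."""
    global angle1, angle2
    angle1 = (angle1 + 2 * pi / frequency1) % (2 * pi)
    angle2 = (angle2 + 2 * pi / frequency2) % (2 * pi)
    return array([[5 + 5 * sin(angle1) + 10 * cos(angle2),
                   7 + 7 * sin(angle2) + 14 * cos(angle1)]])


# Pre-fill the sliding window, then pull one (current, later) pair
sliding_window = [get_sample() for _ in range(lag - 1)]


def get_pair():
    global sliding_window
    sliding_window.append(get_sample())
    input_value, output_value = sliding_window[0], sliding_window[-1]
    sliding_window = sliding_window[1:]
    return input_value, output_value


input_v, output_v = get_pair()
print(input_v.shape, output_v.shape)  # (1, 2) (1, 2)
```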

Now, I don’t pass the input directly into the LSTM. I try to improve the LSTM’s *understanding* of the input by providing its first and second derivatives as well. So, if the input at time *t* is *x(t)*, the first derivative is *x'(t) = x(t) - x(t-1)*. Following the analogy, *x''(t) = x'(t) - x'(t-1)*. Here’s the code for that:

```python
#Input Params
input_dim = 2

#To maintain state
last_value = array([0 for i in range(input_dim)])
last_derivative = array([0 for i in range(input_dim)])


def get_total_input_output():
    """
    Returns the overall Input and Output as required by the model.
    The input is a concatenation of the wave values, their first and
    second derivatives.
    """
    global last_value, last_derivative
    raw_i, raw_o = get_pair()
    raw_i = raw_i[0]
    l1 = list(raw_i)
    derivative = raw_i - last_value
    l2 = list(derivative)
    last_value = raw_i
    l3 = list(derivative - last_derivative)
    last_derivative = derivative
    return array([l1 + l2 + l3]), raw_o
```
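To make the derivative features concrete, here is a tiny standalone NumPy sketch (separate from the model's global state) of the backward differences on a toy sequence. Note that the very first difference is taken against an initial value of 0, just like `last_value` above:

```python
import numpy as np

x = np.array([2.0, 5.0, 9.0, 14.0])  # toy 1-D signal

# First derivative: x'(t) = x(t) - x(t-1), with the "previous"
# value seeded to 0, exactly like last_value above
x_prev = np.concatenate(([0.0], x[:-1]))
dx = x - x_prev                       # [2., 3., 4., 5.]

# Second derivative: x''(t) = x'(t) - x'(t-1), same seeding
dx_prev = np.concatenate(([0.0], dx[:-1]))
ddx = dx - dx_prev                    # [2., 1., 1., 1.]
```

Per timestep, `l1`, `l2` and `l3` in the code above are exactly the slices of `x`, `dx` and `ddx`, concatenated into one [1, 6] input.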

So the overall input to the model becomes a concatenated version of *x(t), x'(t), x''(t)*. The obvious question to ask would be: why not do this in the TensorFlow Graph itself? I did try it, and for some reason (which I don’t understand yet), some noise seems to seep into the Variables that act as memory units to maintain state.

Anyway, here’s the code for that too:

```python
#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *

#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim])

##First Order Derivative Layer
#This will store the last recorded value
last_value1 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value1 = tf.sub(input_layer, last_value1)
#Update last recorded value
last_assign_op1 = last_value1.assign(input_layer)

##Second Order Derivative Layer
#This will store the last recorded derivative
last_value2 = tf.Variable(tf.zeros([1, input_dim]))
#Subtract last value from current
sub_value2 = tf.sub(sub_value1, last_value2)
#Update last recorded value
last_assign_op2 = last_value2.assign(sub_value1)

##Overall input to the LSTM
#x and its first and second order derivatives as outputs of
#earlier layers
zero_order = last_assign_op1
first_order = last_assign_op2
second_order = sub_value2

#Concatenated
total_input = tf.concat(1, [zero_order, first_order, second_order])
```

If you have an idea of what might be going wrong, do leave a comment! In any case, the core model follows.

**The Model**

So here's the **TensorFlow model**:

**1)** The Imports:

```python
#Imports
import tensorflow as tf
from tensorflow.models.rnn.rnn import *
```

**2)** Our **input layer**, as always, will be a `Placeholder` instance with the appropriate type and dimensions:

```python
#Input Params
input_dim = 2

##The Input Layer as a Placeholder
#Since we will provide data sequentially, the 'batch size'
#is 1.
input_layer = tf.placeholder(tf.float32, [1, input_dim*3])
```

**3)** We then define our **LSTM layer**. If you are new to Recurrent Neural Networks or LSTMs, here are two excellent resources:

- This blog post by Christopher Olah
- This deeplearning.net post. It defines the math behind the LSTM cell pretty succinctly.

If you would like to see implementation-level details too, here's the relevant portion of the TensorFlow source for you.

Now the LSTM layer:

```python
##The LSTM Layer-1
#The LSTM Cell initialization
lstm_layer1 = rnn_cell.BasicLSTMCell(input_dim*3)
#The LSTM state as a Variable initialized to zeroes
lstm_state1 = tf.Variable(tf.zeros([1, lstm_layer1.state_size]))
#Connect the input layer and initial LSTM state to the LSTM cell
lstm_output1, lstm_state_output1 = lstm_layer1(input_layer, lstm_state1,
                                               scope="LSTM1")
#The LSTM state will get updated
lstm_update_op1 = lstm_state1.assign(lstm_state_output1)

We use only one LSTM layer. Providing a scope to the LSTM layer call helps avoid variable-scope conflicts if you have multiple LSTM layers.

The LSTM layer is followed by a simple linear regression layer, whose output becomes the final output.

```python
##The Regression-Output Layer1
#The Weights and Biases matrices first
output_W1 = tf.Variable(tf.truncated_normal([input_dim*3, input_dim]))
output_b1 = tf.Variable(tf.zeros([input_dim]))
#Compute the output
final_output = tf.matmul(lstm_output1, output_W1) + output_b1
```

We have finished defining the model itself. But now, we need to initialize the **training components**. These help fine-tune the parameters/state of the model to make it ready for deployment. We won’t be using these components post training (ideally).

**4)** First, a `Placeholder` for the **correct output** associated with the input:

```python
##Input for correct output (for training)
correct_output = tf.placeholder(tf.float32, [1, input_dim])
```

Then, the error is computed from the LSTM output and the correct output as the *Sum-of-Squares* loss.

```python
##Calculate the Sum-of-Squares Error
error = tf.pow(tf.sub(final_output, correct_output), 2)
```
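Note that this `error` is elementwise (a [1, 2] tensor); when given a non-scalar loss, TensorFlow's `minimize` effectively minimizes its sum. Here is a quick NumPy illustration of the same computation, with made-up numbers:

```python
import numpy as np

final_output = np.array([[4.8, 7.3]])    # hypothetical model prediction
correct_output = np.array([[5.0, 7.0]])  # hypothetical target

# Elementwise squared error, mirroring tf.pow(tf.sub(...), 2)
error = (final_output - correct_output) ** 2  # approx [[0.04, 0.09]]

# The scalar the optimizer effectively drives down
loss = error.sum()                            # approx 0.13
```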

Finally, we initialize an `Optimizer` to adjust the weights of the LSTM layer. I tried Gradient Descent, RMSProp and Adam Optimization; Adam works best for this model. Gradient Descent performs really badly on LSTMs for some reason (that I can't grasp right now). If you want to read more about **Adam-Optimization**, read this paper. I decided on a learning rate of 0.0006 after a lot of trial-and-error, and it seems to work best for the number of iterations I use (100k).

```python
##The Optimizer
#Adam works best
train_step = tf.train.AdamOptimizer(0.0006).minimize(error)
```

**5)** Finally, we initialize the Session and all required Variables as always.

```python
##Session
sess = tf.Session()
#Initialize all Variables
sess.run(tf.initialize_all_variables())
```

**The Training**

Here’s the rudimentary code I used for training the model:

```python
##Training
actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []

for i in range(80000):
    input_v, output_v = get_total_input_output()
    _, _, network_output = sess.run([lstm_update_op1,
                                     train_step,
                                     final_output],
                                    feed_dict={
                                        input_layer: input_v,
                                        correct_output: output_v})
    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt

plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()
```

Training takes almost a minute on my Intel i5 machine.

Consider the first wave. Initially, the network output is far from the correct one (the red curve is the LSTM output):

But by the end, it fits pretty well:

Similar trends are seen for the second wave:

**Testing**

In practical scenarios, the state at which you end training is rarely the state at which you deploy. Therefore, prior to testing, I ‘fast-forward’ both waves first. Then, I flush the contents of the LSTM cell (mind you, the learned matrix parameters for the individual functions don’t change).

```python
##Testing
for i in range(200):
    get_total_input_output()

#Flush LSTM state
sess.run(lstm_state1.assign(tf.zeros([1, lstm_layer1.state_size])))
```

And here’s the rest of the testing code:

```python
actual_output1 = []
actual_output2 = []
network_output1 = []
network_output2 = []
x_axis = []

for i in range(1000):
    input_v, output_v = get_total_input_output()
    _, network_output = sess.run([lstm_update_op1,
                                  final_output],
                                 feed_dict={
                                     input_layer: input_v,
                                     correct_output: output_v})
    actual_output1.append(output_v[0][0])
    actual_output2.append(output_v[0][1])
    network_output1.append(network_output[0][0])
    network_output2.append(network_output[0][1])
    x_axis.append(i)

import matplotlib.pyplot as plt

plt.plot(x_axis, network_output1, 'r-', x_axis, actual_output1, 'b-')
plt.show()
plt.plot(x_axis, network_output2, 'r-', x_axis, actual_output2, 'b-')
plt.show()
```

It's pretty similar to the training one, except for one small difference: I don't run the training op anymore, so those components of the Graph are never executed.

Here’s the correct output with the model’s output for the first wave:

That's all for now! I am not a deep learning expert, and I am still experimenting with RNNs, so do leave comments/suggestions if you have any! Cheers!