
I have a TensorFlow LSTM model for predicting sentiment. I built the model with a maximum sequence length of 150 (the maximum number of words). To make predictions, I have written the code below:

import numpy as np

batchSize = 32
maxSeqLength = 150

# cleanSentences and wordsList are defined elsewhere in my code.
def getSentenceMatrix(sentence):
    # Only row 0 is filled; the other rows stay zero to match the batch size.
    sentenceMatrix = np.zeros([batchSize, maxSeqLength], dtype='int32')
    cleanedSentence = cleanSentences(sentence)
    # Keep only the first 150 words; the rest are dropped.
    cleanedSentence = ' '.join(cleanedSentence.split()[:150])
    split = cleanedSentence.split()
    for indexCounter, word in enumerate(split):
        try:
            sentenceMatrix[0, indexCounter] = wordsList.index(word)
        except ValueError:
            sentenceMatrix[0, indexCounter] = 399999  # Vector for unknown words
    return sentenceMatrix

input_text = "example data"
inputMatrix = getSentenceMatrix(input_text)

In the code, I am truncating my input text to 150 words and ignoring the remaining data. Because of this, my predictions are wrong.

cleanedSentence = ' '.join(cleanedSentence.split()[:150])
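
To make the effect of that line concrete, here is a minimal sketch of fixed-length preparation, assuming the sentence has already been converted to a list of word indices. The helper name `pad_or_truncate` is hypothetical and not part of the model code; it only illustrates that shorter inputs end up zero-padded while everything past position 150 is dropped.

```python
import numpy as np

max_seq_length = 150  # same limit as maxSeqLength above


def pad_or_truncate(indices, length=max_seq_length):
    """Return a fixed-length int32 vector: zero-padded if short, cut if long.

    Hypothetical helper for illustration only.
    """
    row = np.zeros(length, dtype='int32')
    kept = indices[:length]      # words past `length` are discarded
    row[:len(kept)] = kept       # untouched positions remain zero (padding)
    return row


short_input = pad_or_truncate([5, 8, 2])              # 3 words, padded to 150
long_input = pad_or_truncate(list(range(1, 201)))     # 200 words, last 50 lost
```

With a 200-word input, the last 50 word indices never reach the model, which is the loss of information described above.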

I know that if the input is shorter than the sequence length, we can pad it with zeros. What should we do if it is longer? Can anyone suggest the best way to handle this? Thanks in advance.

  • Your maximum sequence length is 150. If a sequence is shorter than that, it is padded with zeros. So, by that logic, you should pad all the sequences to the maximum length. Suppose I have 10 sequences in which the maximum sequence length is 23. Then I will pad all 10 sequences to a length of 23. – Shubham Panchal Feb 20 '19 at 03:33
  • My question is: what if I have sentences longer than 150 words? – Sujitha Chinnu Feb 22 '19 at 07:12
  • But you will have at least one sentence that is the longest of all. Make its length the max_sequence_length and pad the other sequences to that length. – Shubham Panchal Feb 22 '19 at 10:18
  • It is affecting my accuracy. Most of my training data has 150-200 words. If I train with max_sequence_length set to 1000, I do not get better accuracy. – Sujitha Chinnu Feb 25 '19 at 08:27
  • This answer could help you with variable-length inputs to LSTM. – Esmailian May 02 '19 at 23:16
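
The padding suggestion in the comments above (pad every sequence to the length of the longest one) could look roughly like the sketch below. It assumes each sequence is a list of word indices and that 0 is reserved for padding; the function name `pad_batch` is hypothetical and chosen only for this illustration.

```python
import numpy as np


def pad_batch(sequences):
    """Zero-pad every sequence to the length of the longest one in the batch.

    Hypothetical helper; assumes index 0 is reserved for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    batch = np.zeros((len(sequences), max_len), dtype='int32')
    for row, seq in enumerate(sequences):
        batch[row, :len(seq)] = seq   # positions after len(seq) stay zero
    return batch, max_len


batch, max_len = pad_batch([[4, 9], [7, 1, 3, 8], [2]])
```

Here nothing is truncated, but every row grows to the longest sequence, so one very long sentence makes all the other rows mostly padding. That may be related to the accuracy concern raised for a max_sequence_length of 1000.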

0 Answers