
June 18, 2024

LSTM Sentiment Analysis with Keras and PyTorch

Training an LSTM to Predict Sentiment

Photo by Madison Oren on Unsplash

Introduction

Humans easily understand emotion conveyed through text, but how can we train computers to do the same? Turns out, it's not that difficult, especially with the right framework.

In this article, I'll demonstrate how to train an LSTM model on a sentiment analysis dataset using both PyTorch and Keras, two of the leading Python frameworks for deep learning.

What are LSTMs?

Put simply, LSTM (Long Short-Term Memory) models are designed to take in sequential input and remember certain information through the use of input, output, and forget gates. Here's a neat resource to learn more about how LSTMs and similar recurrent models work.
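
To give a rough sense of what happens inside, here is a minimal sketch of a single LSTM step written with PyTorch ops, purely for illustration; the per-gate weight names are my own, and real implementations fuse these into larger matrix multiplies:

import torch

# Rough sketch of one LSTM step; W, U, b hold per-gate weight matrices and biases
def lstm_step(x_t, h_prev, c_prev, W, U, b):
    i = torch.sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # input gate: how much new info to write
    f = torch.sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # forget gate: how much old memory to keep
    o = torch.sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # output gate: how much memory to expose
    g = torch.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])     # candidate memory content
    c_t = f * c_prev + i * g       # new cell state: forget some old memory, add some new
    h_t = o * torch.tanh(c_t)      # new hidden state passed to the next step
    return h_t, c_t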

While the introduction of transformers in the 2017 paper Attention Is All You Need has made these models somewhat obsolete, they're still more than adequate for most problems involving short text classification or generation. Plus, they're simpler to write code for than transformers.

The Dataset

For this project, we'll be using a dataset from Kaggle called Sentiment140 which has 1.6 million tweets. For now, we'll only use a portion of the dataset.

First, download and unzip the file, then rename it to data.csv and move it into your project directory.

Then, make sure you have the pandas library installed to load the .csv file into a DataFrame:

pip install pandas

To read the file:

# train_sentiment_keras.py
import pandas as pd
import random

df = pd.read_csv("data.csv", encoding="latin1")
print(df.head())

"""   0  ...  @switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D
0  0  ...  is upset that he can't update his Facebook by ...                                                                  
1  0  ...  @Kenichan I dived many times for the ball. Man...                                                                  
2  0  ...    my whole body feels itchy and like its on fire      """
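
Notice that the first tweet shows up as a column name above: the raw CSV has no header row, so pandas treats the first data row as the header. If you prefer named columns, you can load it with header=None and supply your own names (the names below are my own labels, not part of the file, which has six columns); the rest of the code indexes columns by position, so either approach works:

# optional: load the CSV without treating the first row as a header
df = pd.read_csv("data.csv", encoding="latin1", header=None,
                 names=["target", "id", "date", "query", "user", "text"])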


The dataset has two columns of interest: the labels and the text. Incidentally, these are the first and last columns. Let's save the data into a list:

...
data = list(zip(df[df.columns[0]], df[df.columns[-1]])) 
# combines the first and last columns of the DataFrame using zip.
print(random.sample(data, 5))
"""
[(0, "crap can't find my miley cyrus cd to play in the car to annoy my mum     #LoveEverybody"), (4, 'going to pink concert tonight!! '), 
(4, '@AdamSevani haha, I watched Step Up 2 the other day. HahaThe part where you lift yur shirt. Made my day a LOT better. '), 
(0, '@KnightOnline does this include coin that has gone when servers were took offline? i lost 10gb from a succesful trade when they went down '), 
(4, "@ebony1075 not me, lol. Didn't even know there was a match  You ok? Did I miss much last night?????")]
"""

The first value in each list entry is the label. Here, 0 is negative sentiment and 4 is positive sentiment. We'll change these labels to 0 and 1 later. The second value is the text we're training on. The goal is to classify the text into the correct sentiment category.

Now we're ready to train the model!

Keras

Keras is a highly stable deep learning framework used by many beginners who want to quickly train models that don't require advanced customization. The code snippets in this section are much shorter than in the PyTorch section :) Keras runs on top of another library called TensorFlow. Learn more about Keras here.

Installation

To install the required libraries:

pip install keras tensorflow scikit-learn

Once that's done, we're ready to tokenize the text and build the model. "Tokenization" is the process by which text in a dataset is split up into chunks, such as words or common sequences of characters. "Hello I am Bob" might become ["hello", "i", "am", "bob"].

Each unique token is assigned a numerical label. "hello" might be assigned the label 256. Models like LSTMs require an Embedding layer that transforms these labels into embeddings of some size.
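
To make that concrete, here's a tiny illustration (not part of the training script) of how the Keras Tokenizer maps words to integer labels; the exact indices depend on word frequency and order of appearance:

from tensorflow.keras.preprocessing.text import Tokenizer

toy = Tokenizer()
toy.fit_on_texts(["Hello I am Bob"])
print(toy.word_index)                              # e.g. {'hello': 1, 'i': 2, 'am': 3, 'bob': 4}
print(toy.texts_to_sequences(["Hello I am Bob"]))  # [[1, 2, 3, 4]]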

Text Preprocessing and Tokenization

...

data = random.sample(data, 100_000)  # only use a subset of the data to train

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import re
from sklearn.preprocessing import LabelEncoder


def preprocess_text(text):
    text = text.lower()  # lowercase the text
    text = re.sub(r'\s+', ' ', text)  # collapse runs of whitespace into single spaces
    return text


num_words = 10000 # the maximum number of words the tokenizer should have.

max_seq_length = 100 # maximum sequence length (of tokens) the model can handle, anything above this is cut off

tokenizer = Tokenizer(num_words=num_words)  # initialize a tokenizer to chunk the text

labels = [x[0] for x in data] # gets all the labels (for later)
all_text_data = [x[1] for x in data]  # gets all the text

tokenizer.fit_on_texts(all_text_data) # train the tokenizer

sequences = tokenizer.texts_to_sequences(all_text_data)
print(sequences[:5])

"""
[[29, 540], 
[210, 4, 51, 27, 762], 
[177, 18, 1, 172, 86, 249, 1073], 
[308, 140, 2, 226, 21, 7], 
[1488, 1488, 167, 1, 731, 21, 53]]
"""

The code above takes a random subset of the data for training (random.sample), defines a preprocess_text helper that lowercases text and collapses extra whitespace (you can apply it to all_text_data before fitting the tokenizer if you like), and fits a tokenizer that converts all the text to sequences of numbers. We can turn these sequences back into strings like so:

...
original_text = tokenizer.sequences_to_texts(sequences[:5])
print(original_text)

"""
['good idea', 
'made a back up account', 
'tired but i cant miss lost xo', 
'hi nice to tweet with you', 
'heh heh yes i agree with u']
"""

We also need to pad the sequences so that they're all the same length. This pads the beginning of each sequence:

...
# Pad sequences to ensure uniform length
X = pad_sequences(sequences, maxlen=max_seq_length)
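
As a quick illustration (again, not part of the training script), pad_sequences pre-pads with zeros by default:

print(pad_sequences([[5, 6, 7]], maxlen=6))
# [[0 0 0 5 6 7]]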

Also, let's save the tokenizer to a file for later:

...
import io
import json

tokenizer_json = tokenizer.to_json()  # serialize the tokenizer to a JSON string
with io.open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))

Now that we have the text in a machine-readable format, let's update the labels so that "0" is 0 and "4" is 1.

...
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels) # gets the input labels and converts them to a more standard form
# ^^ this also works if you have more than two label types
print(y[:10])

"""originally: [4 4 0 0 4 4 0 0 0 0]"""
"""now:        [1 1 0 0 1 1 0 0 0 0]"""

Model Setup + Training

That was (surprisingly) the hard part, since using Keras is pretty simple.

Model setup:

...
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, SpatialDropout1D
# Define the LSTM model
model = Sequential()
model.add(Embedding(input_dim=num_words, output_dim=128))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

print(model.summary())
"""
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ embedding (Embedding)           │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ spatial_dropout1d               │ ?                      │   0 (unbuilt) │
│ (SpatialDropout1D)              │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm (LSTM)                     │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ ?                      │   0 (unbuilt) │
└─────────────────────────────────┴────────────────────────┴───────────────┘
...
"""

The model above is an LSTM classifier. First comes the Embedding layer we talked about, which turns the numbers representing text tokens into embedding vectors (of size 128, in this case). The SpatialDropout1D layer randomly drops entire embedding channels during training to help prevent overfitting.

After the embedding layer, the sequence of embedding vectors is fed into the LSTM. The LSTM's output then goes into a "Dense" layer, a linear layer that shrinks the output down to 2 values (the number of classes in our dataset). Softmax turns those values into a probability per class: if the softmax output for a given input is [0.9, 0.1], the model is 90% confident that the input is negative (class 0).
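
If the softmax step feels like magic, here's the arithmetic behind it on a made-up pair of raw scores:

import numpy as np

scores = np.array([2.0, -0.2])                # raw model outputs before softmax (made up for illustration)
probs = np.exp(scores) / np.exp(scores).sum()
print(probs)                                  # roughly [0.90, 0.10] -- probabilities that sum to 1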


Train/Val split and Training

One last step before training the model is splitting the data into training and validation sets. The validation data is used to make sure the model doesn't overfit during training; the model never trains on this data, so its performance on it tells us how well the model generalizes. 20% of the data is set aside for validation.

...

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

Finally, we're ready to train the model. It is small enough to train quickly on a decent CPU. We do 3 passes (epochs) over the training data:

...
# Train the model
history = model.fit(X_train, y_train, epochs=3, batch_size=64, validation_data=(X_val, y_val),  verbose=1)
print(history)
model.save('sentiment_analysis.keras') # saves the model for testing

"""
Epoch 1/3
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 84s 67ms/step - accuracy: 0.7138 - loss: 0.5478 - val_accuracy: 0.7891 - val_loss: 0.4566
Epoch 2/3
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 88s 70ms/step - accuracy: 0.8090 - loss: 0.4191 - val_accuracy: 0.7927 - val_loss: 0.4478
Epoch 3/3
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 89s 72ms/step - accuracy: 0.8281 - loss: 0.3774 - val_accuracy: 0.8011 - val_loss: 0.4348
"""

Testing

Now, in a new file, let's load the model to test it on our own input. Much of the code is the same:

# testing file
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, SpatialDropout1D
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.saving import load_model
import re
import json

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'\s+', ' ', text)  # collapse extra whitespace
    return text

label_to_type = {
    0: 'Negative',
    1: 'Positive'
}

num_words = 10000
max_seq_length = 100

with open('tokenizer.json') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)

# Load the trained model
model = load_model("sentiment_analysis.keras")

while True:
    input_s = input("?")
    input_s = tokenizer.texts_to_sequences([input_s])
    input_s = pad_sequences(input_s, maxlen=max_seq_length)

    output = model.predict(input_s)[0].tolist()
    print('confidence:', max(output))
    val_idx = output.index(max(output))
    print('Predicted sentiment of input:', label_to_type[val_idx])

The output:

? I am happy :)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 65ms/step
confidence: 0.9804297685623169
Predicted sentiment of input: Positive

? I am sad
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
confidence: 0.9954018592834473
Predicted sentiment of input: Negative

? we got a dog
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
confidence: 0.5758517980575562
Predicted sentiment of input: Positive

? we got a cat
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
confidence: 0.8269914984703064
Predicted sentiment of input: Positive

? it snowed last night
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
confidence: 0.5409414172172546
Predicted sentiment of input: Negative

? on no, it snowed last night
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
confidence: 0.7260915040969849
Predicted sentiment of input: Negative

PyTorch

PyTorch is another library that can be used for training deep learning models like LSTMs. It offers more customization, which makes it the first choice for most researchers and developers, and it's often easier to debug than Keras.

PyTorch - Installation

You can install the required libraries with pip:

pip install torch 
pip install pandas 
pip install transformers

Text Preprocessing and Tokenization

First, import the required libraries

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

The reason for using the "transformers" library is that, as far as I can tell, PyTorch doesn't ship an equivalent of the Tokenizer we used with the Keras version. AutoTokenizer is more than good enough, though, and offers very similar (often better) functionality.

Load the tokenizer from a pretrained model (BERT) and write a function to tokenize the text:

...
device = torch.device("cpu")  # change to "cuda" if you're running on a gpu or "mps" if you have apple silicon
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', padding_side="left") # pads the beginning of the sequence. important!
# try padding right. the model won't converge :)
max_len = 100 # max length of the input. everything else will be cut off


def preprocess_text(text):
    return tokenizer(
        text,
        max_length=max_len,           # Maximum length of the sequence
        padding='max_length',     # Pad to maximum length
        truncation=True,          # Truncate longer sequences
        return_tensors='pt'       # Return PyTorch tensors, which are like numpy arrays
    )
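
As a quick sanity check (illustrative only), the tokenizer returns a dictionary of tensors, and every sequence comes back left-padded to max_len:

sample = preprocess_text(["hello world"])
print(sample['input_ids'].shape)  # torch.Size([1, 100]) -- one row, padded on the left to max_len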


Loading the data from the CSV with pandas:

...
import random
# Load data
data = pd.read_csv('data.csv', encoding='latin-1')
data = data.fillna('')
df = [[x[-1], 0 if x[0] == 0 else 1] for x in data.values.tolist()] # convert the labels from 0 and 4 to 0 and 1

random.shuffle(df)
df = random.sample(df, 100_000)

split_idx = int(len(df) * 0.8)  # index of element around ~ the first 80% of data

a, b = df[:split_idx], df[split_idx:] # training: 80% of data, validation: 20% of data
X_train = preprocess_text([x[0] for x in a])['input_ids']
X_test = preprocess_text([x[0] for x in b])['input_ids']


y_train = [x[1] for x in a]
y_test = [x[1] for x in b] 

# gets the labels for training and validation

PyTorch has something different from Keras called a DataLoader, which auto-batches the data for us. As input, a DataLoader requires a wrapper class for our data called a Dataset. Keras batches in the background (see batch_size), but in PyTorch we need to create a DataLoader to do it:

...

# Convert data to PyTorch DataLoader
class TwitterDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


num_classes = 2

y_train = torch.tensor(y_train, dtype=torch.long).to(device)
y_test = torch.tensor(y_test, dtype=torch.long).to(device)

train_dataset = TwitterDataset(X_train, y_train)
test_dataset = TwitterDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False) # no need to shuffle, order doesn't matter here
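
To check that the batching works as expected, you can pull a single batch out of the loader (illustrative only):

inputs, labels = next(iter(train_loader))
print(inputs.shape, labels.shape)  # torch.Size([64, 100]) torch.Size([64])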

Defining the model:

...

# Define LSTM model

class LSTMModel(nn.Module):
    def __init__(self, embedding_dim, lstm_hidden_dim, output_dim, dropout_prob=0.2):
        super(LSTMModel, self).__init__()
        self.embedding = nn.Embedding(tokenizer.vocab_size, embedding_dim)
        self.dropout = nn.Dropout(p=dropout_prob)
        self.lstm = nn.LSTM(embedding_dim, lstm_hidden_dim, dropout=dropout_prob, batch_first=True)  # note: this dropout only applies between stacked LSTM layers, so with a single layer it has no effect
        self.fc = nn.Linear(lstm_hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)
        embedded_dropout = self.dropout(embedded)
        lstm_out, _ = self.lstm(embedded_dropout)
        lstm_out = lstm_out[:, -1, :]
        output = self.fc(lstm_out)
        return output


embedding_dim = 128
hidden_dim = 100
output_dim = num_classes
dropout = 0.2

model = LSTMModel(embedding_dim, hidden_dim, output_dim, dropout).to(device)
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

We can see that the structure is very similar to the Keras version; however, here we describe more explicitly how the LSTM output is fed into the Linear ("Dense" in Keras) layer: we keep only the output at the final timestep (lstm_out[:, -1, :]) and pass that through the linear layer. We also explicitly initialize the optimizer; Adam helps the model converge.
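
A quick way to convince yourself of the shapes (illustrative only) is to push a dummy batch of two token sequences through the model and check that you get one score per class back:

dummy = torch.randint(0, tokenizer.vocab_size, (2, max_len)).to(device)
print(model(dummy).shape)  # torch.Size([2, 2]) -- 2 sequences, 2 class scores each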

Finally, here's the training/validation loop:

...
# Training and validation loop
epochs = 3
for epoch in range(epochs):
    # Training
    model.train()
    train_loss = 0.0
    train_correct = 0
    train_total = 0

    for batch_idx, (inputs, labels) in enumerate(train_loader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)  # compare predictions to the true labels
        loss.backward()  # compute gradients
        optimizer.step()  # update the weights

        train_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        train_correct += (predicted == labels).sum().item()
        train_total += labels.size(0)

        # Print training loss and accuracy for each batch
        if batch_idx % 100 == 9:  # Print every 100 batches
            train_loss_avg = train_loss / train_total
            train_acc_avg = train_correct / train_total
            print(f'Epoch [{epoch+1}/{epochs}], Step [{batch_idx+1}/{len(train_loader)}], Train Loss: {train_loss_avg:.4f}, Train Acc: {train_acc_avg:.4f}')

    # Validation: call eval() so dropout is disabled
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            val_correct += (predicted == labels).sum().item()
            val_total += labels.size(0)

    # Print validation loss and accuracy after each epoch
    val_loss_avg = val_loss / val_total
    val_acc_avg = val_correct / val_total
    print(f'Epoch [{epoch+1}/{epochs}], Val Loss: {val_loss_avg:.4f}, Val Acc: {val_acc_avg:.4f}')

# Save the model to a file for later
torch.save(model.state_dict(), 'sentiment_analysis.pth')

Now run the script to train the model. The training log:

Epoch [1/3], Step [10/1250], Train Loss: 0.6979, Train Acc: 0.5312
Epoch [1/3], Step [110/1250], Train Loss: 0.6922, Train Acc: 0.5246
Epoch [1/3], Step [210/1250], Train Loss: 0.6719, Train Acc: 0.5756
Epoch [1/3], Step [310/1250], Train Loss: 0.6479, Train Acc: 0.6110
Epoch [1/3], Step [410/1250], Train Loss: 0.6300, Train Acc: 0.6336
Epoch [1/3], Step [510/1250], Train Loss: 0.6172, Train Acc: 0.6497
Epoch [1/3], Step [610/1250], Train Loss: 0.6042, Train Acc: 0.6634
Epoch [1/3], Step [710/1250], Train Loss: 0.5941, Train Acc: 0.6746
Epoch [1/3], Step [810/1250], Train Loss: 0.5846, Train Acc: 0.6832
Epoch [1/3], Step [910/1250], Train Loss: 0.5777, Train Acc: 0.6897
Epoch [1/3], Step [1010/1250], Train Loss: 0.5707, Train Acc: 0.6965
Epoch [1/3], Step [1110/1250], Train Loss: 0.5639, Train Acc: 0.7019
Epoch [1/3], Step [1210/1250], Train Loss: 0.5591, Train Acc: 0.7059
Epoch [1/3], Val Loss: 0.4856, Val Acc: 0.7661
Epoch [2/3], Step [10/1250], Train Loss: 0.4557, Train Acc: 0.7922
Epoch [2/3], Step [110/1250], Train Loss: 0.4636, Train Acc: 0.7845
Epoch [2/3], Step [210/1250], Train Loss: 0.4634, Train Acc: 0.7836
Epoch [2/3], Step [310/1250], Train Loss: 0.4606, Train Acc: 0.7839
Epoch [2/3], Step [410/1250], Train Loss: 0.4620, Train Acc: 0.7831
Epoch [2/3], Step [510/1250], Train Loss: 0.4617, Train Acc: 0.7829
Epoch [2/3], Step [610/1250], Train Loss: 0.4599, Train Acc: 0.7842
Epoch [2/3], Step [710/1250], Train Loss: 0.4583, Train Acc: 0.7856
Epoch [2/3], Step [810/1250], Train Loss: 0.4573, Train Acc: 0.7861
Epoch [2/3], Step [910/1250], Train Loss: 0.4564, Train Acc: 0.7865
Epoch [2/3], Step [1010/1250], Train Loss: 0.4563, Train Acc: 0.7864
Epoch [2/3], Step [1110/1250], Train Loss: 0.4553, Train Acc: 0.7867
Epoch [2/3], Step [1210/1250], Train Loss: 0.4544, Train Acc: 0.7869
Epoch [2/3], Val Loss: 0.4555, Val Acc: 0.7867
Epoch [3/3], Step [10/1250], Train Loss: 0.4043, Train Acc: 0.8172
Epoch [3/3], Step [110/1250], Train Loss: 0.4127, Train Acc: 0.8088
Epoch [3/3], Step [210/1250], Train Loss: 0.4131, Train Acc: 0.8091
Epoch [3/3], Step [310/1250], Train Loss: 0.4142, Train Acc: 0.8089
Epoch [3/3], Step [410/1250], Train Loss: 0.4133, Train Acc: 0.8090
Epoch [3/3], Step [510/1250], Train Loss: 0.4140, Train Acc: 0.8089
Epoch [3/3], Step [610/1250], Train Loss: 0.4136, Train Acc: 0.8098
Epoch [3/3], Step [710/1250], Train Loss: 0.4147, Train Acc: 0.8089
Epoch [3/3], Step [810/1250], Train Loss: 0.4139, Train Acc: 0.8095
Epoch [3/3], Step [910/1250], Train Loss: 0.4134, Train Acc: 0.8105
Epoch [3/3], Step [1010/1250], Train Loss: 0.4129, Train Acc: 0.8111
Epoch [3/3], Step [1110/1250], Train Loss: 0.4116, Train Acc: 0.8118
Epoch [3/3], Step [1210/1250], Train Loss: 0.4116, Train Acc: 0.8117
Epoch [3/3], Val Loss: 0.4619, Val Acc: 0.8004

And then, let's test with our own input:

import pandas as pd
import torch
import torch.nn as nn
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', padding_side="left")
max_len = 100

# Encode labels
reverse_label_encoder = {
    0: 'Negative',
    1: 'Positive'
}

num_classes = 2

def preprocess_text(text):
    return tokenizer(
        text,
        max_length=max_len,           # Maximum length of the sequence
        padding='max_length',     # Pad to maximum length
        truncation=True,          # Truncate longer sequences
        return_tensors='pt'       # Return PyTorch tensors
    )



class LSTMModel(nn.Module):
    def __init__(self, embedding_dim, lstm_hidden_dim, output_dim, dropout_prob=0.2):
        super(LSTMModel, self).__init__()
        self.embedding = nn.Embedding(tokenizer.vocab_size, embedding_dim)
        self.dropout = nn.Dropout(p=dropout_prob)
        self.lstm = nn.LSTM(embedding_dim, lstm_hidden_dim, dropout=dropout_prob, batch_first=True)
        self.fc = nn.Linear(lstm_hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)
        embedded_dropout = self.dropout(embedded)
        lstm_out, _ = self.lstm(embedded_dropout)
        lstm_out = lstm_out[:, -1, :]
        output = self.fc(lstm_out)
        return output


embedding_dim = 128
hidden_dim = 100
output_dim = num_classes
dropout = 0.2

device = torch.device("mps")
model = LSTMModel(embedding_dim, hidden_dim, output_dim, dropout).to(device)
model.load_state_dict(torch.load("sentiment_analysis.pth"))
model.eval()

while True:
    input_val = input("?")
    if not input_val:
        break
    inputs = preprocess_text([input_val])['input_ids'].to(device)
    outputs = torch.softmax(model(inputs), dim=1)
    print(outputs)
    _, predicted = torch.max(outputs, 1)
    print("predicted label:", reverse_label_encoder[predicted[0].tolist()])

And the results:

? I am happy
tensor([[0.0250, 0.9750]], device='mps:0', grad_fn=<SoftmaxBackward0>)
predicted label: Positive
? I am sad
tensor([[9.9952e-01, 4.8366e-04]], device='mps:0', grad_fn=<SoftmaxBackward0>)
predicted label: Negative