
Running the code from this answer, BERT runs out of memory on my dictionary of 4k words. I don't need to do anything with these words yet, just create embeddings for my data. So, using exactly this:

from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

encoded_inputs = tokenizer(labels, padding = True, truncation = True, return_tensors = 'pt')

ids = encoded_inputs['input_ids']
mask = encoded_inputs['attention_mask']

output = model(ids, mask)
lab_embeddings = output.last_hidden_state.tolist()

it gives me a memory leak. How can I handle this with batching, since I don't have labels for classification or anything like that?

taciturno

1 Answer


This is likely independent of your dictionary size. Loading the BERT model and running a forward pass have their own memory requirements. How did you determine that it is a memory leak rather than ordinary memory use?
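If it is simply peak memory from the forward pass, processing the labels in batches and disabling gradient tracking usually keeps it bounded. A rough sketch (assuming labels is a plain Python list of strings; the batch size of 64 is a guess you would need to tune for your hardware):

import torch
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()

batch_size = 64  # assumption: adjust to your RAM/GPU
lab_embeddings = []

with torch.no_grad():  # no gradients needed for inference, saves a lot of memory
    for i in range(0, len(labels), batch_size):
        batch = labels[i:i + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        out = model(enc['input_ids'], attention_mask=enc['attention_mask'])
        lab_embeddings.extend(out.last_hidden_state.tolist())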

Try inspecting the memory footprint at each step of your code, for example by setting breakpoints. That will give you a clearer idea of the hardware requirements and of whether there is an actual leak.
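One way to do that programmatically (log_memory is just an illustrative helper; psutil is an extra dependency you would need to install):

import os
import psutil
import torch

def log_memory(step):
    # resident CPU memory of the current process, in GB
    rss = psutil.Process(os.getpid()).memory_info().rss / 1e9
    msg = f"{step}: CPU RSS {rss:.2f} GB"
    if torch.cuda.is_available():
        # memory currently allocated by PyTorch on the GPU, in GB
        msg += f", GPU allocated {torch.cuda.memory_allocated() / 1e9:.2f} GB"
    print(msg)

log_memory("before model load")
# ... load the model, tokenize, run the forward pass,
# and call log_memory() again after each step

If the numbers climb steadily across repeated forward passes, that points to a leak; if they just spike during one large forward pass, batching as sketched above should be enough.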

jdsurya