
Running the code from this answer, BERT runs out of memory on my dictionary of 4k words. I don't need to do anything with these words yet, just create embeddings for my data. So, using exactly this:

from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

encoded_inputs = tokenizer(labels, padding = True, truncation = True, return_tensors = 'pt')

ids = encoded_inputs['input_ids']
mask = encoded_inputs['attention_mask']

output = model(ids, mask)
lab_embeddings = output.last_hidden_state.tolist()

it gives me a memory leak. How can I handle this with batching, since I don't have labels for classification or anything like that?

taciturno

1 Answer


This is likely independent of your dictionary size. Loading the BERT model and running a forward pass have their own memory requirements. How did you determine that it is a memory leak rather than ordinary memory use?
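If it is simply peak memory from the forward pass, processing the labels in batches and disabling gradient tracking usually keeps it bounded. A rough sketch (assuming labels is a plain Python list of strings; the batch size of 64 is a guess you would need to tune for your hardware):

import torch
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()

batch_size = 64  # assumption: adjust to your RAM/GPU
lab_embeddings = []

with torch.no_grad():  # no gradients needed for inference, saves a lot of memory
    for i in range(0, len(labels), batch_size):
        batch = labels[i:i + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        out = model(enc['input_ids'], attention_mask=enc['attention_mask'])
        lab_embeddings.extend(out.last_hidden_state.tolist())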

Try inspecting the memory footprint at each step of your code, for example by setting breakpoints. That will give you a clearer idea of the hardware requirements and of whether there is an actual leak.
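One way to do that programmatically (log_memory is just an illustrative helper; psutil is an extra dependency you would need to install):

import os
import psutil
import torch

def log_memory(step):
    # resident CPU memory of the current process, in GB
    rss = psutil.Process(os.getpid()).memory_info().rss / 1e9
    msg = f"{step}: CPU RSS {rss:.2f} GB"
    if torch.cuda.is_available():
        # memory currently allocated by PyTorch on the GPU, in GB
        msg += f", GPU allocated {torch.cuda.memory_allocated() / 1e9:.2f} GB"
    print(msg)

log_memory("before model load")
# ... load the model, tokenize, run the forward pass,
# and call log_memory() again after each step

If the numbers climb steadily across repeated forward passes, that points to a leak; if they just spike during one large forward pass, batching as sketched above should be enough.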

jdsurya