I'm trying to write a program that uses RoBERTa to calculate word embeddings:
from transformers import RobertaModel, RobertaTokenizer
import torch
model = RobertaModel.from_pretrained('roberta-base')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
caption = "this bird is yellow has red wings"
encoded_caption = tokenizer(caption, return_tensors='pt')
input_ids = encoded_caption['input_ids']
outputs = model(input_ids)
word_embeddings = outputs.last_hidden_state
I extract the last hidden state after forwarding the input_ids to the RobertaModel to calculate word embeddings. I don't know if this is the correct way to do it; can anyone confirm? Thanks
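A minimal shape check for the snippet above (roberta-base has a hidden size of 768, and the tokenizer adds the special <s> and </s> tokens):

# Inspect the tokens and the embedding tensor produced above
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
print(tokens)                    # subword tokens, including '<s>' and '</s>'
print(word_embeddings.shape)     # torch.Size([1, 9, 768]) for this caption (9 tokens including the special ones)

When only embeddings are needed, it is also common to call model.eval() and wrap the forward pass in torch.no_grad() so no gradients are tracked.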
You can call model(input_ids, output_hidden_states=True) to get the hidden states. Then, you concatenate them with torch.cat, like torch.cat([outputs['hidden_states'][-i] for i in range(1,5)], dim=-1). – noe Jan 12 '24 at 20:14
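A minimal sketch of that suggestion, reusing the variables from the question:

outputs = model(input_ids, output_hidden_states=True)
# For roberta-base, outputs.hidden_states is a tuple of 13 tensors:
# the embedding output plus one tensor per layer, each of shape [1, seq_len, 768]
last_four = [outputs.hidden_states[-i] for i in range(1, 5)]
word_embeddings = torch.cat(last_four, dim=-1)   # [1, seq_len, 4 * 768] = [1, seq_len, 3072]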
With that I get torch.Size([1, 9, 3072]), is this normal? I thought the hidden size should stay the same (768), why did it increase to 3072? – Jan 12 '24 at 20:22
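That shape is expected with the concatenation above: each of the four hidden states has size 768, and stacking them along the last dimension gives 4 × 768 = 3072 (with 9 tokens for this caption, including <s> and </s>). A single hidden state keeps the size at 768:

hidden = outputs.hidden_states
print(hidden[-1].shape)                                             # torch.Size([1, 9, 768])
print(torch.cat([hidden[-i] for i in range(1, 5)], dim=-1).shape)   # torch.Size([1, 9, 3072])

Averaging the four layers instead of concatenating them would also keep the size at 768.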
Should I also pass the attention_mask to the RobertaModel in order to ignore padding tokens and calculate the contextualized word embeddings? Thanks – Jan 27 '24 at 18:03
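A minimal sketch of passing the attention mask along with the input ids; the second caption here is only an illustrative placeholder to force padding:

captions = [caption, "a small brown bird with a short beak"]   # second sentence is illustrative only
encoded = tokenizer(captions, padding=True, return_tensors='pt')
outputs = model(input_ids=encoded['input_ids'],
                attention_mask=encoded['attention_mask'])
word_embeddings = outputs.last_hidden_state
# The attention mask stops the model from attending to padding positions;
# their embeddings are still returned, so mask them out in any downstream pooling.

With a single unpadded sentence, as in the original snippet, the attention mask is all ones, so omitting it does not change the result.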