Questions tagged [bert]

BERT stands for Bidirectional Encoder Representations from Transformers and is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.

355 questions
3
votes
1 answer

How to use BERT-tiny with transformers?

How can I use BERT-tiny? I tried loading bert-base-uncased with transformers.AutoTokenizer.from_pretrained("bert-base-uncased"), but how can I load BERT-tiny instead?
sam
  • 35
  • 4
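
A minimal sketch for the question above, assuming the community checkpoint prajjwal1/bert-tiny on the Hugging Face Hub (the checkpoint name is an assumption, not something given in the question):

```python
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint: "prajjwal1/bert-tiny", a community-uploaded 2-layer,
# 128-hidden BERT on the Hugging Face Hub; substitute whichever tiny
# checkpoint you actually want to use.
model_name = "prajjwal1/bert-tiny"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("Hello, BERT-tiny!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```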
2
votes
0 answers

How to add custom embeddings to BERT

Pretrained BERT input has token embeddings, segment embeddings, and position embeddings, but I would like to add some custom embeddings along with them and feed them to pretrained BERT. How can I implement this in PyTorch? Is it possible?
SS Varshini
  • 239
  • 5
  • 13
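
One hedged way to approach the custom-embeddings question above: compute BERT's own word embeddings, add your extra embedding to them, and pass the sum through inputs_embeds so BERT still applies its position and segment embeddings. The class name and extra-vocabulary size below are illustrative, not a canonical recipe:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertWithExtraEmbeddings(nn.Module):
    def __init__(self, n_custom_ids, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # hypothetical extra vocabulary (e.g. POS tags, entity types, ...)
        self.custom_emb = nn.Embedding(n_custom_ids, hidden)

    def forward(self, input_ids, custom_ids, attention_mask=None):
        # BERT's own word embeddings plus the custom embedding, summed per token;
        # position and segment embeddings are still added inside BertModel.
        word_emb = self.bert.embeddings.word_embeddings(input_ids)
        summed = word_emb + self.custom_emb(custom_ids)
        return self.bert(inputs_embeds=summed, attention_mask=attention_mask)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("custom embeddings for BERT", return_tensors="pt")
model = BertWithExtraEmbeddings(n_custom_ids=10)
custom_ids = torch.zeros_like(enc["input_ids"])  # dummy ids for the sketch
out = model(enc["input_ids"], custom_ids, enc["attention_mask"])
print(out.last_hidden_state.shape)
```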
1
vote
0 answers

Using BERT for input embeddings in a seq2seq model

I'm currently trying to implement a paper that describes using BERT to embed inputs into a seq2seq model. "For word vectors, we use the deep contextualized word vectors from ELMo (Peters et al., 2018) or BERT (Devlin et al., 2018). The answer tag…
Matthew
  • 11
  • 1
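
A rough sketch of the "BERT as contextual input embeddings" setup described in the question above: run BERT as a frozen feature extractor and feed its token vectors into your own seq2seq encoder (an LSTM here; the decoder side is omitted). All names are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertSeq2SeqEncoder(nn.Module):
    def __init__(self, enc_hidden=256, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        for p in self.bert.parameters():  # freeze BERT, train only the LSTM
            p.requires_grad = False
        self.lstm = nn.LSTM(self.bert.config.hidden_size, enc_hidden,
                            batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            bert_out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # (batch, seq_len, 2*enc_hidden) states a decoder could attend over
        enc_states, _ = self.lstm(bert_out.last_hidden_state)
        return enc_states

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["What is the answer tag?"], return_tensors="pt")
encoder = BertSeq2SeqEncoder()
print(encoder(batch["input_ids"], batch["attention_mask"]).shape)
```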
1
vote
1 answer

Building BERT tokenizer with custom data

I'm wondering if there is a way to train our own BERT tokenizer instead of using the pre-trained tokenizers provided by Hugging Face?
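
A minimal sketch for the question above using the Hugging Face tokenizers library; "corpus.txt" and the output directory are placeholders:

```python
import os
from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizerFast

# Train a WordPiece vocabulary from scratch on your own text files.
os.makedirs("my-bert-tokenizer", exist_ok=True)

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=30_000,
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model("my-bert-tokenizer")  # writes vocab.txt

# The trained vocabulary can then be loaded back through transformers:
hf_tokenizer = BertTokenizerFast.from_pretrained("my-bert-tokenizer")
print(hf_tokenizer.tokenize("training a tokenizer on custom data"))
```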
1
vote
1 answer

Do I need to train a tokenizer when training SBERT with MLM?

I have trained an SBERT model with MLM on my own, somewhat domain-specific corpus using these…
ruslaniv
  • 163
  • 3
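
One possible (hedged) middle ground for the question above: keep the pre-trained tokenizer, add frequent domain terms to it, and resize the embedding matrix before continuing MLM training. The base model and the term list below are purely hypothetical:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical domain-specific terms missing from the original vocabulary.
domain_terms = ["angiography", "stenosis", "thrombectomy"]
num_added = tokenizer.add_tokens(domain_terms)

# New rows in the embedding matrix are initialized for the added tokens and
# then learned during the continued MLM training.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size: {len(tokenizer)}")
```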
1
vote
1 answer

Why is the BERT NSP task useful for sentence classification tasks?

BERT pre-trains the special [CLS] token on the NSP task: for every pair A-B, predicting whether sentence B follows sentence A in the corpus or not. When fine-tuning BERT for sentence classification (e.g. spam or not), it is recommended to use a…
ihadanny
  • 1,357
  • 2
  • 11
  • 19
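
A small illustration for the question above: the pooler output that sentence classifiers usually sit on is the [CLS] hidden state passed through the dense+tanh layer that was trained together with the NSP head during pre-training:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer("is this spam or not?", return_tensors="pt")
out = model(**enc)

cls_hidden = out.last_hidden_state[:, 0]  # raw [CLS] token vector
pooled = out.pooler_output                # dense+tanh over [CLS]
classifier = torch.nn.Linear(model.config.hidden_size, 2)  # e.g. spam / not spam
print(cls_hidden.shape, pooled.shape, classifier(pooled).shape)
```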
0
votes
1 answer

How to interpret BERT attention

Can we say that BERT extracts local features? For example, consider the input "This is my first sentence. This is my second sentence". How does BERT extract features here: is attention computed for each sentence separately or over the whole input?
SS Varshini
  • 239
  • 5
  • 13
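
A sketch of how to inspect the attention weights for the question above; attention is computed over the whole tokenized input (both sentences together), not separately per sentence:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "This is my first sentence. This is my second sentence."
enc = tokenizer(text, return_tensors="pt")
out = model(**enc)

attentions = out.attentions  # tuple with one tensor per layer
print(len(attentions))       # 12 layers for bert-base
print(attentions[0].shape)   # (batch, heads, seq_len, seq_len)
```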
0
votes
1 answer

Bert to extract local features

BERT is a pre-trained model that can be fine-tuned for text classification. How can I extract local features using BERT?
SS Varshini
  • 239
  • 5
  • 13
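
A hedged sketch for the question above: the per-token vectors in last_hidden_state are the "local" contextual features, as opposed to the single pooled [CLS] vector that summarizes the whole input:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer("extracting local features with BERT", return_tensors="pt")
out = model(**enc)

token_features = out.last_hidden_state  # (1, seq_len, 768): one vector per token
sentence_feature = out.pooler_output    # (1, 768): one vector for the whole input
print(token_features.shape, sentence_feature.shape)
```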
0
votes
1 answer

BERT training on two tasks: what is the order of tasks?

I read that BERT has been trained on two tasks: Masked Language Modeling and Next Sentence Prediction. I want to gain clarity on how exactly this was done. Was it initially trained on Masked Language Modeling (where we predict masked tokens) and later…
user1700890
  • 345
  • 1
  • 3
  • 13
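
A short illustration for the question above: the pre-training model carries both heads at once, and the MLM and NSP losses are computed on the same batch and summed, so the two tasks are trained jointly rather than one after the other:

```python
from transformers import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

enc = tokenizer("the first sentence", "the second sentence", return_tensors="pt")
out = model(**enc)

print(out.prediction_logits.shape)        # MLM head: (batch, seq_len, vocab_size)
print(out.seq_relationship_logits.shape)  # NSP head: (batch, 2)
```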
0
votes
1 answer

Is BERT supervised learning or semi-supervised learning?

I use the pre-trained 'bert-base-cased' model to encode a text dataset labeled 0 or 1. Then the encoded dataset is trained using a BERT model imported from the Transformers library. Is this supervised learning or semi-supervised?
Balive13
  • 3
  • 1
0
votes
1 answer

Why do some authors say that BERT cannot be used for text prediction?

I was trying to get a grasp of BERT and found this post on DS StackExchange: Can BERT do the next-word-predict task? In broad terms, it says that BERT cannot be used for next-word prediction. I suppose that for next-word prediction it could be used,…
Lila
  • 217
  • 2
  • 7
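
A hedged sketch related to the question above: BERT has no left-to-right language-modeling head, but the closest substitute is to place a [MASK] where the "next word" would go and read off the MLM head's predictions:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = f"The capital of France is {tokenizer.mask_token}."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits

# Locate the [MASK] position and read the top predictions for it.
mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))  # "paris" should rank highly
```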
-1
votes
1 answer

Does BERT pretrain only on masked tokens?

I was a bit confused about the details of the Masked Language Model in BERT pretraining. Does the model only predict the masked tokens for the purposes of pretraining, or does it predict them for all tokens?
rsvarma
  • 23
  • 3
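
A small sketch for the question above, using the usual transformers convention that label -100 is ignored by the loss: the MLM head produces logits for every position, but only the positions you explicitly label (the masked ones) contribute to the pre-training objective:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

enc = tokenizer("the cat sat on the mat", return_tensors="pt")
labels = torch.full_like(enc["input_ids"], -100)  # -100 = ignored by the loss

mask_pos = 3                                      # mask one token for the demo
labels[0, mask_pos] = enc["input_ids"][0, mask_pos]
enc["input_ids"][0, mask_pos] = tokenizer.mask_token_id

out = model(**enc, labels=labels)
print(out.loss)          # computed only at the single masked position
print(out.logits.shape)  # but logits exist for every position
```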