BERT stands for Bidirectional Encoder Representations from Transformers; it is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
Questions tagged [bert]
355 questions
3 votes · 1 answer
How to use bert-tiny with transformers?
How can I use BERT-tiny? I tried to load bert-base-uncased with this line:
transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
but how can I load BERT-tiny instead?

sam · 35 · 4
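For the question above, a minimal sketch, assuming the community checkpoint prajjwal1/bert-tiny on the Hugging Face Hub is the BERT-tiny you want (there is no official bert-tiny model id in transformers itself):

    from transformers import AutoTokenizer, AutoModel

    # prajjwal1/bert-tiny is a community upload of the 2-layer, 128-hidden BERT-tiny.
    tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
    model = AutoModel.from_pretrained("prajjwal1/bert-tiny")

    inputs = tokenizer("BERT-tiny is small enough for quick tests.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (1, seq_len, 128) for BERT-tiny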
2 votes · 0 answers
How to add custom embeddings to BERT
Pretrained BERT input consists of token embeddings, segment embeddings, and position embeddings, but I would like to add some custom embeddings alongside them and feed the result to the pretrained BERT. Is this possible, and how can I implement it in PyTorch?

SS Varshini · 239 · 5 · 13
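For the question above, one possible sketch (not the only approach): compute BERT's word embeddings yourself, add a custom embedding to them, and pass the sum through inputs_embeds, so BERT still adds its own position and segment embeddings on top. The extra custom_ids input and the size of the custom table are assumptions for illustration.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class BertWithCustomEmbeddings(nn.Module):
        def __init__(self, num_custom_ids, model_name="bert-base-uncased"):
            super().__init__()
            self.bert = BertModel.from_pretrained(model_name)
            # Hypothetical extra embedding table: one custom id per input token.
            self.custom_embeddings = nn.Embedding(num_custom_ids, self.bert.config.hidden_size)

        def forward(self, input_ids, attention_mask, custom_ids):
            # Word embeddings only; position and segment embeddings are added inside
            # BERT when inputs_embeds is passed instead of input_ids.
            word_embeds = self.bert.embeddings.word_embeddings(input_ids)
            inputs_embeds = word_embeds + self.custom_embeddings(custom_ids)
            return self.bert(inputs_embeds=inputs_embeds, attention_mask=attention_mask)

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer("custom embeddings example", return_tensors="pt")
    model = BertWithCustomEmbeddings(num_custom_ids=10)
    custom_ids = torch.zeros_like(enc["input_ids"])  # toy: custom id 0 for every token
    out = model(enc["input_ids"], enc["attention_mask"], custom_ids)
    print(out.last_hidden_state.shape)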
1 vote · 0 answers
Using BERT for input embeddings in a seq2seq model
I'm currently trying to implement a paper that describes using BERT to embed inputs into a seq2seq model.
"For word vectors, we use the deep contextualized word vectors from ELMo (Peters et al., 2018) or BERT (Devlin et al., 2018). The answer tag…

Matthew · 11 · 1
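For the question above, a rough sketch of the general pattern (not the paper's exact architecture): run BERT as a frozen contextual embedder and let a small decoder attend over its token-level states. The transformer decoder and the placeholder target embeddings are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    for p in bert.parameters():            # treat BERT as a fixed feature extractor
        p.requires_grad = False

    # Illustrative decoder that cross-attends to BERT's hidden states.
    layer = nn.TransformerDecoderLayer(d_model=bert.config.hidden_size, nhead=8, batch_first=True)
    decoder = nn.TransformerDecoder(layer, num_layers=2)

    enc = tokenizer("BERT as a contextual embedder for seq2seq.", return_tensors="pt")
    with torch.no_grad():
        memory = bert(**enc).last_hidden_state        # (1, src_len, 768) contextual word vectors

    tgt = torch.zeros(1, 5, bert.config.hidden_size)  # placeholder shifted-target embeddings
    out = decoder(tgt, memory)                        # decoder attends over the BERT states
    print(memory.shape, out.shape)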
1 vote · 1 answer
Building BERT tokenizer with custom data
I'm wondering if there is a way to train our own BERT tokenizer instead of using the pre-trained tokenizer provided by Hugging Face.

Loius Leong · 13 · 2
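For the question above: yes, this is possible. One sketch is to start from the BERT fast tokenizer and learn a fresh WordPiece vocabulary from your own corpus with train_new_from_iterator; the corpus file name and vocabulary size below are placeholders.

    from transformers import AutoTokenizer

    # Reuse BERT's normalization and WordPiece settings, but learn a new vocabulary.
    base = AutoTokenizer.from_pretrained("bert-base-uncased")

    def corpus_iterator(path="my_corpus.txt", batch_size=1000):  # placeholder file
        with open(path, encoding="utf-8") as f:
            batch = []
            for line in f:
                batch.append(line.strip())
                if len(batch) == batch_size:
                    yield batch
                    batch = []
            if batch:
                yield batch

    custom_tokenizer = base.train_new_from_iterator(corpus_iterator(), vocab_size=30522)
    custom_tokenizer.save_pretrained("my-bert-tokenizer")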
1 vote · 1 answer
Do I need to train a tokenizer when training SBERT with MLM?
I have trained an SBERT model with MLM on my own corpus, which is somewhat domain-specific, using these…

ruslaniv · 163 · 3
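Related to the question above, a quick diagnostic before deciding: tokenize a sample of domain terms with the existing tokenizer and see how badly they fragment; heavy splitting into subwords is the usual signal that a new or extended vocabulary would help. The checkpoint and terms below are just examples.

    from transformers import AutoTokenizer

    # Example SBERT-style checkpoint; substitute the model you actually trained from.
    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

    # Hypothetical domain-specific terms; replace with samples from your own corpus.
    domain_terms = ["pharmacokinetics", "immunohistochemistry", "bronchoalveolar"]

    for term in domain_terms:
        pieces = tokenizer.tokenize(term)
        # Many pieces per word suggests the vocabulary covers the domain poorly.
        print(term, "->", pieces, f"({len(pieces)} subwords)")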
1 vote · 1 answer
Why is the BERT NSP task useful for sentence classification tasks?
BERT pre-trains the special [CLS] token on the NSP task: for every pair A-B, predicting whether sentence B follows sentence A in the corpus or not.
When fine-tuning BERT for sentence classification (e.g. spam or not spam), it is recommended to use a…

ihadanny · 1,357 · 2 · 11 · 19
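For context on the question above: NSP pushes the [CLS] vector to summarize the whole input, which is why the usual fine-tuning recipe puts a classification head on [CLS]. A minimal sketch (the toy spam labels are placeholders):

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    # The classification head sits on the pooled [CLS] representation.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    enc = tokenizer(["win a free prize now", "see you at the meeting"],
                    padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])  # toy labels: spam / not spam

    out = model(**enc, labels=labels)
    print(out.loss, out.logits.shape)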
0 votes · 1 answer
How to interpret BERT attention
Can we say that BERT extracts local features?
For example, consider the sentence "This is my first sentence. This is my second sentence".
How does BERT extract the features here: is attention computed for each sentence separately, or for the input as a whole?

SS Varshini · 239 · 5 · 13
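For the question above: attention is computed over the whole input sequence at once, so tokens from the first sentence can attend to tokens from the second when both are in the same input. You can inspect this directly; a sketch:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

    text = "This is my first sentence. This is my second sentence."
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)

    # out.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
    # Every token can attend to every other token, across the sentence boundary.
    attn = out.attentions[0][0, 0]                       # layer 0, head 0
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    print(tokens)
    print(attn.shape)                                    # (seq_len, seq_len)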
0 votes · 1 answer
Using BERT to extract local features
BERT is a pre-trained model that can be fine-tuned for text classification. How can I extract local features using BERT?

SS Varshini · 239 · 5 · 13
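For the question above, one common reading of "local features" is the per-token vectors from BERT's last hidden layer, as opposed to the single pooled sentence vector. A sketch that extracts both:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    enc = tokenizer("Extract features for each token.", return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)

    token_features = out.last_hidden_state  # (1, seq_len, 768): one vector per token ("local")
    sentence_feature = out.pooler_output    # (1, 768): pooled [CLS]-based vector ("global")
    print(token_features.shape, sentence_feature.shape)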
0 votes · 1 answer
BERT training on two tasks: what is the order of tasks?
I read that BERT was trained on two tasks: Masked Language Modeling and Next Sentence Prediction. I want to gain clarity on how exactly this was done.
Was it initially trained on Masked Language Modeling (where we predict a masked token) and later…

user1700890 · 345 · 1 · 3 · 13
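For the question above: in the original setup the two objectives are optimized jointly rather than one after the other; each batch produces an MLM loss and an NSP loss that are summed. This is visible in BertForPreTraining, which runs both heads in one forward pass. A sketch with hand-made labels just to show the shapes:

    import torch
    from transformers import BertForPreTraining, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForPreTraining.from_pretrained("bert-base-uncased")

    enc = tokenizer("The cat sat on the mat.", "It looked comfortable.", return_tensors="pt")

    # Mask one token by hand and build labels: -100 means "ignore this position".
    original_id = enc["input_ids"][0, 2].item()
    enc["input_ids"][0, 2] = tokenizer.mask_token_id
    mlm_labels = torch.full_like(enc["input_ids"], -100)
    mlm_labels[0, 2] = original_id
    nsp_label = torch.tensor([0])  # 0 = sentence B really follows sentence A

    out = model(**enc, labels=mlm_labels, next_sentence_label=nsp_label)
    # out.loss is the sum of the masked-LM loss and the NSP loss from the same pass.
    print(out.loss, out.prediction_logits.shape, out.seq_relationship_logits.shape)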
0 votes · 1 answer
Is BERT supervised learning or semi-supervised learning?
I use the 'bert-base-cased' pre-trained model to encode a text dataset labeled 0 or 1, and then train a BERT model imported from the Transformers library on the encoded dataset. Is this supervised learning or semi-supervised learning?

Balive13 · 3 · 1
0 votes · 1 answer
Why do some authors say that BERT cannot be used for text prediction?
I was trying to get a grasp of BERT and found this post on DS StackExchange:
Can BERT do the next-word-predict task?
In broad terms, it says that BERT cannot be used for next-word prediction. I suppose that it could be used for next-word prediction,…

Lila · 217 · 2 · 7
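On the question above: the usual point is that BERT has no left-to-right language-modeling head, so it cannot generate text token by token the way GPT-style models do. You can still coax a one-token "next word" out of the masked-LM head by appending [MASK]; a sketch with the fill-mask pipeline:

    from transformers import pipeline

    # BERT's MLM head can fill a single masked slot, which only approximates
    # next-word prediction; it is not a left-to-right generator.
    fill = pipeline("fill-mask", model="bert-base-uncased")

    for pred in fill("The weather today is really [MASK]")[:3]:
        print(pred["token_str"], round(pred["score"], 3))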
-1 votes · 1 answer
Does BERT pretrain only on masked tokens?
I was a bit confused about the details of the Masked Language Model in BERT pretraining. Does the model only predict the masked tokens for the purposes of pretraining, or does it make predictions for all tokens?

rsvarma · 23 · 3
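On the question above: in the Hugging Face implementation, the loss is computed only at the selected (roughly 15%) positions; the MLM data collator sets the label to -100 everywhere else, and -100 positions are ignored by the cross-entropy loss. A sketch that makes this visible:

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

    enc = tokenizer("BERT only scores the selected positions during pretraining.")
    batch = collator([enc])

    # Labels are -100 at unselected positions, so only the chosen tokens
    # (masked, randomized, or kept) contribute to the MLM loss.
    print(batch["input_ids"][0])
    print(batch["labels"][0])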