BERT stands for Bidirectional Encoder Representations from Transformers; it is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
Questions tagged [bert]
355 questions
3 votes · 1 answer
How to use bert-tiny with transformers?
How can I use BERT-tiny? I tried to load bert-base-uncased with this line:
transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
but how can I load BERT-tiny instead?

sam · 35 · 4
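For the question above, a minimal sketch, assuming the community checkpoint prajjwal1/bert-tiny on the Hugging Face Hub is the BERT-tiny you want (there is no official bert-tiny model id in transformers itself):

    from transformers import AutoTokenizer, AutoModel

    # prajjwal1/bert-tiny is a community upload of the 2-layer, 128-hidden BERT-tiny.
    tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
    model = AutoModel.from_pretrained("prajjwal1/bert-tiny")

    inputs = tokenizer("BERT-tiny is small enough for quick tests.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (1, seq_len, 128) for BERT-tiny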
2 votes · 0 answers
How to add custom embeddings to BERT
Pretrained BERT input consists of token embeddings, segment embeddings, and position embeddings, but I would like to add some custom embeddings alongside them and feed the result to the pretrained BERT. Is this possible, and how can I implement it in PyTorch?

SS Varshini · 239 · 5 · 13
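For the question above, one possible sketch (not the only approach): compute BERT's word embeddings yourself, add a custom embedding to them, and pass the sum through inputs_embeds, so BERT still adds its own position and segment embeddings on top. The extra custom_ids input and the size of the custom table are assumptions for illustration.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class BertWithCustomEmbeddings(nn.Module):
        def __init__(self, num_custom_ids, model_name="bert-base-uncased"):
            super().__init__()
            self.bert = BertModel.from_pretrained(model_name)
            # Hypothetical extra embedding table: one custom id per input token.
            self.custom_embeddings = nn.Embedding(num_custom_ids, self.bert.config.hidden_size)

        def forward(self, input_ids, attention_mask, custom_ids):
            # Word embeddings only; position and segment embeddings are added inside
            # BERT when inputs_embeds is passed instead of input_ids.
            word_embeds = self.bert.embeddings.word_embeddings(input_ids)
            inputs_embeds = word_embeds + self.custom_embeddings(custom_ids)
            return self.bert(inputs_embeds=inputs_embeds, attention_mask=attention_mask)

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer("custom embeddings example", return_tensors="pt")
    model = BertWithCustomEmbeddings(num_custom_ids=10)
    custom_ids = torch.zeros_like(enc["input_ids"])  # toy: custom id 0 for every token
    out = model(enc["input_ids"], enc["attention_mask"], custom_ids)
    print(out.last_hidden_state.shape)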
1 vote · 0 answers
Using BERT for input embeddings in a seq2seq model
I'm currently trying to implement a paper that describes using BERT to embed inputs into a seq2seq model.
"For word vectors, we use the deep contextualized word vectors from ELMo (Peters et al., 2018) or BERT (Devlin et al., 2018). The answer tag…

Matthew · 11 · 1
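For the question above, a rough sketch of the general pattern (not the paper's exact architecture): run BERT as a frozen contextual embedder and let a small decoder attend over its token-level states. The transformer decoder and the placeholder target embeddings are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    for p in bert.parameters():            # treat BERT as a fixed feature extractor
        p.requires_grad = False

    # Illustrative decoder that cross-attends to BERT's hidden states.
    layer = nn.TransformerDecoderLayer(d_model=bert.config.hidden_size, nhead=8, batch_first=True)
    decoder = nn.TransformerDecoder(layer, num_layers=2)

    enc = tokenizer("BERT as a contextual embedder for seq2seq.", return_tensors="pt")
    with torch.no_grad():
        memory = bert(**enc).last_hidden_state        # (1, src_len, 768) contextual word vectors

    tgt = torch.zeros(1, 5, bert.config.hidden_size)  # placeholder shifted-target embeddings
    out = decoder(tgt, memory)                        # decoder attends over the BERT states
    print(memory.shape, out.shape)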
1 vote · 1 answer
Building BERT tokenizer with custom data
I'm wondering if there is a way to train our own BERT tokenizer instead of using the pre-trained tokenizer provided by Hugging Face.

Loius Leong · 13 · 2
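For the question above: yes, this is possible. One sketch is to start from the BERT fast tokenizer and learn a fresh WordPiece vocabulary from your own corpus with train_new_from_iterator; the corpus file name and vocabulary size below are placeholders.

    from transformers import AutoTokenizer

    # Reuse BERT's normalization and WordPiece settings, but learn a new vocabulary.
    base = AutoTokenizer.from_pretrained("bert-base-uncased")

    def corpus_iterator(path="my_corpus.txt", batch_size=1000):  # placeholder file
        with open(path, encoding="utf-8") as f:
            batch = []
            for line in f:
                batch.append(line.strip())
                if len(batch) == batch_size:
                    yield batch
                    batch = []
            if batch:
                yield batch

    custom_tokenizer = base.train_new_from_iterator(corpus_iterator(), vocab_size=30522)
    custom_tokenizer.save_pretrained("my-bert-tokenizer")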
1 vote · 1 answer
Do I need to train a tokenizer when training SBERT with MLM?
I have trained an SBERT model with MLM on my own corpus, which is somewhat domain-specific, using these…

ruslaniv · 163 · 3
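Related to the question above, a quick diagnostic before deciding: tokenize a sample of domain terms with the existing tokenizer and see how badly they fragment; heavy splitting into subwords is the usual signal that a new or extended vocabulary would help. The checkpoint and terms below are just examples.

    from transformers import AutoTokenizer

    # Example SBERT-style checkpoint; substitute the model you actually trained from.
    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

    # Hypothetical domain-specific terms; replace with samples from your own corpus.
    domain_terms = ["pharmacokinetics", "immunohistochemistry", "bronchoalveolar"]

    for term in domain_terms:
        pieces = tokenizer.tokenize(term)
        # Many pieces per word suggests the vocabulary covers the domain poorly.
        print(term, "->", pieces, f"({len(pieces)} subwords)")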
1 vote · 1 answer
Why is the BERT NSP task useful for sentence classification tasks?
BERT pre-trains the special [CLS] token on the NSP task: for every pair A-B, predicting whether sentence B follows sentence A in the corpus or not.
When fine-tuning BERT for sentence classification (e.g. spam or not spam), it is recommended to use a…

ihadanny · 1,357 · 2 · 11 · 19
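For context on the question above: NSP pushes the [CLS] vector to summarize the whole input, which is why the usual fine-tuning recipe puts a classification head on [CLS]. A minimal sketch (the toy spam labels are placeholders):

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    # The classification head sits on the pooled [CLS] representation.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    enc = tokenizer(["win a free prize now", "see you at the meeting"],
                    padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])  # toy labels: spam / not spam

    out = model(**enc, labels=labels)
    print(out.loss, out.logits.shape)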
0 votes · 1 answer
How to interpret BERT attention
Can we say that BERT extracts local features?
For example, consider the sentence "This is my first sentence. This is my second sentence".
How does BERT extract the features here: is attention computed for each sentence separately, or for the input as a whole?

SS Varshini · 239 · 5 · 13
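For the question above: attention is computed over the whole input sequence at once, so tokens from the first sentence can attend to tokens from the second when both are in the same input. You can inspect this directly; a sketch:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

    text = "This is my first sentence. This is my second sentence."
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)

    # out.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
    # Every token can attend to every other token, across the sentence boundary.
    attn = out.attentions[0][0, 0]                       # layer 0, head 0
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    print(tokens)
    print(attn.shape)                                    # (seq_len, seq_len)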
0 votes · 1 answer
Using BERT to extract local features
BERT is a pre-trained model that can be fine-tuned for text classification. How can I extract local features using BERT?

SS Varshini · 239 · 5 · 13
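For the question above, one common reading of "local features" is the per-token vectors from BERT's last hidden layer, as opposed to the single pooled sentence vector. A sketch that extracts both:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    enc = tokenizer("Extract features for each token.", return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)

    token_features = out.last_hidden_state  # (1, seq_len, 768): one vector per token ("local")
    sentence_feature = out.pooler_output    # (1, 768): pooled [CLS]-based vector ("global")
    print(token_features.shape, sentence_feature.shape)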
0 votes · 1 answer
BERT training on two tasks: what is the order of tasks?
I read that BERT was trained on two tasks: Masked Language Modeling and Next Sentence Prediction. I want to gain clarity on how exactly this was done.
Was it initially trained on Masked Language Modeling (where we predict a masked token) and later…

user1700890 · 345 · 1 · 3 · 13
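For the question above: in the original setup the two objectives are optimized jointly rather than one after the other; each batch produces an MLM loss and an NSP loss that are summed. This is visible in BertForPreTraining, which runs both heads in one forward pass. A sketch with hand-made labels just to show the shapes:

    import torch
    from transformers import BertForPreTraining, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForPreTraining.from_pretrained("bert-base-uncased")

    enc = tokenizer("The cat sat on the mat.", "It looked comfortable.", return_tensors="pt")

    # Mask one token by hand and build labels: -100 means "ignore this position".
    original_id = enc["input_ids"][0, 2].item()
    enc["input_ids"][0, 2] = tokenizer.mask_token_id
    mlm_labels = torch.full_like(enc["input_ids"], -100)
    mlm_labels[0, 2] = original_id
    nsp_label = torch.tensor([0])  # 0 = sentence B really follows sentence A

    out = model(**enc, labels=mlm_labels, next_sentence_label=nsp_label)
    # out.loss is the sum of the masked-LM loss and the NSP loss from the same pass.
    print(out.loss, out.prediction_logits.shape, out.seq_relationship_logits.shape)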
0 votes · 1 answer
Is BERT supervised learning or semi-supervised learning?
I use the 'bert-base-cased' pre-trained model to encode a text dataset labeled 0 or 1, and then train a BERT model imported from the Transformers library on the encoded dataset. Is this supervised learning or semi-supervised learning?

Balive13 · 3 · 1
0 votes · 1 answer
Why do some authors say that BERT cannot be used for text prediction?
I was trying to get a grasp of BERT and found this post on DS StackExchange:
Can BERT do the next-word-predict task?
In broad terms, it says that BERT cannot be used for next-word prediction. I suppose that it could be used for next-word prediction,…

Lila · 217 · 2 · 7
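On the question above: the usual point is that BERT has no left-to-right language-modeling head, so it cannot generate text token by token the way GPT-style models do. You can still coax a one-token "next word" out of the masked-LM head by appending [MASK]; a sketch with the fill-mask pipeline:

    from transformers import pipeline

    # BERT's MLM head can fill a single masked slot, which only approximates
    # next-word prediction; it is not a left-to-right generator.
    fill = pipeline("fill-mask", model="bert-base-uncased")

    for pred in fill("The weather today is really [MASK]")[:3]:
        print(pred["token_str"], round(pred["score"], 3))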
-1 votes · 1 answer
Does BERT pretrain only on masked tokens?
I was a bit confused about the details of the Masked Language Model in BERT pretraining. Does the model only predict the masked tokens for the purposes of pretraining, or does it make predictions for all tokens?

rsvarma · 23 · 3
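On the question above: in the Hugging Face implementation, the loss is computed only at the selected (roughly 15%) positions; the MLM data collator sets the label to -100 everywhere else, and -100 positions are ignored by the cross-entropy loss. A sketch that makes this visible:

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

    enc = tokenizer("BERT only scores the selected positions during pretraining.")
    batch = collator([enc])

    # Labels are -100 at unselected positions, so only the chosen tokens
    # (masked, randomized, or kept) contribute to the MLM loss.
    print(batch["input_ids"][0])
    print(batch["labels"][0])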