singularity014/BERT_Tokenizer_for_classification

BERT_Tokenizer_for_classification:

This repo gives a step-by-step guide to using a BERT-style tokenizer and shows how its output can be used for tasks like sentiment analysis with models such as CNNs and LSTMs. BERT tokenizes text into subword units (WordPiece), and we can leverage the same tokenization technique to feed tokenized data to our traditional models.
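To make the idea concrete, here is a minimal sketch of the greedy longest-match-first subword splitting that BERT's WordPiece tokenizer is based on. It is not the code from this repo: `TOY_VOCAB`, `wordpiece_tokenize`, and `encode` are hypothetical names, the vocabulary is a made-up toy, and a real setup would load the vocabulary file that ships with a pretrained BERT model and would also handle punctuation and special tokens.

```python
# Toy vocabulary (an assumption for illustration only). Pieces that
# continue a word carry the "##" prefix, as in BERT's vocabulary files.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1,
    "the": 2, "movie": 3, "was": 4,
    "un": 5, "##believ": 6, "##able": 7, "good": 8,
}


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into subword pieces, longest match first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched: the whole word maps to the unknown token.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


def encode(sentence, vocab):
    """Lowercase, split on whitespace, then map subword pieces to ids."""
    tokens = []
    for word in sentence.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens, [vocab[t] for t in tokens]


tokens, ids = encode("The movie was unbelievable", TOY_VOCAB)
print(tokens)
print(ids)
```

The resulting list of integer ids is what an embedding layer in a CNN or LSTM would consume, which is why the tokenizer can be reused independently of the BERT model itself.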

Experiment:

We will experiment with BERT's tokenizer utility, then build a 1-D CNN model to see the whole flow. To minimize the amount of padding, we will use a batching trick that groups sentences of similar length into the same batch during training.
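One way to realize such a batching trick is sketched below, under the assumption that the data is a list of token-id lists with one label each. The helper name `make_length_batches`, the `pad_id` of 0, and the choice to shuffle whole batches are illustrative assumptions, not necessarily what the repo does.

```python
import random


def make_length_batches(sequences, labels, batch_size, pad_id=0, seed=0):
    """Group sequences of similar length and pad per batch.

    Each batch is padded only to the length of its own longest
    sequence, not to the longest sequence in the whole dataset.
    """
    # Indices of the examples, ordered from shortest to longest.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))

    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        max_len = max(len(sequences[i]) for i in idx)
        padded = [
            sequences[i] + [pad_id] * (max_len - len(sequences[i]))
            for i in idx
        ]
        batches.append((padded, [labels[i] for i in idx]))

    # Shuffle whole batches so training does not always see short
    # sentences first, while keeping similar lengths together.
    random.Random(seed).shuffle(batches)
    return batches


# Hypothetical token ids for five sentences of different lengths.
sequences = [[5, 6, 7, 8, 9, 10], [5], [5, 6], [5, 6, 7, 8, 9], [5, 6, 7]]
labels = [1, 0, 0, 1, 0]

for batch_inputs, batch_labels in make_length_batches(sequences, labels, 2):
    print(batch_inputs, batch_labels)
```

Because every batch can have a different width, the downstream model has to accept variable-length input; a 1-D CNN followed by global max pooling is one architecture that does.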

Please feel free to use similar steps to combine the tokenizer with other kinds of models.
