multilingual-bert issue?

Hi, 

I ran into something odd: with multilingual BERT (e.g. bert-base-multilingual-uncased), grad_cache doesn't seem to work. I know it sounds strange, since swapping one BERT model for another shouldn't matter, but I tried standard BERT, German BERT, and m-BERT, and only m-BERT needs a very small batch_size (around 4) to run successfully. Other models such as German BERT run fine with batch_size=128. Do you happen to know the reason for this? By the way, great paper and code, extremely helpful! Thanks in advance!
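
One difference between these checkpoints that might be worth ruling out is vocabulary size, since a larger vocabulary means a larger token-embedding matrix on the GPU. Below is a rough sketch of that arithmetic. The helper `embedding_memory_mib` is hypothetical, the vocabulary sizes are assumptions that should be checked against each model's `config.json`, and the factor of 4 assumes fp32 weights plus gradients plus two Adam moment buffers. It is only an estimate and may well not account for the whole gap between batch_size=4 and batch_size=128.

```python
def embedding_memory_mib(vocab_size, hidden_size=768, bytes_per_param=4, copies=4):
    """Rough memory in MiB for the token-embedding matrix.

    copies=4 assumes weights + gradients + two Adam moment buffers (fp32).
    """
    params = vocab_size * hidden_size
    return params * bytes_per_param * copies / 2**20


# Assumed vocabulary sizes; verify against each model's config.json.
vocab_sizes = {
    "bert-base-uncased": 30522,
    "bert-base-german-cased": 30000,
    "bert-base-multilingual-uncased": 105879,
}

for name, vocab in vocab_sizes.items():
    print(f"{name}: ~{embedding_memory_mib(vocab):.0f} MiB for embeddings")
```

If the estimate differs by only a few hundred MiB, the cause is probably elsewhere (for example sequence lengths after tokenization), and comparing peak GPU memory per model at a fixed batch size would be the next thing to check.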

multilingual-bert issue? #6