How to use Bert for long text classification?
You have basically three options: You can cut the longer texts off and only use the first 512 Tokens. The original BERT implementation (and probably the others as well) truncates longer sequences automatically. For most cases, this option is sufficient. You can split your text in multiple subtexts, classify each of them and combine the … Read more