tokenwiser

Bag of, not words, but tricks!

Goal

We noticed that many benchmarks rely on heavyweight tools without checking whether something more lightweight would also work. If we apply a few simple tricks to our tokens, we may not need massive language models. The goal of this package is to contribute tricks that keep your NLP pipelines simple. These tricks are available for spaCy, scikit-learn and Vowpal Wabbit.

If you're looking for a tool that adds pretrained language models to scikit-learn pipelines as a benchmark, you'll want to explore another tool: whatlies.

Features

Scikit-Learn Tools

The following submodules contain features that might be useful.

  • .textprep: Contains string pre-processing tools for scikit-learn.
  • .pipeline: Contains extra pipeline components for scikit-learn.
  • .wabbit: Contains a scikit-learn component based on Vowpal Wabbit.
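
To give a feel for what a string pre-processing step of this kind might look like, here is a minimal sketch in plain Python. It is not tokenwiser's own implementation: `SimpleCleaner` is a hypothetical name, and the class only mirrors the fit/transform convention that scikit-learn pipeline steps follow.

```python
import re


class SimpleCleaner:
    """Hypothetical text-prep step; not part of tokenwiser."""

    def fit(self, X, y=None):
        # Stateless step: there is nothing to learn from the data.
        return self

    def transform(self, X):
        # Lowercase each text, drop everything except letters, digits and
        # whitespace, then collapse runs of whitespace into single spaces.
        return [
            " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())
            for text in X
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


texts = ["Hello, World!", "Bag of   TRICKS?"]
cleaned = SimpleCleaner().fit_transform(texts)
print(cleaned)  # ['hello world', 'bag of tricks']
```

Because the step takes a list of strings and returns a list of strings, it could in principle sit in front of any vectorizer that accepts raw text.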

spaCy Tools

  • .component: Contains spaCy-compatible components that can be added as a pipeline step.
  • .extension: Contains spaCy-compatible extensions that can be added manually.