simsity
Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!
This project contains simple tools to help in similarity retrieval scenarios
by providing a convenient wrapper around encoding strategies as well as nearest neighbor
approaches. Typical use cases include early-stage bulk labelling and duplicate discovery.
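The core idea is to encode items into vectors and then look up their nearest neighbors. That idea can be sketched with scikit-learn alone. This is a minimal illustration that assumes TF-IDF is a reasonable encoder for short texts. It does not use simsity's own API, and the helper `find_similar` is a hypothetical name chosen for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

texts = [
    "I love my new phone",
    "My new phone is great",
    "The weather is terrible today",
    "It is raining and cold outside",
]

# Encoding step: turn each text into a sparse TF-IDF vector.
encoder = TfidfVectorizer()
X = encoder.fit_transform(texts)

# Nearest neighbor step: a brute-force cosine index over the encoded vectors.
index = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute").fit(X)


def find_similar(query, n_neighbors=2):
    """Hypothetical helper: return (text, cosine distance) pairs closest to `query`."""
    dists, idx = index.kneighbors(encoder.transform([query]), n_neighbors=n_neighbors)
    return [(texts[i], float(d)) for i, d in zip(idx[0], dists[0])]


results = find_similar("new phone")
print(results)
```

Only the two phone sentences share words with the query, so they come back as its neighbors. The weather sentences sit at cosine distance 1.0.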
Installation
You can install simsity via pip:

    python -m pip install simsity
Getting Started
To get started, we recommend beginning here.
Related Projects
This tool becomes even more powerful when you combine it with existing tools. In particular, this library was designed to work well with:
- scikit-learn for general encoding tools and pipelines
- embetter for encoding tools on text/image data
- dirty_cat for encoding tools on dirty categorical data
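The sketch below shows how a scikit-learn pipeline can act as the encoder for duplicate discovery on messy strings. It is a minimal example and does not call simsity. The nearest neighbor step also comes from scikit-learn. Three choices in it are illustrative assumptions, not library defaults: the `strip_punctuation` helper (hypothetical), the character n-gram settings, and the `THRESHOLD` value of 0.3.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

names = ["Acme Corp", "ACME Corp.", "Acme Corporation", "Globex Inc", "Globex, Inc.", "Initech"]


def strip_punctuation(items):
    """Hypothetical cleaning step: drop punctuation so 'Inc.' and 'Inc' match."""
    return [re.sub(r"[^\w\s]", "", s) for s in items]


# Encoder: clean the strings, then use character n-grams (lowercased by default),
# which tolerate small spelling and formatting differences.
encoder = make_pipeline(
    FunctionTransformer(strip_punctuation),
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
)
X = encoder.fit_transform(names)

index = NearestNeighbors(metric="cosine", algorithm="brute").fit(X)

THRESHOLD = 0.3  # assumed cutoff; tune it for your own data

# With no query argument, each indexed point is searched for, excluding itself.
_, neighbors = index.radius_neighbors(radius=THRESHOLD)

pairs = set()
for i, js in enumerate(neighbors):
    for j in js:
        pairs.add(frozenset({names[i], names[j]}))

for pair in pairs:
    print(sorted(pair))
```

Near-identical spellings land within the threshold and are reported as candidate duplicates. A distinct name such as "Initech" is left alone.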