simsity

Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!


This project contains simple tools to help in similarity retrieval scenarios by providing a convenient wrapper around encoding strategies as well as nearest neighbor approaches. Typical use cases include early-stage bulk labelling and duplicate discovery.

Installation

You can install simsity via pip.

python -m pip install simsity

Getting Started

If you'd like to get started, we recommend starting here.

This tool becomes even more powerful when you combine it with existing tools. In particular, this library was designed to work well with:

  • scikit-learn for general encoding tools and pipelines
  • embetter for encoding tools on text/image data
  • dirty_cat for encoding tools on dirty categorical data
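To make the "encode, then find neighbors" idea concrete, here is a minimal sketch that uses scikit-learn only. It does not show simsity's own API. The example texts and the `query` helper are made up for illustration, and TF-IDF with a brute-force cosine search is just one possible encoder/indexer pairing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical example data: two pairs of near-duplicates and one outlier.
texts = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices dropped sharply today",
    "how to bake sourdough bread",
]

# Step 1: encode each text into a numeric vector.
encoder = TfidfVectorizer()
X = encoder.fit_transform(texts)

# Step 2: build a nearest neighbor index over the encoded vectors.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)


def query(text, n_neighbors=2):
    """Return (text, cosine distance) pairs for the closest stored items.

    Illustrative helper, not part of any library.
    """
    vec = encoder.transform([text])
    distances, indices = index.kneighbors(vec, n_neighbors=n_neighbors)
    return [(texts[i], float(d)) for i, d in zip(indices[0], distances[0])]


if __name__ == "__main__":
    for neighbor, distance in query("the cat sat on the mat"):
        print(f"{distance:.3f}  {neighbor}")
```

Items that come back with a very small distance are candidates for duplicates or for sharing a label, which is the kind of workflow the use cases above describe.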