TIL: Duplications

Between Test and Training data!

TIL: Annotation Datasets

Let's study annotators

TIL: Punderstanding

Computational Pun-derstanding that is.

TIL: DALC

A Dutch Abusive Language Corpus

TIL: Generating Receipts

This is a really cool use-case for Blender.

TIL: Annotators vs. Tasks

Are We Modeling the Task or the Annotator?

TIL: Won't Predict via Disagreement

Learning from Teachers, more Literally

TIL: Active Churning

Randomly Sampling is a Strong Benchmark

TIL: Active Street Signs

Neat use-case for Active Learning.

TIL: Perfect Fit

Oh boy...

TIL: Active, but Visual, Learning

Colors and Convex Hulls

TIL: The Story Theory

Statistics, Storks and Babies

TIL: Vulnerable Contributions at Scale

Via GitHub Copilot!

TIL: VADER - rule based sentiment

Seems like a sensible baseline.

TIL: Linkrot is a Huge Problem

It's frequent and has hidden nasty bits.

TIL: Learning to Place

Classification as a Heavy-Tail Regressor

TIL: Pandas Timestamp Limitations

Don't predict too far into the future.

TIL: Running git from Another Folder

And a use-case for it!

TIL: Big Git Repos

And How to Clone Them.

TIL: Optimal Seeds

Manual_seed(3407) is All You Need

TIL: 1.4 Million Jupyter Notebooks

And only 24.1% of them actually ran.

TIL: Sentiment and Bias

Exploring Hugging Face while I'm at it.

TIL: Gorilla Hypotheses

A hypothesis *can* be a liability.

TIL: Scots Wikipedia

The Saga continues in Embeddings

TIL: Analytics Providers

It's Numbers that Differ!

TIL: poke2vec

As in ... text embeddings!

TIL: Markdown Ticks

And how to render them.

TIL: Pandas Format

Pretty table renders.

TIL: Stopwords

They're not very consistent.

TIL: Dixit Data

How a Great Game became a Grand Challenge

TIL: Label Errors

How to find LOTS of them.

TIL: Confidence vs. Variability

Tracking Metrics over Epochs to Understand Labels Better.

TIL: DnD Data

There's a lot of it.

TIL: Shaded Screenshots

A "shortcut" with 4 keys.

TIL: Copilot & Pytest

Pytest vs. Parrot

TIL: metatags.io

It's a great helper.

TIL: Copilot & Submodules

Autocomplete Might be Better

TIL: GitHub Actions as a Number

Is it big or is it small?

TIL: Plenty of Bad Labels

Data Quality Strikes Again

TIL: Recursive HTML

I *really* like Svelte.

TIL: Clusters of Risk

Graphs Mostly

TIL: Urban Dictionary Embeddings

It's an entertaining idea.

TIL: Gensim Koan

Ten Year Old Bug?

TIL: Tesla vs. Stoplights

Data Quality Strikes Again

TIL: Kolektor

My take on Git-Scraping[tm]

TIL: Flight Simulatoops

Data Quality Strikes Again

til.knit