koaning.io:

Oct. 8, 2022

Vincent Warmerdam

TIL: Duplications

Between Test and Training data!

Oct. 6, 2022

Vincent Warmerdam

TIL: Annotation Datasets

Let's study annotators

Oct. 5, 2022

Vincent Warmerdam

TIL: Punderstanding

Computational Pun-derstanding that is.

Sept. 9, 2022

Vincent Warmerdam

TIL: DALC

A Dutch Abusive Language Corpus

July 21, 2022

Vincent Warmerdam

TIL: Generating Receipts

This is a really cool use-case for Blender.

July 13, 2022

Vincent Warmerdam

TIL: Annotators vs. Tasks

Are We Modeling the Task or the Annotator?

May 17, 2022

Vincent Warmerdam

TIL: Won't Predict via Disagreement

Learning from Teachers, more Literally

May 2, 2022

Vincent Warmerdam

TIL: Active Churning

Randomly Sampling is a Strong Benchmark

April 23, 2022

Vincent Warmerdam

TIL: Active Street Signs

Neat use-case for Active Learning.

April 22, 2022

Vincent Warmerdam

TIL: Perfect Fit

Oh boy...

April 21, 2022

Vincent Warmerdam

TIL: Active, but Visual, Learning

Colors and Convex Hulls

Jan. 16, 2022

Vincent Warmerdam

TIL: The Story Theory

Statistics, Storks and Babies

Dec. 20, 2021

Vincent Warmerdam

TIL: Vulnerable Contributions at Scale

Via GitHub Copilot!

Dec. 5, 2021

Vincent Warmerdam

TIL: VADER - rule based sentiment

Seems like a sensible baseline.

Dec. 3, 2021

Vincent Warmerdam

TIL: Linkrot is a Huge Problem

It's frequent and has hidden nasty bits.

Oct. 29, 2021

Vincent Warmerdam

TIL: Learning to Place

Classification as a Heavy-Tail Regressor

Oct. 18, 2021

Vincent Warmerdam

TIL: Pandas Timestamp Limitations

Don't predict too far into the future.

Oct. 15, 2021

Vincent Warmerdam

TIL: Running git from Another Folder

And a use-case for it!

Oct. 14, 2021

Vincent Warmerdam

TIL: Big Git Repos

And How to Clone Them.

Oct. 13, 2021

Vincent Warmerdam

TIL: Optimal Seeds

Manual_seed(3407) is All You Need

Oct. 12, 2021

Vincent Warmerdam

TIL: 1.4 Million Jupyter Notebooks

And only 24.1% of them actually ran.

Sept. 27, 2021

Vincent Warmerdam

TIL: Sentiment and Bias

Exploring Hugging Face while I'm at it.

Sept. 26, 2021

Vincent Warmerdam

TIL: Gorilla Hypotheses

A hypothesis *can* be a liability.

Sept. 13, 2021

Vincent Warmerdam

TIL: Scots Wikipedia

The Saga continues in Embeddings

Sept. 1, 2021

Vincent Warmerdam

TIL: Analytics Providers

It's Numbers that Differ!

Aug. 27, 2021

Vincent Warmerdam

TIL: poke2vec

As in ... text embeddings!

Aug. 22, 2021

Vincent Warmerdam

TIL: Markdown Ticks

And how to render them.

Aug. 10, 2021

Vincent Warmerdam

TIL: Pandas Format

Pretty table renders.

Aug. 6, 2021

Vincent Warmerdam

TIL: Stopwords

They're not very consistent.

July 29, 2021

Vincent Warmerdam

TIL: Dixit Data

How a Great Game became a Grand Challenge

July 22, 2021

Vincent Warmerdam

TIL: Label Errors

How to find LOTS of them.

July 21, 2021

Vincent Warmerdam

TIL: Confidence vs. Variability

Tracking Metrics over Epochs to Understand Labels Better.

July 17, 2021

Vincent Warmerdam

TIL: DnD Data

There's a lot of it.

July 17, 2021

Vincent Warmerdam

TIL: Shaded Screenshots

A "shortcut" with 4 keys.

July 16, 2021

Vincent Warmerdam

TIL: Copilot & Pytest

Pytest vs. Parrot

July 15, 2021

Vincent Warmerdam

TIL: metatags.io

It's a great helper

July 8, 2021

Vincent Warmerdam

TIL: Copilot & Submodules

Autocomplete Might be Better

June 25, 2021

Vincent Warmerdam

TIL: Github Actions as a Number

Is it big or is it small?

June 23, 2021

Vincent Warmerdam

TIL: Plenty of Bad Labels

Data Quality Strikes Again

June 18, 2021

Vincent Warmerdam

TIL: Recursive HTML

I *really* like Svelte.

June 16, 2021

Vincent Warmerdam

TIL: Clusters of Risk

Graphs Mostly

June 13, 2021

Vincent Warmerdam

TIL: Urban Dictionary Embeddings

It's an entertaining idea.

June 11, 2021

Vincent Warmerdam

TIL: Gensim Koan

Ten Year Old Bug?

June 5, 2021

Vincent Warmerdam

TIL: Tesla vs. Stoplights

Data Quality Strikes Again

June 3, 2021

Vincent Warmerdam

TIL: Kolektor

My take on Git-Scraping[tm]

June 1, 2021

Vincent Warmerdam

TIL: Flight Simulatoops

Data Quality Strikes Again
