About Groups

If a group is present on a Clumper, some of the verbs behave differently. This guide explains what changes you can expect and why they are useful.

What is a Group?

You can add a group to a Clumper by calling .group_by().

from clumper import Clumper

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .group_by('grp'))

The Clumper now groups its items by the value of their grp key.

This means that the collection is now aware that you're interested in calculating things per group. In this case there are two groups: one for {'grp': 'a'} and one for {'grp': 'b'}.
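
Conceptually, a group splits the collection into buckets of items that share the same value for the grouping key. The sketch below illustrates that idea in plain Python. `split_into_groups` is a hypothetical helper, not part of clumper, and clumper's own bookkeeping may differ.

```python
from collections import defaultdict

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]


def split_into_groups(items, key):
    """Hypothetical helper: bucket items by the value stored under `key`.

    Buckets appear in order of first appearance, and items keep their
    original relative order inside each bucket.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return dict(groups)


groups = split_into_groups(list_dicts, 'grp')
print(list(groups))      # ['a', 'b']
print(len(groups['a']))  # 3
```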

There are some verbs that will behave differently because of this.

Agg

Without Groups

When no group is active, we make a single summary for the entire collection of items.

from clumper import Clumper

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .agg(s=('a', 'sum'),
       m=('a', 'mean'))
  .collect())
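
For reference, here is the summary that call describes, written in plain Python with the standard library's `statistics.mean`. It illustrates the result only; it is not how clumper computes it.

```python
from statistics import mean

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

# Gather every value of 'a' across the whole collection.
values = [d['a'] for d in list_dicts]

# A single summary row for the entire collection.
summary = [{'s': sum(values), 'm': mean(values)}]
print(summary)  # [{'s': 24, 'm': 4.8}]
```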

With Groups

When a group is active, we make one summary per group. We also keep the keys of the groups in the new collection, so each summary row shows which group it belongs to.

Note that the group is still active after .agg()!

from clumper import Clumper

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .group_by('grp')
  .agg(s=('a', 'sum'),
       m=('a', 'mean'))
  .collect())
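
The grouped summary can be sketched in plain Python as well: collect the values per group, then build one row per group that also carries the group key. This is an illustration under the assumption that groups are listed in order of first appearance, not clumper's implementation.

```python
from statistics import mean

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

# Collect the values of 'a' per group, in order of first appearance.
values_per_group = {}
for d in list_dicts:
    values_per_group.setdefault(d['grp'], []).append(d['a'])

# One summary row per group; the group key is copied into each row.
summary = [
    {'grp': grp, 's': sum(vals), 'm': mean(vals)}
    for grp, vals in values_per_group.items()
]
# group 'a': s=18, m=6; group 'b': s=6, m=3
```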

Aggregators

You can use your own functions for aggregation, but we also offer a few standard ones. Here's the standard mapping.

from statistics import mean, median, stdev, variance

# Name -> function applied to the list of values for one key.
{
  "mean": mean,
  "count": lambda d: len(d),
  "unique": lambda d: list(set(d)),
  "n_unique": lambda d: len(set(d)),
  "sum": sum,
  "min": min,
  "max": max,
  "median": median,
  "var": variance,
  "std": stdev,
  "values": lambda d: d,
  "first": lambda d: d[0],
  "last": lambda d: d[-1],
}
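
Every aggregator in that mapping is a function that receives the list of values for one key (within one group, when a group is active) and returns a single value, so a custom aggregator follows the same shape. The sketch below defines two hypothetical ones in plain Python: a value range, and a sorted variant of "unique" (the standard `list(set(d))` does not promise any particular order). Passing such a function in place of a name, e.g. `.agg(r=('a', value_range))`, is an assumption based on the sentence above rather than something this guide demonstrates.

```python
def value_range(values):
    """Hypothetical aggregator: spread between the largest and smallest value."""
    return max(values) - min(values)


def sorted_unique(values):
    """Hypothetical aggregator: like 'unique', but with a predictable order."""
    return sorted(set(values))


# Values of 'a' for group 'a' in the examples above.
group_a = [6, 7, 5]
print(value_range(group_a))    # 2
print(sorted_unique(group_a))  # [5, 6, 7]
```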

Transform

The .transform() verb is similar to the .agg() verb. The main difference is that it does not reduce the collection to one row per summary. Instead, the summaries are merged back into the items of the original collection. The examples below show the use case.

Without Groups

With no groups active we just attach the same summary to every item.

from clumper import Clumper

data = [
    {"a": 6, "grp": "a"},
    {"a": 2, "grp": "b"},
    {"a": 7, "grp": "a"},
    {"a": 9, "grp": "b"},
    {"a": 5, "grp": "a"}
]

tfm_data = (Clumper(data)
             .transform(s=("a", "sum"),
                        u=("a", "unique"))
             .collect())
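
A plain-Python sketch of the ungrouped behaviour: compute one summary over the whole collection and merge it into every item, so no items are dropped. It sorts the unique values to make the output predictable, which the standard "unique" aggregator does not guarantee, and it is an illustration rather than clumper's implementation.

```python
data = [
    {"a": 6, "grp": "a"},
    {"a": 2, "grp": "b"},
    {"a": 7, "grp": "a"},
    {"a": 9, "grp": "b"},
    {"a": 5, "grp": "a"}
]

values = [d["a"] for d in data]

# One summary for the whole collection (unique values sorted for readability).
summary = {"s": sum(values), "u": sorted(set(values))}

# Merge the same summary into every item; the item count stays the same.
tfm_data = [{**d, **summary} for d in data]
print(tfm_data[0])  # {'a': 6, 'grp': 'a', 's': 29, 'u': [2, 5, 6, 7, 9]}
```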

With Groups

With groups active we calculate a summary per group and only attach the relevant summary to each item.

from clumper import Clumper

data = [
    {"a": 6, "grp": "a"},
    {"a": 2, "grp": "b"},
    {"a": 7, "grp": "a"},
    {"a": 9, "grp": "b"},
    {"a": 5, "grp": "a"}
]

tfm_data = (Clumper(data)
             .group_by("grp")
             .transform(s=("a", "sum"),
                        u=("a", "unique"))
             .collect())
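
The grouped version can be sketched the same way: compute one summary per group and give each item only the summary of its own group. This sketch keeps the items in their original order; the order clumper returns may differ. As before, unique values are sorted for readability.

```python
data = [
    {"a": 6, "grp": "a"},
    {"a": 2, "grp": "b"},
    {"a": 7, "grp": "a"},
    {"a": 9, "grp": "b"},
    {"a": 5, "grp": "a"}
]

# Values of 'a' per group, in order of first appearance.
values_per_group = {}
for d in data:
    values_per_group.setdefault(d["grp"], []).append(d["a"])

# One summary per group.
summaries = {
    grp: {"s": sum(vals), "u": sorted(set(vals))}
    for grp, vals in values_per_group.items()
}

# Each item only receives the summary of its own group.
tfm_data = [{**d, **summaries[d["grp"]]} for d in data]
print(tfm_data[1])  # {'a': 2, 'grp': 'b', 's': 11, 'u': [2, 9]}
```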

Mutate

This library offers stateful functions such as row_number. If you use these functions while a group is active, they keep their state per group instead of across the whole collection.

Without Groups

When no group is active, row_number starts counting at one and keeps counting until it reaches the end of the collection.

from clumper import Clumper
from clumper.sequence import row_number

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .mutate(r=row_number())
  .collect())
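
Without groups, row_number behaves like numbering the items with Python's built-in `enumerate` starting at one. The sketch below shows that analogy in plain Python; it does not use clumper.

```python
list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

# Number the items from one up to the end of the collection.
numbered = [{**d, 'r': i} for i, d in enumerate(list_dicts, start=1)]
print([d['r'] for d in numbered])  # [1, 2, 3, 4, 5]
```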

With Groups

With a group active, you'll notice that the order of the items changes and that row_number restarts at one for each new group.

from clumper import Clumper
from clumper.sequence import row_number

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .group_by('grp')
  .mutate(r=row_number())
  .collect())
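
A plain-Python sketch of the grouped numbering: bucket the items per group, then restart the counter for every bucket, which is also why the items come out group by group. It assumes groups are emitted in order of first appearance; clumper's own ordering of the groups may differ.

```python
list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 4, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

# Bucket items per group, keeping groups in order of first appearance.
groups = {}
for d in list_dicts:
    groups.setdefault(d['grp'], []).append(d)

# Restart the counter at one for every group; items come out group by group.
numbered = [
    {**d, 'r': i}
    for items in groups.values()
    for i, d in enumerate(items, start=1)
]
print([(d['grp'], d['r']) for d in numbered])
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2)]
```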

Sort

Without Groups

With no groups active, we just sort the entire collection based on the key that is provided.

from clumper import Clumper

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 9, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .sort(key=lambda d: d['a'])
  .collect())
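
Without groups, the result should be in the same order as Python's built-in `sorted()` with the same key function. The plain-Python equivalent is shown below for comparison; it does not use clumper.

```python
list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 9, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

# Sort the entire collection by the value of 'a'.
ordered = sorted(list_dicts, key=lambda d: d['a'])
print([d['a'] for d in ordered])  # [2, 5, 6, 7, 9]
```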

With Groups

With groups active, we still sort, but only within each group.

from clumper import Clumper

list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 9, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

(Clumper(list_dicts)
  .group_by('grp')
  .sort(key=lambda d: d['a'])
  .collect())
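
Sorting within groups can be sketched in plain Python by bucketing the items per group, sorting each bucket, and concatenating the buckets. The sketch assumes groups are concatenated in order of first appearance; clumper's own group order may differ.

```python
list_dicts = [
    {'a': 6, 'grp': 'a'},
    {'a': 2, 'grp': 'b'},
    {'a': 7, 'grp': 'a'},
    {'a': 9, 'grp': 'b'},
    {'a': 5, 'grp': 'a'}
]

# Bucket items per group, keeping groups in order of first appearance.
groups = {}
for d in list_dicts:
    groups.setdefault(d['grp'], []).append(d)

# Sort inside each group, then concatenate the groups.
ordered = [
    item
    for items in groups.values()
    for item in sorted(items, key=lambda x: x['a'])
]
print([(d['grp'], d['a']) for d in ordered])
# [('a', 5), ('a', 6), ('a', 7), ('b', 2), ('b', 9)]
```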

Ungroup

If you're done with a group and want to move on, you can drop all groups by calling .ungroup().