Pokemon Recommendations, Part 2

Even more of a Sequel than SQL.

Vincent Warmerdam koaning.io
2016-05-05

In a previous document I explained how to use the CAPM model to create a naive ranking for pokemon. Even though that approach had some logic to it, I wanted to explore whether a naive probabilistic implementation of TrueSkill could do the job as well. This document contains a python implementation of my naive approach as well as an application to a pokemon dataset. The first part explains how the approach is built and tested. The second part analyses the ranking of all 750 pokemon.

Starting Data

Before writing any application code, it seems sensible to first think about how to test if the code works. It would be a good idea to simulate some random players with predetermined skill levels such that we can test if our ranking algorithm can at least rank simulated data well enough. The code below generates this data.

import uuid
import numpy as np
import pandas as pd
import json 
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn; seaborn.set()
SIZE_ARR = 100

def sim(p1, p2, entropy = 1):
    '''
    Given two players (tuples: <name, skill>) simulate
    a game between them. Returns (winner-name, loser-name)
    '''
    # add a random nudge so a slightly weaker player can still win occasionally
    r = np.random.randint(-entropy, entropy + 1)
    if p1[1] + r > p2[1]:
        return p1[0], p2[0]
    else:
        return p2[0], p1[0]

def gen_data(n_users = 20, n_games = 100, entropy = 1):
    '''
    Generate a lot of games and output a list of user tuples
    (<name>, <skill>) as well as a dataframe of games played.
    '''
    data = []
    users = [(str(uuid.uuid4())[:6], i) for i in range(n_users)]
    for _ in range(n_games):
        i, j = np.random.choice(range(len(users)), 2)
        u1, u2 = users[i], users[j]
        data.append(sim(u1, u2, entropy))
    return users, pd.DataFrame(data, columns = ['winner', 'loser'])
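
For example, a small simulated dataset can be generated like so (the player names are random uuids, so every run will differ):

users, games = gen_data(n_users = 5, n_games = 10, entropy = 1)
print(users[:2])   # e.g. [('a1b2c3', 0), ('d4e5f6', 1)] -- (name, true skill) tuples
games.head()       # dataframe with 'winner' and 'loser' columns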

Code that does the math

We’ll now concern ourselves with writing code that updates a system of beliefs. Each belief is captured in a numpy array that can be interpreted as a histogram of likelihood values over skill levels. The algorithm is explained in the previous blog post as well, but schematically the belief update looks like this:

We start out with two arrays of prior skill beliefs, one for player 1 and one for player 2. We combine them into a two-dimensional density, which is our prior belief over the skill levels of the two players. Once one of them wins a game we can update this belief: the loser is now more likely to be the less skilled player, so we cut away all likelihood in the triangle where the losing player would be the more skilled one. We then marginalise this belief back to the two players. By repeating this for all games, we hope to arrive at a belief of skill that can be updated in real time.
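
To make this concrete, here is a minimal numerical sketch of a single update with just three skill bins (plain numpy, mirroring the functions defined further down, with the smoothing term left out):

import numpy as np

# flat prior beliefs over three skill bins for both players
winner_belief = np.array([1., 1., 1.]) / 3
loser_belief = np.array([1., 1., 1.]) / 3

# joint prior: rows index the loser's skill, columns the winner's skill
joint = np.outer(loser_belief, winner_belief)

# cut: keep only the region where the winner's skill >= the loser's skill
joint = np.triu(joint)
joint = joint / joint.sum()

# marginalise back to per-player beliefs
new_winner = joint.sum(axis=0)   # ~[0.17, 0.33, 0.50], mass shifts towards higher skill
new_loser = joint.sum(axis=1)    # ~[0.50, 0.33, 0.17], mass shifts towards lower skill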

Towards Code

Next we’ll define all the functions that do the ranking. These functions are meant to be minimal, as we only want to check whether this algorithm can work. It still makes sense to think about what our app code might look like and how to test individual components before writing it. I’m not particularly religious about testing, but I do see value in writing a test before writing code here: we’re going to be munging numpy matrices and typos in indices will be unforgiving.

p1 = np.array([1.,2.,3.])
p2 = np.array([3.,2.,1.])
p3 = np.array([1.,1.,1.])
mat1 = gen_prior(p1, p2)
mat2 = gen_prior(p2, p1)
mat3 = gen_prior(p3, p3)
m1, m2 = gen_marginals(mat1)
m3, m4 = gen_marginals(mat2)
np1, np2 = gen_marginals(cut_matrix(mat1))
assert all(m1 == m2[::-1])  # marginals of mirrored priors should mirror each other
assert all(m3 == m4[::-1])  # marginals of mirrored priors should mirror each other
assert np.sum(mat1) == np.sum(mat2)  # both priors should contain the same total mass
assert np.sum(mat3) == 9  # a prior built from all ones should sum to 9
assert all(np1 == np2[::-1])  # marginals should still mirror each other after the cut

With these tests in place, I could move on to the code.

d = {}  # global belief store: player name -> skill histogram

def get_player_stats(name):
    '''
    Check if a player exists in the dictionary. If not, add it
    with a flat (uniform) belief. Then return the player belief.
    '''
    if name not in d.keys():
        d[name] = np.ones(SIZE_ARR)/np.sum(np.ones(SIZE_ARR))
    return d[name]

def gen_prior(arr1, arr2):
    # outer product of the two belief arrays: a joint prior over both skill levels
    return np.matrix(arr1).T * np.matrix(arr2)

def cut_matrix(mat):
    # keep the upper triangle (winner skill >= loser skill) plus a little smoothing
    posterior = np.triu(mat) + 0.000001
    posterior = posterior/np.sum(posterior)
    return posterior

def gen_marginals(posterior_mat):
    # sum over rows/columns to get back the per-player belief arrays
    winner = np.squeeze(np.asarray(np.sum(posterior_mat, 0)))
    loser = np.squeeze(np.asarray(np.sum(posterior_mat, 1)))
    return winner, loser

def update(winner, loser):
    winner_arr = get_player_stats(winner)
    loser_arr = get_player_stats(loser)
    prior_mat = gen_prior(loser_arr, winner_arr)
    posterior_mat = cut_matrix(prior_mat)
    new_winner, new_loser = gen_marginals(posterior_mat)
    d[loser] = new_loser
    d[winner] = new_winner
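
As a quick sanity check, a single update should push the winner’s expected skill above the flat-prior value and the loser’s below it (the player names here are just placeholders):

d = {}                    # start from an empty belief store
update('alice', 'bob')    # hypothetical players: 'alice' beats 'bob'

expected_skill = lambda arr: np.sum(arr * np.arange(SIZE_ARR))
print(expected_skill(d['alice']))   # above 49.5, the flat-prior expectation
print(expected_skill(d['bob']))     # below 49.5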

I wrote the code and the tests passed. For now this is enough. Before running this code against the pokemon dataset, let us first run it against some simulated data.

Running the code a few times

First let’s define a few functions that automate plotting.

def plot_scores(plot_idx):
    plt.subplot(plot_idx)
    plt.scatter(
        np.arange(len(users)),
        [np.mean(d[usr[0]] * np.arange(SIZE_ARR)) for usr in users])

def plot_dist(plot_idx):
    plt.subplot(plot_idx)
    for usr in users:
        plt.plot(np.arange(SIZE_ARR), d[usr[0]], alpha = 0.5)

Next we’ll run some experiments.

Simulate n_users = 100, n_games = 400, entropy = 1

Simulate n_users = 100, n_games = 400/10000, entropy = 15

Simulate n_users = 100, n_games = 400/10000, entropy = 50
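
Each of these experiments amounts to a loop like the sketch below (shown here for the first setting):

d = {}                                                        # reset the belief store
users, games = gen_data(n_users = 100, n_games = 400, entropy = 1)

for winner, loser in zip(games['winner'], games['loser']):
    update(winner, loser)

for usr in users:
    get_player_stats(usr[0])   # give players that never played a game a flat belief

plt.figure(figsize = (12, 4))
plot_scores(121)   # estimated skill per player, should roughly track the true skill
plot_dist(122)     # per-player belief histograms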

There’s still some room for improvement to make the convergence better, but the system seems good enough for my purposes. I want to use this metric to rank pokemon, so I won’t mind if I need to simulate a few extra outcomes between two pokemon. For more real-time purposes a few more tweaks would be needed.

For the pokemon dataset I only have one concern: the rock-paper-scissors effect.

Pokemon Data

I took the liberty of preparing a dataset. It’s easy to do yourself with a bit of python and this project. I’ll again use the formula from my previous post to create a function that can simulate the outcome of a pokemon battle. It is based on the number of turns that one pokemon can outlast the other.

\[ T_{ij} = \frac{HP_i}{DMG_{ji}} \]

where \(T_{ij}\) is the number of turns pokemon \(i\) would be able to survive against opponent \(j\) if it were attacked indefinitely, \(HP_i\) is the number of hitpoints that pokemon \(i\) has and \(DMG_{ji}\) is the amount of damage a basic attack from pokemon \(j\) would deal to pokemon \(i\). This damage is defined via:

\[ DMG_{ji} = \frac{2L_j + 10}{250} \times \frac{A_j}{D_i} \times w_{ji} \]
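
For instance, with hypothetical stats \(L_j = 100\), \(A_j = 80\), \(D_i = 60\), a weakness multiplier \(w_{ji} = 2\) and \(HP_i = 100\), this gives:

\[ DMG_{ji} = \frac{210}{250} \times \frac{80}{60} \times 2 = 2.24, \qquad T_{ij} = \frac{100}{2.24} \approx 44.6 \text{ turns} \]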

This setup assumes away some core aspects of the game. The hope is that, even with these assumptions, much of the actual pokemon game is left intact. For ease of use, you can read the datasets from my hosted blog.

poke_df = pd.read_csv("http://koaning.io/theme/data/full_pokemon.csv")
weak_df = pd.read_csv("http://koaning.io/theme/data/pokemon_weakness.csv")

def weakness_weight(types_j, types_i):
    # look up the type weakness multiplier w_{ji}
    res = 1.
    possible_types = [t for t in weak_df['type']]
    for i in filter(lambda t: t in possible_types, types_i):
        for j in filter(lambda t: t in possible_types, types_j):
            res = float(weak_df[weak_df['type'] == j][i])
    return res

def calc_tij(poke_i, poke_j):
    poke_row_i = poke_df[poke_df['name'] == poke_i].iloc[0]
    poke_row_j = poke_df[poke_df['name'] == poke_j].iloc[0]
    dmg_ji = (200. + 10)/250
    dmg_ji *= float(poke_row_j['attack'])/float(poke_row_i['defence'])
    dmg_ji *= weakness_weight(eval(poke_row_j['type']),
                              eval(poke_row_i['type']))
    return poke_row_i['hp']/dmg_ji

def sim_poke_battle(poke_i, poke_j):
    if calc_tij(poke_i, poke_j) > calc_tij(poke_j, poke_i):
        return poke_i, poke_j
    return poke_j, poke_i
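
To produce a ranking, random pairs of pokemon battle and the outcomes feed into the belief updates. The sketch below shows roughly what such a run looks like; the number of simulated battles is an arbitrary choice and, since the exact statistic behind the mle column in the table below isn’t spelled out here, a normalised expected skill is used as a stand-in:

d = {}                                   # reset the belief store
names = list(poke_df['name'])

for _ in range(20000):                   # number of simulated battles: arbitrary choice
    i, j = np.random.choice(len(names), 2, replace=False)
    winner, loser = sim_poke_battle(names[i], names[j])
    update(winner, loser)

ranking = pd.DataFrame({
    'name': list(d.keys()),
    'mle': [np.sum(d[n] * np.arange(SIZE_ARR)) / SIZE_ARR for n in d.keys()]
}).sort_values('mle')
ranking.head(25)    # weakest pokemon; use .tail(25) for the strongest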

After running this for a while, the maximum likelihood of the resulting posteriors gives an insight into the best and worst pokemon:

          lowest ranked                  highest ranked
          name       mle                 name       mle
128   Magikarp  0.011007        142     Snorlax  0.877423
112    Chansey  0.013735        248       Lugia  0.882689
348     Feebas  0.015845        487   Cresselia  0.883819
171      Pichu  0.019108        66      Machoke  0.887577
291   Shedinja  0.020473        713     Avalugg  0.890049
439    Happiny  0.026763        380      Latios  0.893087
241    Blissey  0.037009        148   Dragonite  0.895297
172     Cleffa  0.042802        611     Haxorus  0.895832
234   Smeargle  0.048575        290     Ninjask  0.897319
49     Diglett  0.054010        388    Torterra  0.902103
400  Kricketot  0.057252        717     Yveltal  0.904122
297    Azurill  0.058177        646      Kyurem  0.914460
235    Tyrogue  0.060337        485   Regigigas  0.916059
659   Bunnelby  0.060847        305      Aggron  0.916220
62        Abra  0.064720        597  Ferrothorn  0.916377
173  Igglybuff  0.068975        425    Drifblim  0.917240
190    Sunkern  0.071101        644      Zekrom  0.919065
279      Ralts  0.081876        537       Throh  0.921175
160    Sentret  0.082374        482      Dialga  0.928502
51      Meowth  0.083082        149      Mewtwo  0.951472
174     Togepi  0.087797        483      Palkia  0.953823
581  Vanillite  0.091351        288     Slaking  0.956629
212    Shuckle  0.095570        375   Metagross  0.957383
18     Rattata  0.100470        492      Arceus  0.957729
330     Cacnea  0.100798        716     Xerneas  0.960467

Conclusion

I have run this algorithm a few times and it doesn’t seem to converge very consistently. It does get some things right: the fact that Magikarp generally performs poorly is good news, and I’ve also noticed Mew/Mewtwo in the higher-skilled area. I’m still concerned that this algorithm fails to grasp the rock-paper-scissors element of the game, and the fact that I assume every pokemon fights with only basic attacks does not help either.

As a final push one might consider making an ensemble of these pokemon rankings to see if the convergence can be made to go faster. Another tweak to consider is to not have the matching of pokemon be random during learning. I like the results of the exercise, but it seems that we are not done with ranking pokemon.

Citation

For attribution, please cite this work as

Warmerdam (2016, May 5). koaning.io: Pokemon Recommendations, Part 2. Retrieved from https://koaning.io/posts/pokemon-recommendations-part-2/

BibTeX citation

@misc{warmerdam2016pokemon,
  author = {Warmerdam, Vincent},
  title = {koaning.io: Pokemon Recommendations, Part 2},
  url = {https://koaning.io/posts/pokemon-recommendations-part-2/},
  year = {2016}
}