obi
formerly david stone
Major update! Technical Machine home page and Mercurial repository.
I am working on an AI to play Pokemon (I'll make a more in-depth, general thread on it later), and one of the things I need is an evaluation function. I am using an expectiminimax algorithm with alpha-beta pruning, which works best when the most promising moves are searched first. This means I need a relatively simple function that can "guess" how good a particular position is. Therefore, I need to assign values to various things so that my program can weight them properly.
Unfortunately, Pokemon has elements of luck in it. This is why I have to use an expectiminimax tree instead of a minimax tree (I'm simulating the game as a three-player game: AI, foe, and God. God moves whenever there are elements of luck). Because scores get averaged whenever God moves, I can't just get the order of things correct; I have to get their magnitude correct as well.
To summarize what an expectiminimax algorithm does:
The game is represented as a tree. Every position in the tree is called a node. At any point, the game is given a score. A positive score means the AI is winning; a negative score means the opponent is winning. The magnitude of the score determines how far ahead either player is. On any node for the AI, I try to find the move that maximizes the score. On any node for the foe, I try to find the move that minimizes the score (makes it more negative or less positive, which is maximizing the score from the point of view of the foe). On any node for God, I look at all child nodes and average their scores, weighted by how likely each node is to be visited. For instance, if the score after a critical hit (CH) is 600, and the score after no CH is 200, the expected value is 225 (600 * 6.25% + 200 * 93.75%).
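As a rough sketch of the summary above (this is not the actual Technical Machine code; the Node class and its field names are made up for illustration, and alpha-beta pruning is left out to keep it short):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type: "max" (AI), "min" (foe), "chance" (God), or "leaf".
    kind: str
    score: float = 0.0  # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # only for chance nodes


def expectiminimax(node: Node) -> float:
    """Return the score of a node, searching the whole subtree (no pruning)."""
    if node.kind == "leaf":
        return node.score
    values = [expectiminimax(child) for child in node.children]
    if node.kind == "max":     # AI picks the highest score
        return max(values)
    if node.kind == "min":     # foe picks the lowest score
        return min(values)
    if node.kind == "chance":  # God: probability-weighted average
        return sum(p * v for p, v in zip(node.probs, values))
    raise ValueError(f"unknown node kind: {node.kind}")


# The critical hit example: 600 with a CH (6.25%), 200 without (93.75%).
crit = Node("chance",
            children=[Node("leaf", 600.0), Node("leaf", 200.0)],
            probs=[0.0625, 0.9375])
print(expectiminimax(crit))  # 225.0
```

Because the chance node averages its children, doubling the 600 would change the result even though the ordering of the two outcomes stays the same. That is why the magnitudes matter and not just the order.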
I am uncertain of what scores to assign to various conditions, however. This is what I want help with. For an example of this kind of evaluation function, consider GNU Chess.
Here is my outline for what scores could possibly be, with vague reasoning:
Pokemon should be worth a large number of points (more than anything else). Going from 6 Pokemon to 5 Pokemon should cause the player to lose fewer points than going from 3 Pokemon to 2 Pokemon.
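One possible shape for this, purely as an illustration: make the n-th remaining Pokemon worth BASE / n points, so each loss costs more than the previous one. Both BASE and the 1/n curve are assumptions picked for the example, not tested values.

```python
BASE = 1000.0  # hypothetical scale; would need tuning


def team_value(remaining: int) -> float:
    """Total points for having `remaining` Pokemon left (0 Pokemon is worth 0)."""
    return sum(BASE / i for i in range(1, remaining + 1))


def loss_from_faint(remaining: int) -> float:
    """Points lost when going from `remaining` Pokemon to `remaining - 1`."""
    return team_value(remaining) - team_value(remaining - 1)


for n in (6, 3):
    print(n, "->", n - 1, "costs", round(loss_from_faint(n), 1))
```

With this curve, going from 6 to 5 costs roughly half as much as going from 3 to 2. Any other curve whose per-Pokemon loss grows as the team shrinks would satisfy the same requirement.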
Entry hazards should be worth fewer points. Stealth Rock is generally worth the most, then the order of Spikes vs. Toxic Spikes depends on the opposing team. The first layer of Spikes is worth more than the second or third (twice as much, based on the damage it does). The value of Spikes and Stealth Rock should decrease with fewer opposing Pokemon, dropping to 0 when the foe is on their last Pokemon (or when all remaining Pokemon are immune). The value of Toxic Spikes should decrease with fewer opposing Pokemon that are not already statused (with permanent status like paralysis and poison lowering the value more than temporary status like sleep and freeze), but should not drop to 0 when all foes are statused, due to things like Rest, Natural Cure, and Aromatherapy. The value of Toxic Spikes should be 0 when the foe has 1 Pokemon remaining or all remaining Pokemon are immune.
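A minimal sketch of the Spikes part of this, assuming the value is proportional to the damage dealt per switch-in (1/8, 3/16, and 1/4 of max HP for one, two, and three layers, which is what makes the first layer worth twice each later one) times the number of foes that can still be hurt. The function name, its parameters, and the points_per_full_hp scale are hypothetical. Stealth Rock and Toxic Spikes would need their own rules.

```python
# Fraction of max HP lost per switch-in for 0-3 layers of Spikes.
SPIKES_DAMAGE = {0: 0.0, 1: 0.125, 2: 0.1875, 3: 0.25}


def spikes_value(layers: int, foe_remaining: int, foe_immune: int,
                 points_per_full_hp: float = 100.0) -> float:
    """Score for Spikes on the foe's side of the field.

    foe_remaining: foe Pokemon that have not fainted.
    foe_immune: how many of those are immune (Flying-types, Levitate, ...).
    points_per_full_hp: hypothetical scale, to be tuned.
    """
    if foe_remaining <= 1:
        return 0.0  # the last Pokemon never switches in again
    vulnerable = max(foe_remaining - foe_immune, 0)
    return points_per_full_hp * SPIKES_DAMAGE[layers] * vulnerable


print(spikes_value(1, 6, 0))  # 75.0
print(spikes_value(2, 6, 0))  # 112.5
print(spikes_value(3, 1, 0))  # 0.0
```

Here the first layer adds 75 points and the second adds 37.5, so the two-to-one ratio falls out of the damage table instead of being hard-coded.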
PP should be given very little weight initially. However, the weight should increase very quickly when the PP gets very low (if two games are equal except that in one the foe has a move with 0 PP instead of 4 and the AI has a move with 20 PP instead of 24, that should be a fairly large win for the AI). Perhaps this could be accomplished by giving a penalty for any move with "low" PP.
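One way the "low PP" penalty might look, as a sketch only: no penalty at or above some threshold, and a penalty that grows with the square of the shortfall below it. The threshold of 5 and the weight of 10 are placeholders, not suggested values.

```python
def pp_penalty(pp: int, low_threshold: int = 5, weight: int = 10) -> int:
    """Penalty for a single move; zero unless PP is below the threshold."""
    if pp >= low_threshold:
        return 0
    shortfall = low_threshold - pp
    return weight * shortfall * shortfall  # grows quickly as PP nears 0


# The example from above: both sides have used 4 PP on a move.
foe_change = pp_penalty(0) - pp_penalty(4)    # foe: 4 PP -> 0 PP
ai_change = pp_penalty(20) - pp_penalty(24)   # AI: 24 PP -> 20 PP
print(foe_change, ai_change)  # 240 0
```

The foe's penalty goes up by 240 while the AI's does not change at all, so the net result is a swing in the AI's favor even though both sides spent the same amount of PP.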
Pokemon should be given a penalty based on how much damage they've taken, with preference given to more damage on one Pokemon instead of a little damage spread out across all Pokemon, in general. However, Pokemon should be given a boost if they have things like Blaze activated, or if they have Reversal / Flail / Endeavor.
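A sketch of a damage penalty where a single exponent controls how concentrated damage compares to spread-out damage. This reads the "preference" above as concentrated damage costing its owner more (exponent above 1). An exponent below 1 gives the opposite behavior. The function name, weight, and exponent are all assumptions for illustration, and the Blaze / Reversal / Flail / Endeavor boosts are not modeled.

```python
from typing import Iterable


def damage_penalty(hp_fractions: Iterable[float], weight: float = 100.0,
                   exponent: float = 2.0) -> float:
    """Total penalty for a team, given each Pokemon's remaining HP as 0.0-1.0.

    exponent > 1: damage piled on one Pokemon is penalized more.
    exponent < 1: damage spread across the team is penalized more.
    """
    return sum(weight * (1.0 - hp) ** exponent for hp in hp_fractions)


concentrated = [0.4, 1.0, 1.0]  # one Pokemon has lost 60% of its HP
spread = [0.8, 0.8, 0.8]        # three Pokemon have each lost 20%

print(damage_penalty(concentrated) > damage_penalty(spread))  # True
```

Both teams have lost the same total HP, so a penalty that is linear in damage could not tell them apart. The exponent is what expresses the preference.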
Pokemon match-up seems like it should be part of it as well, but an evaluation function should be fast. I don't know of a fast way to determine which Pokemon has the advantage in a 1v1 match-up. I'm also completely unsure how to handle weather. Trick Room, Fog, Gravity, Uproar, Hail, Sun, Sand, and Rain all influence both sides of the battle. Perhaps I could do something like look at type match-ups and stats: if a Water Pokemon is out during rain, give it a bonus; if a Fire Pokemon is out vs. a Grass Pokemon, give it a bonus; that sort of thing.
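To make the type and weather idea concrete, here is a deliberately tiny sketch. It assumes single-typed Pokemon, treats each Pokemon as attacking with its own type, includes only Fire / Water / Grass in the chart, and ignores stats entirely. The bonus formula and the points scale are invented for the example.

```python
from typing import Optional

# Partial type chart: (attacking type, defending type) -> damage multiplier.
EFFECTIVENESS = {
    ("Fire", "Grass"): 2.0, ("Grass", "Water"): 2.0, ("Water", "Fire"): 2.0,
    ("Grass", "Fire"): 0.5, ("Water", "Grass"): 0.5, ("Fire", "Water"): 0.5,
}

# Weather multipliers on attacks of a given type.
WEATHER_BOOST = {
    ("Rain", "Water"): 1.5, ("Rain", "Fire"): 0.5,
    ("Sun", "Fire"): 1.5, ("Sun", "Water"): 0.5,
}


def matchup_bonus(own_type: str, foe_type: str,
                  weather: Optional[str] = None, points: float = 50.0) -> float:
    """Positive if the active Pokemon hits harder than it gets hit."""
    offense = (EFFECTIVENESS.get((own_type, foe_type), 1.0)
               * WEATHER_BOOST.get((weather, own_type), 1.0))
    defense = (EFFECTIVENESS.get((foe_type, own_type), 1.0)
               * WEATHER_BOOST.get((weather, foe_type), 1.0))
    return points * (offense - defense)


print(matchup_bonus("Fire", "Grass"))            # 75.0
print(matchup_bonus("Water", "Normal", "Rain"))  # 25.0
```

This is only two dictionary lookups per side, so it stays cheap enough for an evaluation function, but it obviously says nothing about movesets or stats.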
Pokemon should also be penalized for status, various conditions, etc.
What's more, the game does not start out with the score at 0 (an even match-up). As an example, a team of 6 Suicune is likely in a good position against a team of 6 Entei. However, at the very start of the game, all the information the AI has is information about the lead Pokemon. The score of the game should depend on team match-ups, but I'm not sure how to quickly decide which team has the advantage over the other.
In other words, I have a vague idea of how things should be scored, but I need actual numbers. They don't need to be exact: I can fiddle with them later. I'm just looking for a relatively close estimate of how valuable various things are.