I am 'somewhat' knowledgeable in both programming and the logic being used here (meaning that I'm better than everyone I know, but I don't really know anyone who's any good), and I have noticed a few things about the problem that bear thinking about. First off, I agree with what Res Ipsa Loquitur said about the problem with basing things off HP (sorry if you addressed this already; I must have missed it). After thinking about it for a while, I've decided that the best way to weight that sort of thing would be a measure of 'KO-ability' given your current team: perhaps the weighted average of the number of moves each of your Pokemon needs to KO the target, weighted against those of your Pokemon that are unlikely to survive and in favor of those likely to be out against it. This needs to be thought out further, obviously. I also think that Pokemon should be evaluated in terms of a percentage rather than a number, because that lets you use infinity (for a Pokemon which guarantees victory, perhaps) rather than just really big numbers. I don't think it actually matters, though, except from a stylistic point of view.
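Going back to the KO-ability idea for a second, here is a rough Python sketch of the kind of weighted average I have in mind. Everything in it is an assumption on my part: the `Matchup` fields, the idea of multiplying the two chances together to get a weight, and the example numbers are all made up for illustration, not taken from the actual AI.

```python
from dataclasses import dataclass


@dataclass
class Matchup:
    """One of your team members versus the Pokemon being rated (hypothetical fields)."""
    hits_to_ko: float         # moves this team member needs to KO the target
    survival_chance: float    # assumed chance (0..1) it survives long enough to do so
    likely_out_chance: float  # assumed chance (0..1) it is the one out against the target


def ko_ability(matchups):
    """Weighted average of hits-to-KO across the team; lower means easier to KO.

    The weight (an assumption) is survival_chance * likely_out_chance, so team
    members that are unlikely to survive or unlikely to be out count for less.
    """
    weights = [m.survival_chance * m.likely_out_chance for m in matchups]
    total = sum(weights)
    if total == 0:
        # Nobody on the team can realistically threaten the target.
        return float("inf")
    return sum(w * m.hits_to_ko for w, m in zip(weights, matchups)) / total


team = [Matchup(2, 1.0, 0.5), Matchup(4, 0.5, 1.0)]
print(ko_ability(team))  # 3.0
```

Multiplying the two chances is just the simplest weighting I could think of; a real version would probably need something smarter.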
Also, with regard to what people have been saying about prediction and base point values for moves (which would of course be modified by things like STAB, weather, etc.), I thought of something that I'm not sure would work (I don't know enough about the programming language you're using) but would be useful if it did. First, include a variable in each of the various estimates (How likely are they to switch out? How much intrinsic value does Ice Punch have?) so that instead of '30%' we have '30% + N'. Then, for every battle where that statistic comes into play, pick a random N between, say, -5 and 5. Over many battles, if substantially more wins occur with an N of 3 to 5 (in the relevant situations), change the range of Ns to 0 to 10, and so on. You could probably calculate this continuously, perhaps lowering both bounds of the range by 0.1% for each win with a low N and raising them by 0.1% for each win with a high N. Someone might have suggested something like this earlier; I'm not sure.
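A minimal sketch of the continuous version of the 'N' idea, in Python since I don't know what the AI itself is written in. The class name, the choice to call anything above the midpoint of the range a 'high' N, and the 0.1 step are my assumptions; treat it as an illustration rather than a finished design.

```python
import random


class AdaptiveOffset:
    """Tracks the range that the random offset N is drawn from (hypothetical helper)."""

    def __init__(self, low=-5.0, high=5.0, step=0.1, rng=None):
        self.low = low
        self.high = high
        self.step = step
        self.rng = rng or random.Random()

    def sample(self):
        """Pick the N to add to the base estimate for one battle."""
        return self.rng.uniform(self.low, self.high)

    def record_win(self, n):
        """Slide the whole range toward the N that was in use when a battle was won."""
        midpoint = (self.low + self.high) / 2
        if n > midpoint:
            shift = self.step    # win with a high N: move both bounds up
        elif n < midpoint:
            shift = -self.step   # win with a low N: move both bounds down
        else:
            shift = 0.0
        self.low += shift
        self.high += shift


switch_offset = AdaptiveOffset()
n = switch_offset.sample()
estimate = 30.0 + n           # '30% + N' for this battle
switch_offset.record_win(n)   # call this only if the battle was won
```

Note that only wins move the range here, same as in my description; whether losses should push it the other way is something I haven't thought through.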
Another note on prediction: it would be interesting to see whether different people with similar skill predict in different ways, either based on their personality or through simple(?) randomness. Give two people a copy of the program, check their 'N' values after a few hundred battles, and see whether and how they differ. This could also show whether different instances of essentially the same scenario (I need to guess whether he switches in fear of a S.E. hit or stays in in fear of a different S.E. hit) are predicted the same way, or whether some variations exist which change the optimal percentages (the only thing I can think of is different S.E. types in this case; maybe Ice is always better than Ground?).
I think that using some variant of the above strategy (that's two paragraphs ago) would be necessary, or at least more helpful than recorded statistics. I mean, when we get the data from Shoddy it just gives absolute usage; for all we know, a ton of the Starmie usage was on one day, and most of the time it would have rated 19. And as previous statistics have made obvious, it is next to impossible to predict anything about the future from these statistics until it's already happened. The numbers jump around from month to month, and I wouldn't be at all surprised if they jumped even more from day to day. Normally there would be no way to get around that, but if this AI were widely used, it would be gathering huge amounts of data constantly on its own: not enough to predict the future, but probably (hopefully) enough to get a handle on the present.
Of course, that's just my opinion.
I also have a suggestion in regard to how Pokemon are rated. I agree that the sum of the scores of the moves should play a part, but I think that, in keeping with the percentage idea I presented earlier, any further alterations (due to sandstorm, type advantage, STAB, etc.) should be multiplicative, probably removing a percentage of the difference between the Pokemon's value from everything else and 1. That way boosts give diminishing returns: a Kingdra which already has two Dragon Dances would not gain as much from rain as one with no other boosts.
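To spell out the arithmetic, here is a tiny Python sketch of 'remove a percentage of the difference between the value and 1'. The function name and all of the numbers (0.8 for a Kingdra with two Dragon Dances, 0.4 for one without, rain closing half the gap) are invented purely for illustration.

```python
def apply_boost(value, fraction):
    """Close `fraction` of the gap between `value` (0..1) and 1.

    Equivalent to 1 - (1 - value) * (1 - fraction), so several boosts
    multiply together and the result never goes past 1.
    """
    return value + fraction * (1.0 - value)


RAIN = 0.5  # assumed: rain removes half of the remaining gap

boosted_kingdra = 0.8  # assumed value after two Dragon Dances
plain_kingdra = 0.4    # assumed value with no boosts

gain_boosted = apply_boost(boosted_kingdra, RAIN) - boosted_kingdra
gain_plain = apply_boost(plain_kingdra, RAIN) - plain_kingdra
print(gain_boosted < gain_plain)  # True
```

With these made-up numbers the boosted Kingdra goes from 0.8 to roughly 0.9, while the plain one goes from 0.4 to roughly 0.7, so the same rain is worth about three times as much to the one without boosts.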
Finally, I disagree somewhat with the analysis of Taunt you present. You suggest that it should heavily devalue the non-attacking moves of the target, without dropping them to zero because the status isn't permanent. I say that it should drop them to zero, but only on the turns it is in effect. I may be misunderstanding what you are saying, but you seem to have forgotten that you have intermediate nodes to work with, not just a static score. In this case, by my reasoning, the non-attacking moves would be completely devalued on every turn until the target switches out, which cures the status and fully restores the moves' scores. The nodes would perhaps look like: 100 (not taunted), 10 (taunted), 10 (stayed in), 100 (switched out).
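Here is how I picture those nodes, as a short Python sketch. I am assuming the 100 is the target's total score, split into 10 from attacking moves and 90 from non-attacking moves; that split, the `Node` class, and `node_score` are hypothetical and only there to reproduce the numbers above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One step along a line of play (hypothetical structure)."""
    label: str
    taunted: bool


def node_score(node, attacking_score, status_score):
    """Score the target at this node: non-attacking moves count for zero while taunted."""
    return attacking_score + (0 if node.taunted else status_score)


path = [
    Node("not taunted", False),
    Node("taunted", True),
    Node("stayed in", True),      # still taunted, so still devalued
    Node("switched out", False),  # switching cures Taunt, score fully restored
]

# Assumed split: 10 points from attacks, 90 from non-attacking moves.
scores = [node_score(n, attacking_score=10, status_score=90) for n in path]
print(scores)  # [100, 10, 10, 100]
```

The point is that the score is recomputed at each node from the state at that node, so there is no need for a single blended 'partly taunted' value.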
I stand by each thing I said above, though I would guess at least three quarters of it is probably worthless. I hope some of the good stuff helps, and more importantly that none of the stupidity hurts. Good luck.