Ratings

OK. After a month of research, I've slightly modified the DWZ system to fit Shoddy Battle's needs. I've also added some additional elements.

This should be used as the CRE only.

The range of ratings.

With Elo and a K factor of 32, ratings on a chess server called the "Internet Chess Club" range from somewhere around 400 to 3400, which is far too wide for Shoddy Battle's needs.

Therefore, we include an acceleration factor, which starts much higher than the K factor of 32 but shrinks as more games are played. This is DWZ's version of volatility.

Our old friend from Elo

Rn = Ro + K(r - Re)

So, we are going to take apart this system.

K = acceleration factor, Ro = Old rating, Rn = New Rating, r = result, Re = expected result.

I wasn't really able to mess with Re, as that requires an in-depth knowledge of statistics which I don't have. So we will keep it unchanged.
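
As a rough sketch, the Elo-style update built from these variables could look like the following in Python. The expected-result function is the standard Elo logistic curve with a 400-point scale, which is an assumption here since the exact form of Re isn't spelled out; scoring a win as 1 and a loss as 0 is also an assumption, and the function names are made up for illustration.

```python
def expected_result(rating, opponent_rating):
    """Re, assuming the standard Elo logistic curve (not confirmed by the post)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def new_rating(old_rating, k, result, expected):
    """Rn = Ro + K * (r - Re), using the symbols defined above."""
    return old_rating + k * (result - expected)


# Two equally rated players: Re is 0.5, so a win (r = 1) with K = 32 gains 16 points.
expected = expected_result(1500, 1500)
print(new_rating(1500, 32, 1.0, expected))  # 1516.0
```

With the same inputs a loss (r = 0) would give 1484.0, so the gain and loss are symmetric as long as K stays fixed.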



The Acceleration Factor

K = Kb / (E + n)

Now, this is interesting. Kb is the base K, E is the volatility factor, and n is the number of games played.

Kb should be somewhere around 600-800; the exact value should vary with the number of players actively playing Shoddy Battle.
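
To see how quickly K shrinks as games are played, here is a small Python sketch of K = Kb / (E + n). Kb = 700 is just a value picked from the 600-800 range, and E = 10 is a placeholder rather than a computed volatility factor; the function name is made up for illustration.

```python
def acceleration_factor(kb, e, n):
    """K = Kb / (E + n): shrinks as the number of games played (n) grows."""
    return kb / (e + n)


# Kb = 700 and E = 10 are illustrative values only, not tuned constants.
for games in (0, 10, 50, 200):
    print(games, round(acceleration_factor(700, 10, games), 2))
```

With these placeholder numbers, K falls from 70 with no games played to about 3.3 after 200 games.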

Volatility Factor



This is where things get complicated...

a is the Ratings Factor (I'm starting to run out of usable variables), Eo is the volatility, and B is the Breaking Point.



This calculation raises Eo for higher-rated players and therefore lowers K: remember that K is Kb / (E+n), so the higher Eo is, the higher E is, which means Kb is divided by a larger number. Of course, this rests on the assumption that higher-rated people are more stable (which is not true in all cases).

S is the Slowdown Constant, which controls how much you want to slow the rating down. I haven't played around with this value yet (I've used 7.5 in all of my calculations), but it can be anywhere from 5 to 15. Of course, the higher the slowdown constant, the less weight the volatility has on the rating.

The Ratings Factor is to create some sort of floor. First we establish:



Now, ideally we want ratings to fall between about 1000 and 2000. So, we establish a as



Now the final part. B, the Breaking Point.

B is tricky; it is there so that lower-rated players can accelerate faster.




This has the effect of reducing the K factor so less rating is lost when r is smaller than Re.

Also, we must note, to curb the ratings,



if B = 0, and 5 < E < 150 if B > 0.

Notes

The current function of the CRE, as Colin has said, is to be a statistic to rank players. But the point of ratings is to rank players, is it not? In the current system, you can lose rating due to inactivity, or even by winning games.

In this system, if you perform better than expected, you gain rating; if you perform worse than expected, you lose rating. It fixes you at a specific rating and adjusts it only when a match is played. Is this not what Shoddy Battle needs?

This system lets you start with a high K-factor, which decreases as you go up. So, if we were to take statistics of everyone's rating, the curve would have few players ranked low (i.e. 800-1400), many players ranked 1500-1700, and very few players ranked 1700+**. This allows us to differentiate the better players from the worse ones, and is "a statistic introduced to simply rank players".

This system is still missing the RD from the Glicko-2 system, but that doesn't matter: the CRE is, again, "a statistic introduced to simply rank players", and the slight rating difference is negligible since many players will be between 1500 and 1700. This system mainly differentiates very good players (1700+), who belong on the ladder board, from players who are here just to have fun.

**You can also see this curve on tests, e.g. the SAT, where there will be ~3000 people with 600+ on a specific section, but 30,000 with 400-590 on a specific section.
 

Cathy

OK. After a month of research, I've slightly modified the DWZ system to fit Shoddy Battle's needs. I've also added some additional elements.

This should be used as the CRE only.
I don't think you understand the function or definition of the CRE.

Also, what are "Shoddy Battle's needs" and why is this rating system more able to satisfy them than the existing one? I don't think you even understand the present system because you seem to think that the CRE (which is a statistic introduced simply to rank players) is a player's rating, and it isn't. Anyway, before you can claim that this system fits Shoddy Battle's needs, you first need to state what you think those needs are.

With Elo and a K factor of 32, ratings on a chess server called the "Internet Chess Club" range from somewhere around 400 to 3400, which is far too wide for Shoddy Battle's needs.
What are Shoddy Battle's needs?


You haven't provided a single reason your rating system is better in any way than the present one, other than that it "satisfies Shoddy Battle's needs", which you neither specify nor show the present system failing to satisfy.
 

X-Act

Give us reasons why the Elo rating system with the modified K-factor you're suggesting is better than the Glicko-2 rating system used by Shoddy, and then we might look into your system.

Note that we are going to scrap the CRE formula for a better one in the future. (I don't know when, so please don't ask me about that.)

Also, I've suggested to Colin that a volatility starting from 0.06 is inappropriate for our needs, and that it should start at 0.09. Colin can also have access to a formula that converts old volatilities from 'starting at 0.06' to 'starting at 0.09' (I didn't give it to him, but he can have it whenever he says that he wants it). I'm assuming, though, that Colin will implement the new volatility in ShoddyBattle 2 and not in the current one.
 
*Please ignore this thread for now. Something urgent just came up that requires me to be out of town for 2 days, so I will finish this thread and answer your questions later.
 

Cathy

As X-Act said, we are going to replace the CRE by a different rating estimate at least as early as Shoddy Battle 2. We aren't going to replace the rating system just because the estimate we're using has some problems. You need to tell us what problems there are with the underlying rating system, not with the rating estimate we're using.
 
As X-Act said, we are going to replace the CRE by a different rating estimate at least as early as Shoddy Battle 2. We aren't going to replace the rating system just because the estimate we're using has some problems. You need to tell us what problems there are with the underlying rating system, not with the rating estimate we're using.
Sorry, what do you mean by that?
 
