ROCKY ROOK: ChessTempo July 2014

Wednesday, August 06, 2014

ChessTempo July 2014

Just a follow-up from my June post about my ChessTempo performance.

The three-week vacation made it a bit difficult to consistently keep up with my goal of 20 problems per day. As such, I found myself doing 60-100 problems per day the last week of July in order to "catch up". But despite that, I still managed to stay above 1700 per 20 problems. There were a handful of times when I dipped below 1700, but I bounced right back above 1700. Also, I hit a new all-time high in July at 1764.3.

I'm still not convinced this improvement in performance is entirely due to the duplicate reduction change (which was implemented in April 2014). The fact is I saw more 1700+ problems in June and July than I did from January to May this year.

In October 2012, I started tracking how many of my problems were rated 1599 and lower, 1600-1699, and 1700 and above.

For example, in October 2012, CT gave me 456 problems that were rated 1599 and lower; 190 problems that were rated 1600-1699; and 54 problems that were rated 1700 and above. My respective correct percent for each of those was 85%, 74% and 56%.

The monthly average stats for each of these categories from January to May 2014 are:
1599 and lower: avg total probs 324; 89.68% correct
1600-1699: avg total probs 205; 78.60% correct
1700 and above: avg total probs 92; 62.12% correct

Those same stats for June and July 2014 are:
1599 and lower: avg total probs 216; 90.31% correct
1600-1699: avg total probs 231; 82.69% correct
1700 and above: avg total probs 175; 64.20% correct

So, CT is serving me fewer lower-rated problems (those rated 1599 and lower), more 1600-1699 problems, and more problems rated 1700 and above. The 1700+ stat is actually pretty amazing. I was getting about 90 per month for the first 5 months of the year, and then in June and July CT served me about 85 more higher-rated problems per month, and my percent correct went up by over 2 points.
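As a quick check of the comparison above, here is a minimal Python sketch that subtracts the January to May averages from the June and July averages. The numbers are copied from the two lists above; the bucket labels and the `compare` helper are made up for illustration.

```python
# Monthly averages copied from the post: (avg problems per month, percent correct).
jan_may = {"<=1599": (324, 89.68), "1600-1699": (205, 78.60), "1700+": (92, 62.12)}
jun_jul = {"<=1599": (216, 90.31), "1600-1699": (231, 82.69), "1700+": (175, 64.20)}


def compare(before, after):
    """Return {bucket: (change in problems per month, change in % correct)}."""
    return {
        bucket: (
            after[bucket][0] - before[bucket][0],
            round(after[bucket][1] - before[bucket][1], 2),
        )
        for bucket in before
    }


changes = compare(jan_may, jun_jul)
for bucket, (d_count, d_pct) in changes.items():
    print(f"{bucket:>9}: {d_count:+d} problems/month, {d_pct:+.2f} points correct")
```

Against the exact January to May average of 92, the 1700+ bucket grows by 83 problems per month; the figure of 85 in the text comes from rounding 92 down to "about 90".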

Another stat I find interesting is that my monthly average of "average recent per problem time spent" continues to increase. I started tracking this in February this year. It was 47 seconds in Feb, 50 in Mar, 52 in Apr, 54 in May, 62 in June and 65 in July.

Again, it all may come crashing back down, but this is now two months in a row of "good indications."

rating chart july 2014

blitz non dup rolling avg of gain/loss

fide est rating based on per 20 blitz problems

17 comments:

AoxomoxoA wondering (1:48 PM)
If your rating gets higher, then the server will serve you higher-rated problems. And if you think longer, your % correct will get higher, but at the same time you will get fewer points per problem because you are thinking longer.

Empirical Rabbit made some excellent posts on these things.

So if your % correct is getting higher but (your average time) / (average time of problems) is not the same, then you can't tell whether you improved.

rockyrook (3:29 PM)
so, maybe I didn't improve compared to my personal past performance, but could I say I improved compared to other users on CT?

Because if I'm taking longer, but *still increasing in rating points*, wouldn't that suggest I'm solving the problems given to me more quickly than the average solve time for those problems? Otherwise, if I'm solving them slower than the average solve time, my rating would have stayed the same or gone down. And if my rating stays the same or goes down, CT serves me fewer high-rated problems.

I could take as much time as I wanted on every problem, and in theory be close to 100%, but that would mean (generally speaking) that my rating would be much lower than what it is today. If that is true, then the fact that my rating is higher implies that I'm generally beating the average solve time; which implies I've improved compared to other users who solved those problems.

And if this is true, it raises the question: "Why haven't I been able to do this before?" Why have I seen this in June and July? What changed? Does this really all boil down to the fact that I take an average of 18 seconds longer to solve the problems? And for that adjustment, my rating is on average 30-50 points higher?

If that is the case, then I can live with that. I suppose this is the new "norm" for me, and now I've got to find that sweet spot where I can maximize rating increase at the right solve time while holding % correct even or better.

Aox (5:22 PM)
http://empiricalrabbit.blogspot.co.uk/2012/09/rating-time-and-score-revisited.html
http://aoxomoxoa-wondering.blogspot.de/2012/03/rating-speed-score-not-easy-relation-at.html

Look at a random problem at CT: you will see that the standard rating of the problem is lower, but the average solving time in standard mode is correspondingly higher.

Your rating did rise, so you get served higher-rated problems. The server gives you problems that are, most of the time, in the range "your rating - 200" up to "your rating". That is about 9000 problems in the database. You already have 44000+ attempts, so most of the time you are solving a duplicate. (simplified calculation ;)

If you want to know whether you got better, you have to analyse your performance on "non-duplicates". You can do that by downloading your history (for premium users only). Some adult users sent me their histories, and they did not improve after a few thousand attempts. If you really improved, it would be very interesting to know how you did it.

rockyrook (10:08 PM)
i downloaded my history file. i deleted all the duplicates. i then deleted the non-blitz problems. so, now all i have are about 8400 blitz non-duplicates.

i don't know how to fully analyze this data, but here are some quick hits

MAX PROB RATING: 3109
MIN PROB RATING: 800.48
MED PROB RATING: 1589.63
AVG PROB RATING: 1638.36

MY MAX RATING: 2135
MY MIN RATING: 1556.36
MY MED RATING: 1704.71
MY AVG RATING: 1763.14

TOTAL PROBLEM: 8422
PROBS CORRECT: 6695
PROBS INCORRECT: 1727
% CORRECT: 79.49

stats since january 2012
total probs: 3318
total correct: 3062
percent correct: 92.28
total probs rated 1599 and lower: 1986
% correct: 97.99
total probs rated 1600-1699: 581
% correct: 91.05
total probs rated 1700+: 775
% correct: 78.19

out of those 3318 problems, i gained a total of 2587.88 points more than i lost. this is broken down by +1109.87 for problems 1599 and lower; +521.97 for 1600-1699 rated problems and +956.05 for 1700 and above rated problems.

so, what other analysis should I perform on this data set?

AoxomoxoA wondering (5:16 AM)
You need to look at the "non-duplicates" in the interesting months (the last 2? months).
These show your real performance in tactics now.
You may compare these to the non-duplicates from a few months before, which will tell you your real performance in tactics back then (compare them to the non-duplicates in the year 2013, for example).

Don't look at non-duplicates from a long time ago. Too many things have changed.

The best method to compare these sets is to calculate a completely new rating on each of the two sets; see http://chesstempo.com/user-guide/en/tacticRatingSystem.html and http://en.wikipedia.org/wiki/Glicko_rating_system. You may set RD = 35 and ignore Richard's overscoring, so you don't need to know the standard deviation,
or, even better, you may use Empirical Rabbit's method to calculate a rating on each set.

That's quite some work. If you want to do that, I might search for my old calculations, based on a rating related to Richard's method, and help you.

Easier might be this:
compare the average gain/loss in points on these 2 sets of non-duplicates.

The average win/loss per non-duplicate problem tells you how much your CT rating differs from your rating on problems you have never seen before (to first order!).

Now:
If you have zero memory and get no benefit from solving a duplicate, then the average win/loss in points on non-duplicates in the new/present set is ~zero.

If you have a "very good" memory and you benefit strongly from solving duplicates, then the average win/loss on non-duplicates is negative (on both sets).

If you really performed better in the last 2 months, then your average win/loss on non-duplicates should be about the same on both sets.

If your old rating was too low because the duplicate reward reduction was unfair (because your memory was not as good as Richard expected), then your old win/loss on non-duplicates was positive (your old CT rating was too low because the duplicate reward reduction was too high: you lost too many points on duplicates and won points on non-duplicates). That was seemingly the case. Now, without the duplicate reward reduction, your rating is higher.

and so forth...

With a spreadsheet program I would:
1. sort by problem number
2. erase duplicates
3. sort by problem type
4. erase non-blitz
5. sort by date
6. erase old data
and then I would calculate a rolling average of the win per problem and plot a graph.
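The steps above can also be sketched in Python instead of a spreadsheet. This is only a rough sketch under stated assumptions: the field names (`problem_id`, `type`, `date`, `gain`) are hypothetical and are not the actual columns of the CT history download, "erase duplicates" is taken to mean "keep only the first attempt at each problem", and the sample history is made up. Plotting is left out.

```python
from collections import deque
from datetime import date


def first_attempts_blitz(attempts, since):
    """Steps 1-6: keep first attempts only, then blitz only, then recent only."""
    seen = set()
    kept = []
    # Walk the history in date order so the first attempt at a problem wins.
    for a in sorted(attempts, key=lambda a: a["date"]):
        if a["problem_id"] in seen:
            continue  # a duplicate: this problem was attempted before
        seen.add(a["problem_id"])
        if a["type"] == "blitz" and a["date"] >= since:
            kept.append(a)
    return kept


def rolling_average(values, window):
    """Mean of each run of `window` consecutive values."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is about to fall out of the window
        buf.append(v)
        total += v
        if len(buf) == window:
            out.append(total / window)
    return out


# Tiny made-up history, only to show the shape of the data.
history = [
    {"problem_id": 101, "type": "blitz", "date": date(2014, 1, 5), "gain": 2.0},
    {"problem_id": 102, "type": "standard", "date": date(2014, 1, 6), "gain": 1.0},
    {"problem_id": 101, "type": "blitz", "date": date(2014, 2, 1), "gain": 0.5},
    {"problem_id": 103, "type": "blitz", "date": date(2014, 3, 2), "gain": -4.0},
    {"problem_id": 104, "type": "blitz", "date": date(2013, 12, 30), "gain": 3.0},
    {"problem_id": 105, "type": "blitz", "date": date(2014, 4, 9), "gain": -1.0},
]

recent = first_attempts_blitz(history, since=date(2014, 1, 1))
gains = [a["gain"] for a in recent]
print(rolling_average(gains, window=2))
```

On a real history the window would be much larger than 2 (for example 100 or more problems), so that single hard or easy problems do not dominate the curve.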

If I am right, then the plot would start in the plus and change to the minus 3(?) months ago. Your old CT rating was "too low" and your new CT rating is "too high".

Richard earns a lot of money with CT, so he has a great interest in there being "an improvement" from doing many CT puzzles. Erasing the duplicate reward reduction will create an improvement in CT rating for almost every serious CT tactician. A better indication of personal progress used to be the FIDE Elo estimate, because it was based strictly on (the last 100?) non-duplicates.

rockyrook (4:07 PM)
i think this makes sense. let me play with the data a few days to slice by months and then make comparisons.

thanks for taking time to explain.

when i have time later, I have a few other questions for you about what your definition of "improvement" is. Also, how *much* memory plays a role ... for example, if I did a problem in February 2010 and missed it, but then saw it for the 2nd time in May of 2014 and got it right, in theory, what does this mean? I can't conceivably have remembered that tactic.

Compare that with another problem which I see twice or more in a span of a week or two weeks ... in that case, I can see how memory plays a role.

rockyrook (8:41 PM)
ok - so i *think* i did it ... check the second graphic i posted in my blog post. This shows the rolling average of the gain/loss for all the 2014 non-duplicate blitz problems. i then added a linear trend line for all those data points. it starts just above 0 at the beginning of the year and then steadily falls below 0 by august 1.

does this look correct?

AoxomoxoA wondering (3:36 AM)
Your second graph looks correct.
My simplified estimate of 9000 for the size of your CT set was close to your 8400 non-duplicates :)

Let me add some further explanations / calculations.

You said "out of those 3318 problems, i gained a total of 2587.88 points more than i lost"
Since 2012 you have gained 2587.88 points on non-duplicates, but your real CT rating gain is "close to 0" (= less than 100 points).
So you have lost about 2500 points on duplicates.

Your average gain per non-duplicate was about 0.8.
Something like 20% of your attempts at CT are non-duplicates and about 80% are duplicates.
So your average loss! per duplicate was something like 0.15.
This means that Richard's duplicate reward reduction was too harsh on you; your memory was not as strong as Richard's estimate.
(I could explain why Richard's calculation was "not precise", but that's not important any more ;)
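The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is a sketch under the assumptions stated in the comment, not a measurement: the 20% non-duplicate share is a rough guess, and the overall rating change is simply taken as 0 (the comment says "less than 100 points"). All variable names are made up for illustration.

```python
# Figures quoted in the comments above.
non_dup_attempts = 3318        # non-duplicate blitz problems since January 2012
non_dup_net_points = 2587.88   # points gained minus points lost on them

# Assumptions for illustration (rough estimates, not measured values).
non_dup_share = 0.20           # about 20% of attempts are non-duplicates
overall_net_points = 0.0       # overall rating change taken as "close to 0"

gain_per_non_dup = non_dup_net_points / non_dup_attempts
dup_attempts = non_dup_attempts * (1 - non_dup_share) / non_dup_share
dup_net_points = overall_net_points - non_dup_net_points
loss_per_dup = -dup_net_points / dup_attempts

print(f"average gain per non-duplicate: {gain_per_non_dup:.2f}")
print(f"estimated duplicate attempts:   {dup_attempts:.0f}")
print(f"average loss per duplicate:     {loss_per_dup:.2f}")
```

With exactly these inputs the loss per duplicate comes out a little under 0.2 rather than 0.15; a duplicate share of roughly 84% instead of 80% would give about 0.15. Either way the sign and rough size of the effect are the same.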

If your average RD was about 37(?), then your "CT rating on non-duplicates" was 0.8*37 (or is it 0.8*36? or 0.8*38? I'm too lazy to check that) higher than your average CT rating
( http://en.wikipedia.org/wiki/Elo_rating_system#Most_accurate_K-factor )
So your CT rating was something like 29 points lower than your performance on non-duplicates indicated.
Since April it's the other way around: your performance on non-duplicates is lower than your CT rating indicates.
If my old eyes are correct, your second graph seems to indicate that the effect is now even stronger in the opposite direction, so:
let's assume your average loss! per non-duplicate is now 0.8; then your CT rating would now be something like 29 points too high.
29+29=58

So a rating gain of 58 points "for you" during April 2014 can easily be explained by the change of the rating system at CT.
As a result: your ability to solve unknown tactical problems might be unchanged.

SVC (11:33 AM)
Hey, Good to see your review of CT performance. What is your USCF rating?

AoxomoxoA wondering (11:26 PM)
I was too lazy to calculate the correct K from RD and, bang!, messed up the whole formula. So my calculations of how much rating gain equals how many rating points are wrong.

rockyrook (12:45 PM)
SVC - my USCF rating is 1589, but the last rated USCF game I played was March 2013.

AOX - I think I read somewhere on the CT forums that the FIDE estimated rating is calculated from the last 100 non-duplicate problems. In your opinion, how reliable is the estimated FIDE rating?

I started tracking my estimated FIDE rating based on blitz problems starting in January 2014. I recorded the rating after every 20 problems I completed.

AoxomoxoA wondering (3:38 PM)
My FIDE Elo is ~1900. I have no USCF rating; I live in Germany.

The FIDE estimate would be good if Richard did not change the calculation formula "that often".
But as far as I know there was no change in 2014.

So: if your FIDE rating estimate has risen significantly since April, then there is still a chance that you really have been performing better lately.

I suggest: just stay with the normal CT rating. You just have to keep in mind that an increase in rating of 1 or 2 points per month can be caused by a rating drift (of the whole system) or by the effect of "learning" duplicates.

If you start doing hundreds of puzzles per day, then you can't rely on your CT rating.

As far as I can see, it's virtually impossible for someone aged 30+ to improve "significantly" in tactics after 1000-4000 tactics puzzles. You may look at the statistics of high-volume CT blitz tacticians and ask the very! few improvers for their age.
Extra training on blunders, spaced repetition, extreme care to get a high % correct ... seem to have no positive effect (an extremely small sample, though).

What seems to work (a little!) are the "Saltmines" = high speed + high quantities of very easy puzzles.

I broke my plateau in board vision lately (see my attacker and defender statistics at http://www.chessgym.net/ ). But I don't know if this was "real" or just a coincidence of some kind. I need a "guinea pig" who does some board vision training until reaching the plateau (an estimated 6+ hours of training) and then breaks it with my "method" (well... or doesn't break it).

rockyrook (4:08 PM)
AOX - again, thanks for all the explaining and patience!

Honestly, I enjoy solving tactics on CT, but it is good to have a reality check on improving (which apparently I'm not).

I charted my FIDE estimated rating based on per 20 blitz problems and the line looks very similar to the non-dup rolling average of the gain/loss. I'll post this chart as well. I think this is the stat that I'll focus the most on going forward.

regarding your "saltmine" comment, I do practice fork / double attack problems, but I don't do it at very high speed or high quantity. But these are "easy". Maybe I should up the speed and quantity on those a bit more.

Also - you mentioned you need a guinea pig for board vision training. I'd be happy to offer my time if it helps at all. Just let me know.

And again - thanks for taking so much time to explain things regarding my CT efforts.

AoxomoxoA wondering (4:57 PM)
If you want to be the guinea pig:

Some background
It's about breaking the plateau and board vision. You may search "Dan Heisman" + "Board Vision" on Google to get an idea why board vision is interesting.
Heisman talks about 3 different "visions"; I think there are many more visions, and these are just different types of patterns (chunks) of different complexity. The easy types of pattern are board vision patterns: White bishop on b2 attacks Black rook on h8.
A more complex tactical pattern: bishop on b2 pins a knight on f6 if the king is on g7.

In the attack training you learn to see which piece is attacked. It's a useful skill for picking up your opponent's hanging pieces or for avoiding dropping a piece yourself.
This is also called "contact": http://iplayoochess.com/2011/11/08/what-is-contacts-examination-is-it-a-medical-optical-or-chess-issue/

Now the experiment:
You need to get to your plateau in the attack training at chessgym. Usually you improve quickly in the beginning and after some time reach a plateau = a level where you don't improve anymore.
I suggest doing something like 10 or 20 min of attack training per day (or more or less); after something like 3 weeks (= 6 hours of training) you should stop improving (well... it might be earlier or later).
Then I would tell you the method I used to break the plateau, and then we will see...

So you have to do no nonsense

AoxomoxoA wondering (1:40 PM)
I hope the salt mine is not too dry ;)

rockyrook (5:17 PM)
work and family life have been busy (school starting, vacations over, etc). But I have been practicing at chessgym.net; the attackers exercise.

i've managed to get about 2 hours and 15 minutes of total time and i'm at just over 14 attackers per minute (nowhere near your 44 per minute :-) )

i'm still trying to work my way up to around 6 hours. will let you know when i get there.

AoxomoxoA wondering (6:12 PM)
The better you are at chess, the quicker you are at board vision (= low-level perceptual processes); see for example:

https://chessprogramming.wikispaces.com/Pertti+Saariluoma

especially :

http://www.uv.es/revispsi/articulos1.01/SAAmo.pdf

My attacker rating is now that of a GM (the tactics rating at ChessGym is a few hundred points lower than the Elo rating)

see http://aoxomoxoa-wondering.blogspot.de/2014/07/gessgymnet-tactic-vs-positional-and.html
