An early version of my critique of the paper of Chuine et al. was sent to the authors; the following discusses the authors' response.
The relevant portion of the authors' response to my critique was as follows.
Keenan compares the simulated and observed anomalies of the 4 warmest years of his series from Dijon (2003, 1947, 1952, 1945) and concludes on the model failure. We find it a bit light to conclude from 4 years taken in a series of more than 600 years that the "results of the paper are plainly highly unreliable".
First, we never claimed that our reconstructed yearly anomalies could be interpreted individually as observed anomalies. Each anomaly has to be interpreted in the light of the whole series. Even if model simulation gives a higher anomaly than observed for 2003, it remains without contest the highest temperature of the whole measurement period with an anomaly nearly twice as large as the second hottest year. Thus the conclusion of the paper remains absolutely correct.
Second, Keenan curiously only uses the three hottest years to conclude a model underestimation of all unusually warm years except 2003. If he had taken the following 3 hottest years in the series, he would have seen on the contrary that 1976 and 1893 have a simulated anomaly higher than observed. Considering the seven warmest years of the series, 4 are actually underestimated and 3 are overestimated.
Likewise it seems also strange from a statistical point of view to consider that the years [Table 1 Keenan] with anomalies higher than one standard deviation are "nearly average". The four warmest years in the observed 1880-2000 series [Table 1 Keenan] are actually all in the 17 warmest years in the simulated series. More generally among the 25 warmest years in the observed series, more than 70% are found in the 25 hottest years in the simulated series and more than 80% in the 30 hottest years.
So in contradistinction to the conclusion of Keenan, the model does not fail at all, even for detecting the warmer years.
It is easy to get lost trying to follow the reasoning in the above and then, upon finding the errors, to rebut them in a way that is necessarily complicated. A complicated rebuttal would leave the reader uncertain as to whether the rebuttal is valid. Perhaps that is what the authors hoped for.
Consider first the authors' assertion that 2003 “remains without contest the highest temperature of the whole measurement period … Thus the conclusion of the paper remains absolutely correct”. The year 2003 has the highest modeled temperature. This is obvious and not in dispute. The authors then conclude that this implies their paper is correct. The conclusion is obviously not logical.
Consider next the authors' objection to my paper's use of the term “nearly average”. It is common in statistical practice to regard data within 1 std. deviation of the mean as being about average. Moreover, it is common in statistics to require that the data lie more than 2 std. deviations away from the mean in order to be considered extreme (for a Gaussian distribution, this corresponds approximately to 95%-confidence intervals, which are almost ubiquitous in science). Some studies require 2.5 or more std. deviations away from the mean. Thus, saying that a year whose temperature was 1.05, 1.18, or 0.95 std. deviations above the mean is “nearly average” (as my paper did) is fine.
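As a rough illustration of the thresholds mentioned above, the following Python sketch computes the share of a Gaussian distribution lying within a given number of std. deviations of the mean. The Gaussian assumption is the one stated in the text; the function name `fraction_within` is introduced here only for illustration and does not come from the paper or the authors' response.

```python
from statistics import NormalDist

def fraction_within(k):
    """Share of a standard Gaussian lying within k std. deviations of the mean."""
    std_normal = NormalDist()  # mean 0, std. deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Thresholds discussed in the text: 1, 2 and 2.5 std. deviations.
for k in (1.0, 2.0, 2.5):
    print(f"within {k} std. deviations: {fraction_within(k):.3f}")
```

Under this assumption, roughly 68% of values fall within 1 std. deviation of the mean and roughly 95% within 2, which is why the 2 std. deviation cut-off is commonly associated with 95%-confidence intervals.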
Third, consider the authors' assertion that “Keenan curiously only uses the three hottest years to conclude a model underestimation of all unusually warm years except 2003”. The authors seem to be confused about what is required for a year to be extremely warm—yet it is extreme years in which we are interested. If, in order to be extreme, we require data to be outside the 95%-confidence interval, then, given that there are 120 years of data, we would expect 0.05*120 = 6 years that are extreme. Of those 6, half would be expected to be extremely warm, and half extremely cool. My paper's consideration of the 3 warmest years (prior to 2003) is thus consistent with that. The authors' use of 17 years would imply considering years that were >1.07 std. deviations above the mean as extremely warm, which is obviously untenable. The authors' use of 25, or 30, years is worse.
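The arithmetic in the preceding paragraph can be checked with a short Python sketch. It assumes, as the text does, a Gaussian distribution and a series of 120 years; the helper names `expected_extremes` and `implied_threshold` are hypothetical and are used only for this illustration.

```python
from statistics import NormalDist

N_YEARS = 120  # length of the instrumental series considered in the text

def expected_extremes(n_years, confidence=0.95):
    """Expected counts (total, warm only) of years outside a two-sided interval."""
    total = (1 - confidence) * n_years
    return total, total / 2

def implied_threshold(k_warmest, n_years):
    """Std. deviations above the mean that a Gaussian would need as a cut-off
    for a fraction k_warmest / n_years of years to lie above it."""
    return NormalDist().inv_cdf(1 - k_warmest / n_years)

total, warm = expected_extremes(N_YEARS)
print(f"expected extreme years: {total:.1f} (of which warm: {warm:.1f})")

# Cut-offs implied by treating the 3, 17, 25 or 30 warmest years as extreme.
for k in (3, 17, 25, 30):
    print(f"{k} warmest years -> threshold {implied_threshold(k, N_YEARS):.2f} std. deviations")
```

With these assumptions, treating the 3 warmest years as extreme corresponds to a cut-off of about 1.96 std. deviations, whereas treating 17 years as extreme lowers the cut-off to about 1.07, and 25 or 30 years lower it further.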
A crucial point is that for the three hottest years during 1883–2002, the authors' model underestimates the temperatures so much that those years appear to be nearly average. Such a model should obviously not be trusted to identify the hottest years prior to 1883.
It is possible to discuss the authors' objections further. It should be clear, though, that the authors have nothing substantive to say in rebuttal.
Acknowledgement: thanks to P. Claussen for reporting a mistake in an earlier version.
Keenan D.J. (2007), “Grape harvest dates are poor indicators of summer warmth”, Theoretical and Applied Climatology, 87: 255–256. doi: 10.1007/s00704-006-0197-9.
Instrumental temperature data is available from
Model-simulated temperature data is available from