New Evidence on Class Size Effects: A Pupil Fixed Effects Approach

The impact of class size on student achievement remains a thorny question for educational decision makers. Meta-analyses of empirical studies emphasise the absence of class-size effects, but detractors have argued against such pessimistic conclusions because many of the underlying studies have not paid attention to the endogeneity of class size. This paper uses a stringent method to address the endogeneity problem, using TIMSS data on 45 countries. We measure the class size effect by relating the difference in a student's achievement across subjects to the difference in his/her class size across subjects. This (subject-differenced) within-pupil achievement production function avoids the problem of the non-random matching of children to specific schools, and to classes within schools. The results show a statistically significant effect of class size for 16 countries, but in only 10 of them is the effect negative, and the effect size is very small in most cases. Several robustness tests are carried out, including controls for students' subject-specific ability and subject-specific teacher characteristics, and correction for possible measurement error. Thus, our stringent approach to addressing the endogeneity problem confirms the findings of meta-analyses that find little support for class size effects. We find that class-size effects are smaller in resource-rich countries than in developing countries, supporting the idea that the adverse effect of larger classes increases with class size. We also find that class size effects are smaller in regions with higher teacher quality.


Introduction
Increasing teacher inputs is popularly thought to be an effective way of increasing student learning, and teachers are mostly in favour of smaller classes. However, reducing class size is usually an expensive policy. In most developed countries, demographic changes have mechanically produced substantial reductions in class size. This evolution has contributed strongly to the rise in cost per pupil over the long run, because teacher salaries consume the dominant share of recurrent education expenditure, estimated at between 66% and more than 90% in OECD countries.
The key empirical challenge in identifying class size effects arises due to the potential non-random matching of students to schools and, within schools, to particular classes. For instance, if higher ability students are systematically placed in the larger classes within their grade in the school, a positive coefficient for class size can appear in a standard estimate of the education production function but this would not represent a causal effect. In order to circumvent this kind of endogeneity bias two major approaches have been used in the literature.
The first uses randomized experiments. Project STAR (Student/Teacher Achievement Ratio), carried out in Tennessee in the mid-1980s, is probably the most important study of this kind. Krueger (1999) found that the advantage of experimentally reducing class size was relatively small: a reduction of around 8 students per class raised test scores by less than 0.02 standard deviations, making class-size reduction a very costly way of achieving small learning gains. An important limitation of randomized trials is that participants may react to being in an experiment. Hoxby (2000) argues that teachers may change their behaviour in an experiment because its outcome has implications for the future funding of their schools. Another approach to dealing with endogeneity bias is to use valid instruments which induce exogenous variation in class size. If class size varies due to some exogenously given administrative rule (rather than due to student or school choices), the measurement of the class size effect can be free of endogeneity bias. The solution consists of seeking a variable correlated with class size but not otherwise correlated with pupil achievement. Angrist and Lavy (1999) exploit a rule of the Israeli school system whereby the mandated maximum class size is fixed at 40 pupils. Consequently, when the number of pupils in a grade increases from 40 to 41, the grade is split into two classes with a mean class size of 20.5, and the change in class size is not due to the non-random choice of pupils or schools. Angrist and Lavy apply this principle and find a significant effect of a reduction in class size on student achievement. However, even where an exogenously given (e.g. government-mandated) maximum class-size rule exists in a developing country, it is rarely adhered to in practice. Thus a discontinuity design of the type used by Angrist and Lavy may not yield valid instruments for class size in developing-country contexts.
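As an illustration of the discontinuity such a rule creates, the class size predicted by a strict maximum-of-40 rule can be sketched as a function of grade enrolment. This is a minimal sketch of the rule described above (the function name is ours): the grade is split into the smallest number of equal classes respecting the cap, so predicted class size drops from 40 to 20.5 as enrolment passes from 40 to 41 pupils.

```python
def predicted_class_size(enrolment: int, cap: int = 40) -> float:
    """Average class size implied by splitting a grade of `enrolment` pupils
    into the smallest number of classes that respects the size cap."""
    n_classes = (enrolment - 1) // cap + 1
    return enrolment / n_classes
```

Because pupils and schools cannot choose which side of the threshold they fall on, the jumps in this function provide the exogenous variation exploited by the discontinuity design.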
Within the instrumental variables (IV) approach, Wößmann and West (2006) estimate the class size effect on school performance in 11 countries by combining school fixed effects with IV estimation, in order to identify random variation in class size between two consecutive grades inside the same school. They regress the change in pupils' performance between grades 7 and 8 on the change in class size between grades 7 and 8 in the same school, the instrument being the average number of pupils per grade within the school. It should be noted that this is a school fixed effects technique, not a pupil fixed effects approach. The authors show that conventional estimates of class-size effects are strongly skewed by the non-random sorting of pupils into small and large classes within schools. While Wößmann and West (2006) find large and significant coefficients for Greece and Iceland, they find small or no effects for the other 9 countries.
In this paper, we examine the effect of class size on pupil achievement by using the traditional achievement production function, but the innovation is to allow for pupil fixed effects in cross-section data. Across-subject rather than across-time differencing is used. This is possible because the TIMSS database provides a pupil's score in several evaluated subjects for some countries (and in at least mathematics and science for all countries). This approach enables us to control for all subject-invariant student and family unobservables. Our cross-section data make it possible to check whether the class size in different subjects in a given year is correlated with the student's marks across those subjects within the grade in the school. In other words, we estimate a within-pupil across-subject equation of the achievement production function rather than a within-pupil across-time one. The idea is exactly the same as in panel data estimates of the achievement production function, but we will show below that our approach is superior to the panel data approach.

Estimation technique
The objective is to estimate an educational production function in a consistent manner. The standard achievement function is specified as follows:

A_ik = X_i'β + S_k'γ + ε_ik    (1)

where the achievement level A of student i in school k is determined by a vector of his/her personal characteristics (X) and by a vector of school and teacher characteristics (S). In TIMSS, lessons for each subject are taught in different classes, so it is possible to include class size as an explanatory variable in its own right in a school fixed effects equation. With this approach it is also possible to include pupil fixed effects, whereby the only variables retained in the achievement equation are class size and teacher characteristics, since it is only these that vary within pupil (across subjects). This is the approach we follow. We estimate a simple pupil fixed effects equation of achievement:

A_ijk = X_i'β + C_ijk'δ + S_k'γ + (μ_ij + η_jk + ν_jk)    (2)

where A_ijk is the achievement of student i in subject j in school k, X is a vector of characteristics of pupil i, C is the class size and teacher characteristics for subject j, and S is a vector of characteristics of school k. The composite error term is in brackets: μ_ij, η_jk and ν_jk represent respectively the unobserved characteristics of the student, the subject class and the school. A pupil fixed effects model implies, for the simplified case of two subjects, 1 and 2:

A_i1k − A_i2k = (C_i1k − C_i2k)'δ + (μ_i1 − μ_i2) + (η_1k − η_2k) + (ν_1k − ν_2k)    (3)

Pupil fixed effects implies within-school estimation, since a student necessarily studies within a single school. If school unobservables are not subject specific (ν does not have a j subscript) and if pupil unobservables are also not subject specific (μ does not have a j subscript), then within school k we have:

A_i1k − A_i2k = (C_i1k − C_i2k)'δ + (η_1k − η_2k)    (4)

and regressing the difference in a pupil's test scores across subjects on the difference in class size across subjects nets out the effect of all student unobserved characteristics. However, if student ability is subject-varying, it is not netted out: the term (μ_i1 − μ_i2) remains in the error and may be correlated with class size, since if a school emphasises a subject, this is often reflected in (or is because of) small class sizes in that subject, and this is likely to show up in the subject-specific class size effect.
However, for consistent estimation of the class size effect, it is also required that class-level unobserved characteristics be unrelated to the included class size variable, i.e. that cov(η_jk, C_ijk) = 0. Since omitted class-level variables in η_1k, η_2k may be correlated both with class size C_1, C_2 and with student achievement A_1, A_2, we cannot claim that pupil fixed effects estimation of achievement permits us to interpret the effects of class size as causal. While across-subject pupil fixed effects estimation solves one source of endogeneity (the correlation between μ and C), it does not solve the second potential source (the possible correlation between η and C). This is analogous to standard panel data analysis, where class unobservables remain in the error term.

Data
The estimation strategy presented above requires a specific database. Two conditions are necessary. First of all, it is necessary to have students' achievement scores for different subjects.
If we have only one subject for each pupil, it is not possible to use the pupil fixed effects estimator. Another condition is that there be reasonable variation in class size between subjects.
A database that meets both these conditions is the TIMSS (Trends in International Mathematics and Science Study) survey. Of the 45 countries taking part in TIMSS for pupils at grade 8, we retained 33 countries which met the two conditions mentioned above. For 8 countries, class size does not vary between the subjects (Indonesia, Israel, Italy, Japan, Philippines, Singapore, Syria, Taiwan). In some other countries, the variation in class size is too small: in general, when the class size difference between subjects, averaged across the sample schools within a country, is less than 10%, we do not retain that country. Four countries are in this situation (Saudi Arabia, Ghana, Japan, Norway). Finally, we do not retain Morocco, because data are lacking for a large number of schools and because the education system consists of two parts: an "integrated system" (where only one science subject is taught) and a "diversified system" (where several science subjects are taught). Although more than 5 subjects are available for some countries, we do not have specific achievement scores in all of them. This is especially the case in countries of Central Asia and Eastern Europe, for which 5 or 6 scientific subjects are taught.
Descriptive statistics are presented in Appendix Tables A1 and A2. In graphs 1 to 4, we present the kernel densities of achievement scores by subject for 4 countries (Australia, US, Bulgaria and Egypt) as an illustration. It is clear that the distribution of achievement scores is not similar across subjects. For example, the science scores in Egypt are higher and less dispersed than the scores in mathematics. In order to use the difference of achievement scores between subjects as the dependent variable, it is necessary to standardize the scores. We standardize the score in each subject by the average of the score in that subject in the country, i.e. we use z-scores of achievement. The z-score is the score of a pupil in a given subject minus the national average score in that subject, divided by the standard deviation of the national score in that subject. By construction, then, the average of the standardized score in each subject is equal to 0 and its standard deviation is 1. The right parts of graphs 1 to 4 show that the distribution of standardized scores is more similar across subjects than the distribution of raw scores.
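The per-subject standardisation just described can be sketched as follows. Column names (`subject`, `score`) are illustrative, not the TIMSS variable names; in the paper the averaging is done within each country.

```python
import pandas as pd

def standardise_scores(df: pd.DataFrame) -> pd.Series:
    """Return each pupil's z-score within its subject: the raw score minus the
    subject mean, divided by the subject standard deviation."""
    by_subject = df.groupby("subject")["score"]
    return (df["score"] - by_subject.transform("mean")) / by_subject.transform("std")
```

After this transformation every subject has mean 0 and standard deviation 1, so differences across subjects are measured on a common scale.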

Results
We start by discussing the main results. Then, we perform various robustness tests and finally make a synthesis of the effects obtained.

Overall results
The results of the regressions are presented in the tables that follow. For missing values of control variables, we use the dummy-variable approach; this is likely to reduce biases due to omitted variables and also helps to avoid reducing the sample size.
The estimates use pupils' results in all subjects tested in a country. We start with weighted least squares (WLS) regression. Our estimations are weighted by students' sampling weights to ensure that the contribution of the students from each stratum in the sample to the parameter estimates is the same as would have been obtained in a complete census enumeration (DuMouchel and Duncan, 1983; Wooldridge, 2001). This method can only exploit differences between pupils across schools and carries the risk that coefficients are biased by the correlation of class size with students' unobserved characteristics, such as ability. To correct such endogeneity bias, we first re-estimate the WLS achievement equations on the reduced sample of only those classes which are not grouped by student ability (Table 2) and then attempt to address endogeneity by instrumenting class size (not presented). In Table 3 we present results from school fixed effects estimation. We start with WLS regressions in Table 1. The first column imposes the restriction that class size has a linear relationship with achievement (class size is entered linearly); in the second set of columns, we allow the class-size effect to be non-linear by introducing both the linear and quadratic terms of class size. In the majority of linear estimates, class size has a significant coefficient: it is statistically significant in 32 of the 45 countries.
However, the sign of the class size coefficient differs across country groups. Looking at the first column, class size is mostly positively correlated with pupil achievement in developed countries; the relationship is quadratic in 6 of the 13 countries. The last group, the developing countries, mostly shows a negative relation between class size and student performance: the class-size coefficient is negative and statistically significant in 10 of the 14 countries for which there is a statistically significant class-size 'effect', while Ghana, Indonesia and Lebanon have a positive and significant coefficient. Where the relationship is nonlinear, it is mostly convex, i.e. achievement falls with higher class size but at a decreasing rate.
Where the convexity has an upward sloping part, it occurs at very high levels of class-size (on average above 49 students per class, and only 8% of developing country classes have more than 49 students per class). Thus, for most of the relevant range of class-sizes in developing countries, the relationship of class-size with achievement is negative.
Thus it appears that, in general, the class size effect varies according to the economic level of the countries. While it is generally positive in the developed countries, it seems to be mostly negative in developing countries, and the effect is mixed for the EECA countries. The fact that the correlation of class size with pupil achievement across schools is generally negative in developing countries but not elsewhere accords with the notion that class size matters negatively when classes are large, since mean class size is substantially larger in developing countries than in the EECA and developed countries, as seen in Appendix Table A2. However, these are naïve WLS results and, as such, suffer from potential endogeneity bias.
We try several different ways of reducing endogeneity bias. Firstly, we can partly address endogeneity concerns by estimating achievement equations only for the subset of classes where students are not grouped by ability. The TIMSS survey asked schools whether students were grouped by ability within their maths classes and within their science classes. Appendix Table A3 shows the proportion of maths and science classes grouped by ability in each country, and how the sample size changes when we consider only students not grouped by ability. Table 2 presents WLS results for this reduced sample. In the developed country group, removing the effect of student ability in this way generally causes the class-size effect (CSE) to become more negative. Comparing column 1 of Tables 1 and 2, the CSE falls in 9 of the 13 countries, though rarely statistically significantly. In four of these 9 (Australia, Italy, Sweden and USA) the coefficient turns from positive to negative; in Sweden it turns from significantly positive to significantly negative, and in the US from insignificantly positive to significantly negative. Changes are apparent in the quadratic specification as well, e.g. for England, where the sign of the linear term turns negative. In both the EECA and developing country groups, the result of removing ability-set classes is more mixed and generally more inconsequential.¹
A second way we attempted to address the endogeneity of class size is Instrumental Variable (IV) estimation. The TIMSS survey asked teachers to what extent a high student-teacher ratio limited how they taught. This is highly correlated with class size but arguably should not otherwise be correlated with student achievement. If we believe in the instrument's validity, this correction for endogeneity makes quite a lot of difference. Appendix Table A4 presents the results. Whereas Table 1 showed that in the developed country group the CSE was positive and significant in 10 of the 14 countries, the IV estimates show the CSE to be negative and significant in 5 of these 14 countries. In Belgium, New Zealand and Taiwan the significantly positive CSE turns into a significantly negative effect, as it does in Indonesia and Lebanon in the developing country group; the changes in the EECA country group are less substantial.
When we move from across-school estimation to school fixed effects estimation in Table 3, the results again change considerably. For developed countries, when we focus only on variation inside schools, the positive CSE remains statistically significant for only 5 countries, down from 10 in Table 1. There is a similar change for the two other groups of countries: 6 of the EECA countries had statistically significant effects with the school fixed effects estimator, and only 4 developing countries still have significant coefficients, down from 13 in Table 1.
However, the school fixed effects estimator will produce biased estimates of the CSE if pupils are sorted into smaller or larger classes within their grade in the school on the basis of their unobserved characteristics. For instance, even if the school has no deliberate policy of grouping students by ability, more ambitious or able parents may insist on having their children placed in smaller classes within the grade. This would lead to a negative correlation between ability and class size within the school and exaggerate the expected negative coefficient on class size. If this occurs, the coefficient on class size will be biased (i.e. more negative) even in a school fixed effects estimator, since ability is systematically correlated with both class size and student performance within the school.
To address the endogeneity of class size even within the school, we now use the pupil fixed effects estimator. Under this estimator, identification comes only from across-subject variation in class size and achievement for the same pupil; as such, this is an extremely stringent test for the presence of a class-size effect, since it nets out the effect of all subject-invariant pupil and school unobservables. Provided we believe that pupils who are bright in maths are also bright in science (i.e. that ability is not subject-specific within the maths-science subject set), this estimator takes care of all student- and school-level unobservables.
In order to check for any non-linearity of the class size effect, both linear and quadratic specifications are presented.
When variations in achievement and class-size across subject within the same pupil are taken into account in Table 4, the CSE is diminished in all countries, compared with previous estimations. This suggests that part of the CSE previously observed across and even within school was spurious, arising due to the correlation of class-size with pupil unobservables.
However, even though diminished, there remains a statistically significant class size effect in 13 of the 33 countries for which it was possible to use the pupil fixed effects estimator, though in only 8 of these does it have the expected negative sign. One explanation for the diminished coefficients on class size could be attenuation bias, which is exacerbated in differenced estimation. However, this is not an important cause for worry, because it seems unlikely that measurement error in class size is subject-specific within a grade/school: as long as measurement error in class size for the different subjects is equal and in the same direction, differencing will not cause attenuation bias. In any case, we attempt to address any attenuation bias by instrumenting class size in Table 6.
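The attenuation-bias argument above can be checked with a small simulation: if the reporting error in class size is common to both subjects, it cancels in the within-pupil difference and the slope is not attenuated, whereas independent subject-specific errors bias the slope towards zero. This is a sketch under assumed error magnitudes, not a calibration to the TIMSS data.

```python
import numpy as np

def differenced_slope(cs_math, cs_sci, score_math, score_sci):
    """OLS slope of the score difference on the class-size difference."""
    dx = cs_math - cs_sci
    dy = score_math - score_sci
    dx = dx - dx.mean()
    return (dx * (dy - dy.mean())).sum() / (dx * dx).sum()

rng = np.random.default_rng(42)
n, delta = 100_000, -0.02
cs_m = rng.uniform(20, 50, n)                  # true class sizes
cs_s = rng.uniform(20, 50, n)
ability = rng.normal(size=n)                   # subject-invariant, cancels out
score_m = ability + delta * cs_m + rng.normal(scale=0.3, size=n)
score_s = ability + delta * cs_s + rng.normal(scale=0.3, size=n)

# Case 1: the same reporting error in both subjects cancels in the difference.
shared = rng.normal(scale=5, size=n)
slope_shared = differenced_slope(cs_m + shared, cs_s + shared, score_m, score_s)

# Case 2: independent subject-specific errors attenuate the slope towards zero.
indep_m = rng.normal(scale=5, size=n)
indep_s = rng.normal(scale=5, size=n)
slope_indep = differenced_slope(cs_m + indep_m, cs_s + indep_s, score_m, score_s)
```

With these magnitudes, `slope_shared` recovers the true coefficient while `slope_indep` is noticeably closer to zero, illustrating why only subject-specific measurement error is a concern for the differenced estimator.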
In general, the CSE in Table 4 is small in magnitude; even in the countries where it is largest, the effect of a 1 SD reduction in class size would be to raise achievement by only 0.07 and 0.05 SD respectively.

Robustness
In this section, we check the robustness of our pupil fixed effects results. A potential criticism of the pupil fixed effects approach is that its efficacy depends on the assumption that pupil unobservables do not vary by subject, i.e. that differencing a student's marks across subjects nets out all aspects of the student's unobserved characteristics. But it may be that students are more able in, or more motivated to study, particular subjects. If students who are especially interested or able in maths are systematically allocated to smaller (or larger) maths classes, either because they lobby for it or because of school policy, then subject-specific class size may be systematically correlated with student unobservables. While such a concern may seem somewhat far-fetched, and while we have already sought to address it in Table 2, where we restrict estimation to students not set into classes by ability, we seek to further control for the possibility of different abilities in different subjects.
If pupil ability differs by subject, the ideal way to control for this would be to have indicators of ability for each subject. While such indicators are not available, it is possible to build a proxy for subject-specific ability from the TIMSS database. An indicator of subject-specific ability has been constructed from four questions asked directly of the pupil; these are detailed in Appendix 1. The first two columns of Table 5 present the results of pupil fixed effects achievement equations fitted for each country, including the subject-varying pupil ability variable. Although we do not report the underlying achievement regressions for each country, the coefficients on the ability variables were positive and significant, i.e. subject-specific ability enables pupils to obtain better results in achievement tests in that subject, as might be expected.
But what interests us more is the effect of the inclusion of the ability measure on the CSE.
Compared with the results in Table 4, the CSE here remains roughly identical for the majority of the countries. Thus, it appears that subject-varying ability is not systematically or strongly correlated with class size. This could be because, within a grade in a school, students may not be able to engineer their sorting into smaller classes for the subjects in which they are more capable while remaining in larger classes for the others.
A potential omitted variable bias occurs if other class-level variables are correlated with class size but are omitted from the equation. These can be variables concerning the teachers.
Indeed, other than class size, typically only teacher characteristics can vary between the classes of the same grade in a school. A possible source of bias is therefore a correlation between class size and teacher characteristics. For instance, if more experienced teachers are, for some reason, systematically allocated to the smaller classes, and teacher experience affects student achievement, then class size could 'pick up' the effect of omitted teacher experience. But for this to apply, the correlation of teacher characteristics with class size would have to be subject-specific, e.g. more experienced maths teachers systematically assigned to teach small (or large) maths classes and more experienced science teachers systematically assigned to small (or large) science classes, which appears unlikely. In any case, in order to check for such omitted variable bias, we introduce four teacher characteristics into the estimation: the teacher's gender, age, possession of a master's-level or higher diploma in the subject, and experience entered in quadratic form. Since teacher characteristics vary by subject, it is possible to include these variables in pupil fixed effects estimations. The results are presented in columns (3) and (4) of Table 5. Neither the size nor the significance of the class size coefficients changes much; where it does, the coefficients are generally higher. We can thus conclude that the previous class-size effects are not biased by omitted teacher characteristics.
A final robustness test is to check whether measurement error is driving the pupil fixed effects results. To do this, we instrument class size with the average class size in the subject across all grades in the school, where this is possible (in some countries there are data on only one class per school). This is identical to the methodology of Woessmann and West (2006) but is applied here in the context of a pupil fixed effects regression. The results are presented in Table 6: the CSE is largely unaffected by instrumenting in most countries. Moreover, the size of the effect is at best modest, as we discuss in the next section.
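In the just-identified case, this IV estimate reduces to a ratio of covariances, which can be sketched as follows. Variable names are illustrative: `x` stands for the (possibly mis-measured) class-size difference, `z` for the instrument built from grade-average class sizes, and `y` for the score difference.

```python
import numpy as np

def iv_slope(y, x, z):
    """Just-identified IV estimate of dy/dx using instrument z:
    cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return (zc * (y - y.mean())).sum() / (zc * (x - x.mean())).sum()
```

Because the instrument's measurement error is independent of the error in the reported class size, the covariance ratio is unaffected by the attenuation that biases the corresponding OLS slope towards zero.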

Synthesis
The magnitude of the CSE estimated in our various regressions is synthesized in Table 7. Lastly, we attempt to understand and explain our findings with the help of country-level indicators. It is clear that the CSE is much greater in developing countries than in the EECA countries, and that in turn is greater than in the developed countries, where, if anything, the CSE is positive albeit weak. This follows the hierarchy of per-student expenditure on education and of country resources. In developing countries, where resources for education are more limited, reducing class size has the largest effect on achievement: a 1 SD reduction in class size (by 8.2 pupils per class) would increase student achievement by 0.06 SD²; in EECA countries, where resources are more plentiful, the CSE is small. Thus there is some support for diminishing returns to resources.
Secondly, teacher quality is measured by teacher education level (column 9), in particular by the percentage of teachers with MA qualifications and training. Again, in the regions where teachers are more skilled (the EECA and developed countries), the CSE is smaller than in the region with the lowest-quality teachers, namely the developing countries.

Conclusion
The impact of class size on student achievement remains a thorny question for politicians and educational decision makers. If class-size reductions could bring a sizeable gain in pupil achievement, decision makers would be tempted to act in this direction. However, recent literature in this field leaves researchers as perplexed as policy makers. In his reviews, Hanushek (1997, 2003, 2006) shows the lack of any consistent or strong relationship between class size and pupil performance across a large range of studies.
In this study, we have sought to check the veracity of these findings using a new technique that avoids arguably the most important sources of endogeneity bias. We measure the class size effect by relating the difference in achievement across subjects to the difference in class-size across subjects. A subject-differenced achievement production function helps to address the non-random matching of children to specific schools and to classes within schools.
We proceeded in several stages. Weighted least squares estimates showed significant and large class-size effects in a high proportion of the countries studied, though many of the 'effects' were positive for developed and transition countries. While WLS effects were negative for developing countries, they were small in size. Pupil fixed effects estimation of the achievement production function powerfully nets out the effect of all subject-invariant pupil, class and school unobservable variables. Several tests of robustness were carried out in order to confirm the results obtained. In particular, we controlled for students' subject-specific ability and (subject-specific) teacher characteristics, and also corrected for measurement error. The results show a statistically significant effect of class size for 16 of the 33 countries for which the pupil fixed effects achievement equation could be fitted. However, in only 10 countries does the effect have the expected negative sign.

² ...countries and take the mean across all developing countries, the class-size effect on student achievement is only about -0.03 SD rather than -0.06 SD, for a one SD increase in class-size.
Moreover, the magnitude of the class size effect is small in most cases. At best, an expensive 1 SD reduction in class size in developing countries (equal to reducing 8.2 students per class) allows pupil scores to increase by only 0.06 standard deviations.
Our study confirms one of the principal conclusions of Hanushek's (2003, 2006) meta-analyses, namely that class size does not have a systematic or substantial effect on pupil performance. Detractors (e.g. Krueger, 2003) have argued that conclusions based on meta-analyses are flawed because many of the underlying studies are of low quality, for instance across-state or across-school rather than within-state or within-school studies. In the present paper, Hanushek's conclusions are corroborated using a much more stringent methodology that accounts for endogeneity in the pupil achievement class-size relationship. We conclude that, given the economic implications, class-size reductions do not appear to be a cost-effective strategy for raising student achievement in developed, developing or transition countries.
We find that class-size effects are smaller in resource-rich countries than in developing countries, supporting the idea of diminishing returns. We also find that class size effects are smaller in regions with higher teacher quality, suggesting that more skilful teachers cope better with larger classes than less skilful teachers.

Appendix 1: Evaluation of pupils' "facility" as a proxy for ability
As mentioned in the robustness section (Section 5.2) of the text, we construct and use a proxy for ability for each pupil in each subject. It is calculated from four questions put directly to the pupil for each evaluated subject. As the pupil must answer for each subject, the constructed "capability" level varies across subjects.
The scale of our capability proxy varies between 0 and 10 and depends on four statements, among them the following:
 "[Subject] is more difficult for me than for many of my classmates"
 "I enjoy learning [subject]"
 "I learn things quickly in [subject]"
A score between 1 and 4 is given for each question, the highest value corresponding to the response "agree a lot". The "facility" coefficient is calculated by summing the scores on the four questions, which yields a total between 4 and 16. Lastly, to obtain a score between 0 and 10, we multiply this total by 10/16. It should be noted that, by construction, the minimum score is in fact 2.5 and not 0, but this is not a problem for our study.

Notes to tables:
 WLS estimates: controls for student, school and family background variables. Numbers in parentheses are robust absolute t-statistics; to simplify reading, coefficients significant at the 10 percent level or better are in bold (these conventions apply to all tables). Simple least squares (without weighting) gives approximately the same results; logging class size also gives results that are not too dissimilar.
 Reduced-sample WLS estimates: only pupils who are not in classes grouped by ability are included in the regression. See text for more details.
 School fixed effects estimates: controls for school fixed effects and student and family background.
 Pupil fixed effects estimates: controls for pupil fixed effects.
 Pupil fixed effects IV estimates: controls for pupil fixed effects. For some countries, the quadratic relation is shown instead of the simple linear form (marked by a 'q') because the coefficients from linear estimation are not statistically significant. The instrument used is the average class size in the subject across all grades in the school. For some countries, this estimation cannot be performed because only one class per school is tested (marked by an asterisk; 'na' means 'not applicable'). See Table A.5 for background data and Table A.6 for first-stage estimates.
 Synthesis table: coefficients in columns (2), (4), (6), (8) and (10) represent the effect on pupil achievement of a 1 SD reduction in class size. "ns" indicates that the relation is not statistically significant; a "+" or "-" before "ns" indicates that the relation is respectively positive or negative but insignificant. "n" indicates that the regression is not possible for that country.
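The scoring scheme of Appendix 1 can be sketched as follows. This assumes a four-point agreement scale per question, which the 4-16 total and the 2.5 minimum reported in the appendix imply; the function name is ours.

```python
def facility_score(responses):
    """Map four 1-4 agreement scores to the 0-10 'facility' proxy:
    sum the four responses (4-16) and rescale by 10/16."""
    assert len(responses) == 4 and all(1 <= r <= 4 for r in responses)
    return sum(responses) * 10 / 16

# As noted in the appendix, the minimum attainable score is 2.5, not 0,
# because the smallest possible total is 4.
```

For example, a pupil answering "agree a lot" (4) to all four statements scores 10, while one disagreeing with all of them scores 2.5.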