Studies in Educational Evaluation, 1997, 23, 31-48.
(Based on earlier work presented at Earli congress 1995 Nijmegen, a.o. paper) (intended to be the first part of my dissertation [in Dutch]: 1995 Leren waarderen SVO-project met notenapparaat)

draft

Assessment in historical perspective

Ben Wilbrink

abstract

A historical perspective on assessment is presented, organized around seven themes, and located in the time window between the early middle ages and the end of the 19th century: ways of questioning, the graded school, medieval university examinations, the disputation, ranking and marking systems, examinations and personnel selection, and the possibly facilitating influence of Chinese mandarin examinations on the development of meritocratic assessment. It is concluded that the essential characteristics of current assessment practice were already established by the end of the 19th century, in contrast to the received view on ‘educational measurement’ being a direct descendant of the innovative work of Galton and Binet.

Assessment in historical perspective

Historical study has the power to illuminate current assessment practice, because that practice is for the greater part traditional practice, and it is a powerful instrument to stimulate reflection on assessment. However, a systematic historical treatment of the subject is not available. Some authors do come close, such as Smallwood (1935) in her study on the history of examinations in the U.S.A., Prahl (1974) in his dissertation on the history of examinations in Western Europe's universities, or Hanson (1993) who, from the perspective of the anthropologist, treats critically the place testing has in modern American society, tracing its roots in witch trials and medieval education. The history of assessment has to be assembled from information hidden in many different monographs, school histories, and studies on this or that aspect or period of assessment practice, as will be clear from the references used in this article. The purpose of this article is to facilitate reflection on critical aspects of current assessment practice by tracing their possible roots in history. The search may uncover some unsuspected facts, as for example the existence of an earlier continental analogue of the Mathematical Tripos, the 18th century Cambridge competitive examination.

The sheer age of some assessment traditions shows them to be relatively immune to changes in cultural environment. Indeed, the university as an institution is one of the oldest of the western world, and university examinations are as old as the universities of Bologna and Paris. The concept of school organization in forms (the graded school), and so the concept of curriculum, is only two centuries younger. University examinations and the idea of the school form are obvious subjects for historical analysis. While examinations in the 13th century share characteristics with modern examinations, it is not to be assumed they had the same function and meaning to the actors involved as they have in the 20th. Assessment practice must be studied in its historical context, in order to understand how a particular practice was a solution to problems and tasks as perceived by historical actors.

The reverse case is just as interesting, the solutions of the past still being thought valid in current education even though the original problems have long since ceased to exist. It is quite conceivable that our ineradicable habit of ordering and ranking students is such a solution to a problem that no longer exists, or that it is no longer a legitimate solution to an original and still existing problem.

In this article assessment is to be understood in a generic sense, in contrast to the specific approach known as ‘educational measurement’, which shares its 19th century roots with psychological testing. The concept of assessment will be intentionally left undefined, leaving it to the historical analysis to give it shape and content. That it would be misleading to attempt to define assessment on the basis of contemporary practice is illustrated by the fact that in the medieval university the disputation was a prominent part of the examination, with no parallel or analogue in current examinations. We have to settle for ‘family resemblances’ to illuminate the concept of assessment in its historical and current manifestations.

What does it mean to know something?

Medieval education can be characterized as ‘teaching’ students to learn sacred and other texts by heart. To know something was to know it by heart (Riché, 1989, p. 218). In the early middle ages the texts to be learned were religious texts, and learning took place mostly in monasteries and convents. There was an urgent motivation to learn the Holy Scripture and other religious texts, because doing so made it more likely that one would be admitted to heaven after death. It was not only the scarcity of manuscripts that forced the monks to learn the texts by heart: medieval manuscripts were so difficult to read that one already had to know the text by heart in order to be able to read it (Bolgar, 1954, p. 111). Muslim manuscripts were ambiguous because they consisted only of consonants; the Muslim student therefore had to give proof by recitation that he ‘knew’ the text. Only then would his master authorize him to teach the text (Berkey, 1992, p. 29).

The medieval monk was confronted with a double task: learning Latin grammar in order to be able to learn Latin texts. Meditation, consisting of the recitation of religious texts, was an important activity for the monk. Holy texts, of course, were written in Latin, so one had to study Latin grammar in order to learn to understand and to speak Latin. The study of grammar consisted in the learning by heart of famous grammars dating from the Roman Empire, or simpler textbooks used for beginners. These grammars were written in the style of questions-and-answers, which was a familiar style in antiquity; see, for example, the questioning of Adam and Eve by the Lord in the Bible. Memory could use some support, so many manuscripts had illustrations that served as mnemonics. The ‘art of memory’ (Yates, 1966) was practised widely; the Jesuit Matteo Ricci even tried to convince the Chinese of its usefulness in preparation for their exams.

Assessment under these circumstances of necessity took the form of having students recite, answer the questions as posed in the grammar that was used, or question each other. The arts examinations at the medieval universities consisted mainly of very simple questions and answers (Lewry, 1982, p. 116). Questioning and answering was the dominant didactic form in teaching and learning. Knowing the right answers to questions about religious texts was extremely important. Out of this kind of questioning grew the catechism, and in its wake the catechetical method. These archetypes of assessment were still dominant in education as late as the 19th century (Foden, 1989, p. 12). Only in the second half of the 19th century did the American colleges replace the recitation method by lectures or ‘group discussions.’ The recitation method was a combination of learning and examining, but in the American colonies the examining part was in fact non-existent: “The colonial college student was essentially ungraded and unexamined. (...) public oral examinations were gestures in public relations and therefore not designed to show up student deficiencies” (Rudolph, 1977, p. 145). The first written examinations in Oxbridge in a sense followed the catechetical method, because no questions were put that allowed different interpretations: ‘the way to achieve more accurate and certain means of evaluating a student's work was to narrow the range of likely disagreement and carefully define the area of knowledge students were expected to know’ (Rothblatt, 1974, p. 292).

It is the experience of almost every living adult in developed countries that even today a substantial part of all questioning and assessment in education is recitation and giving the ‘right’ answers to known types of questions. Most standardized tests only count the proportion of ‘right’ answers. The difference between modern and medieval testing seems to be mainly that not the salvation of one's soul but that of one's career depends on producing the right answers. Irony apart, the question-and-answer paradigm deserves the critical scrutiny it is now getting from many different quarters.

Joan Cele: 14th century originator of Western style education


Important principles of curricular and school organization were developed by Joan Cele, rector of the Latin school of Zwolle, a Hanseatic town in the Low Countries, in the period ca. 1375 to 1415 (Frederiks, 1960; Codina Mir, 1968). Cele, a famous teacher, had to run a school with 800 to 1000 students in a town with only five thousand inhabitants. Many of these students came from Utrecht, Liège, Flanders, and the German countries. Cele solved the organizational problems posed by the sheer number of his students by imposing a new and strict division of the students into eight classes, and of the curriculum into eight different forms. Cele hired two Parisian masters in the arts to teach philosophy in the highest two classes. However, most of the students were in the lower classes, learning Latin and its grammar. Still being confronted with classes of up to a hundred students, Cele introduced a subdivision into groups of ten students called decuriae. Each group had a leader who was responsible for learning and discipline; leadership was changed every week. Twice a year Cele held examinations for promotion to a higher form. In the lower forms the exam consisted of a recitation to check on the achievement of the task posed in that form; in the higher forms Cele also looked for the student's insight (sententia) into the meaning and message of the Latin texts that were translated.

Cele's innovations were important because his students introduced the new didactic principles in schools all over Europe, among them the university of Paris. The Jesuits, whose Ratio Studiorum was inspired by this didactic modus parisiensis, definitely established this pedagogy in Europe's schools and universities (Codina Mir, 1968). Joan Cele single-handedly established the European model of the graded school, examinations for promotion, and ranking of students on the basis of merit. The historical importance of Cele's innovation was only recently revealed by the work of Post (1954), Frederiks, and Codina Mir. As late as 1960 Philippe Ariès could still present a meticulous study on the evolution of the graded school, being unaware of the source of the innovations lying in Zwolle in the 14th century.

The medieval class contained pupils of different ages, who were yet in the same form for possibly quite different durations. Current school organization is certainly based on the ideas of Cele, but in the 18th and 19th centuries classes came to be constituted bureaucratically according to age and with a fixed duration of stay, grade retention coming to be a consequence of bureaucratic rules, whereas earlier students used to be promoted on the basis of their learning potential (Paulsen, 1921, p. 621; Ingenkamp, 1972, p. 24, 42). The educational philosophy legitimating this modern school organization after the model of the standing armies of the newly formed states, was already formulated by Comenius in the 17th century (Ingenkamp, 1972, p. 16). This evolution in the principle of grouping pupils in classes on the basis of age instead of learning is extremely important, for now pupils came to be assessed in comparison to their peers in age, not in comparison to their peers in learning.

Examinations at the medieval university of Paris

The university of Paris in the middle ages was an organization of masters, in contrast to the university of Bologna that was an organization of its wealthy students. Most of the Parisian masters, however, were masters of arts, being at the same time students in one of the ‘superior’ faculties of law or theology. In the medieval university one first had to study grammar and philosophy in the four-year arts curriculum, only ‘masters of arts’ being admitted to the ‘superior’ faculties of medicine, law or theology. Nobody could be a student in Paris without having a master, so the first thing the newly arrived student had to do was to seek himself a good master (Thorndike, 1944, p. 30). The master was responsible for his students: he saw to it that they spent their time in study and not in idleness, he set daily exercises and heard their recitations. The master put students in competition with each other, giving explicit praise to the student with the best achievement of the day, and blaming the student who blundered worst by giving him the cap with earflaps or asinus. Assessment was part and parcel of the daily life of the medieval student. In the early German universities the ‘propaedeutic’ arts examination tested students on questions and answers that were extensively practised in the years before. The level of this arts examination was surpassed by that of many schools, such as the schools of Brethren of the Common Life (Schwinges, 1986, p. 336, 356). To help students in the preparation for their exams there were, already in the 13th century, ‘examination compendia’ available (Lewry, 1982), the same kind of book with questions and answers and model-poems from the civil service examinations that was a selling success in China (Hu, 1984, p. 13).

A major responsibility of the master was to nominate his students for examinations, but only if he deemed them ready. Examinations were public and formal events; failing a candidate was extremely rare, and even then the reason would be the moral behaviour of the candidate (Schwinges, 1992, p. 235). The candidate was questioned on his knowledge of the prescribed books, he had to deliver a lecture on a text that was assigned to him only hours before, and he had to take part in a public disputation. The candidate had to give a proof of what the examination would qualify him to do: to lecture.

What is really new and characteristic of the university as an institution is the examination by a committee of masters acting on behalf of the representative of the Pope, the chancellor of Paris (see Weijers, 1995, on research on these examinations). “More particularly, [the universities] were the only institutions—and this was one of the great innovations of the medieval university system—to link teaching and examinations closely together” (Verger, 1992, p. 43). The successful candidate received the ‘licence to teach,’ a certificate that enabled the licentiate to teach anywhere in the Christian world and to attract students. Until the rise of the university, the authority to teach was self-declared or based on a written statement from one's own master, and the licence to teach was temporarily given by the local representative of the church. The genesis of the university examination coincides with the loss, about 1200, of absolute autonomy of the individual master. The individual master became dependent on his examining colleagues: only they could recommend his pupil for the ‘licence to teach.’ Another way to describe the introduction of the examination is to say that the chancellor of Paris lost his autonomy in the appointment of university teachers because now there had to be an examination of the candidate by a committee composed of masters of the university. Gradually there grew a distinction between the examination and the appointment as a master. Still later, examinations did qualify one for a certain profession, but did not give entry to that profession because an academic grade was only one of many qualifications, descent and wealth being the more important ones (Moraw, 1992). The new institution and its examinations for the first time in Western history defined what knowledge was, thereby also encouraging the new phenomenon of professionalization (Bullough, 1978).

The university examination was a new institution, having no model in the past, nor in any other country. Webber (1989, p. 36) suggested that the sudden appearance of examinations was influenced by contacts with the Chinese. There are two problems with this hypothesis. At the time, just before Genghis Khan established his realm, there were no direct contacts with the Chinese, and Marco Polo went on his journey to China when the examinations were already in place. More important is that the then existing Chinese examinations did not particularly resemble the new university examinations. Another possibility would be that the university examination was copied from practices in higher education in the Muslim world, but there the individual masters were strictly autonomous in licensing their disciples, leading Makdisi (1981) to the conclusion that the organizational forms of the Western universities and their examinations were real innovations.

The methods of lecturing and studying made it necessary to ‘hear’ the lecture series on a particular book more than once, before one had a reasonably sure knowledge of the text and its commentaries. The regulations of the university stipulated the minimum number of times to hear the lecture series on every book in the examination, making repetition a natural characteristic of education in the universities as well as in the schools, and contributing to the very long duration of studies.

Order of merit in the middle ages was based on one's position in society. The right order was extremely important, even in sitting positions at daily lectures; rich students could buy themselves a place in the ‘noble bench.’ Also the order of merit at examinations, the locatus, was first of all an order of social merit (by birth), and was only in second place determined by criteria such as the length of study (the longer the stay, the higher the place) (Rashdall, 1895 i, p. 459; Schwinges, 1986, p. 355; 1992, p. 234). Many students, however, did not even have the intention to go for the arts examination. The conclusion is: yes, there was an honours list for every examination, but the place on the list had little or nothing to do with academic merit.

In the medieval university merit in the modern sense of academic achievement was important in daily practice, but was not explicitly recognized in the examinations in the way of a ranking order. The medieval university examination has always been an important model for examinations ever since, but the competitive examination definitely is a later development.

The disputation: a lost examination format

The disputation is the high mark of medieval education. Famous are the disputations between Abelard and William of Champeaux; Abelard describes in his autobiography the flavour of the times and the details of his contests with William (Thorndike, 1944, p. 3). These disputations attracted large numbers of ‘students,’ and marked the beginnings of what would become the university of Paris. There are, of course, many different forms of disputation, and over the centuries there have been important developments in techniques and traditions. A disputation was a major event: all other activities in the university were cancelled so as to give everybody the opportunity to attend. The pièce de résistance of the disputation was a theorem or problem posed by the master who chaired the disputation. The position of the master was to be defended by one of his students (the respondens), and could be opposed by other masters and students. The disputation could last the better part of the day, or even the whole day. The next day the master would give a summary of the arguments pro and contra, and indicate why the opposition failed and what the conclusion or solution (determinatio) of the problem should be. For the respondens participation in the disputation was part of the fulfilment of his examination requirements.

In rare cases the problem posed was a genuine problem eagerly waiting for a solution; here the disputation was a method of finding new secure knowledge. In the middle ages the disputation was the only method to develop new knowledge, and to critically analyze newly translated or discovered theories. In the Muslim world of the 11th century, the disputation was an important instrument in the development of Muslim law, and for that reason an important method in higher education; Makdisi (1981), in good disputational style, posits the primacy of the Muslim disputational form over that of the later European universities. In the development of logic the disputational method was crucial, as described by Kretzmann & Stump (1988, p. 6).

There is an extensive body of literature on the disputation. Many reports have been preserved in the particular literary form of the report as authorized by the master. McDermott (1993) presents in his anthology of the works of Thomas Aquinas a number of quaestiones disputatae. In this anthology there is also a lecture of Thomas; in this lecture one can find the same elements as used in the disputation: arguments and counterarguments, conclusions and refutations. In the field of logic a number of disputations and an introduction to the genre are to be found in Kretzmann, Kenny & Pinborg (1982). Lawn (1993) treats the disputation in medicine and science, shows its essential place in the development of science, and gives some examples. References to the literature can be found in Weijers (1987).

Most of the time, however, the disputations were exercises intended to sharpen the wits of the participants, and as such they were related to the didactic form of questions and answers. Little is known about the role of the disputation in the instructional process, ‘about how students were taught,’ but Perreiah (1984, p. 85) gives details about how ca. 1400 ‘trial disputations’ were delivered: under very strict rules specific to trial disputation, the obligation or the insoluble, and of course under the rules of logic. In the context of trial disputation Perreiah, following Aristotle (Topics 159a 250), speaks explicitly of an instrument to test the knowledge of the participants.

In Jesuit schools the disputation was an instrument to rank students according to merit: the lower ranked student could ‘win’ the rank of his adversary, and vice versa (Compère, 1985, p. 83). Winning or losing was determined by the number of errors made by each contestant. This was also the practice in the Latin school of Sturm (Codina Mir, 1968, p. 173). This kind of ranking by competitive disputations was also known in late Antiquity (Lim, 1995), and in the Muslim world about 1000 (Makdisi, 1981).

The disputation kept a prominent place in university curricula and examinations until the 18th century, when disputations became more farcical; they finally gave way to modern forms of examination in the 19th century. In Leiden, early in the 17th century, disputations took place once every two weeks, the very rude discussions regularly resulting in a serious scuffle (Schotel, 1875, p. 332). In Oxbridge, students and faculty no longer took disputations seriously in the 18th century, but only halfway through the 19th century did they disappear altogether (Rothblatt, 1974).

The disputation is the only major type of exercise for assessing intellectual agility that has not survived as such into modern times. The disputation was a public event, and because of that the participants must have been highly motivated to do a good job and to give a good public impression. Assessment in this case was self-assessment as well as assessment by one's public. The disputation has been replaced by examinations in question-and-answer style, yet it might be that there is a modern equivalent to the disputation: scientific research and all the preparation for it that goes into modern secondary and higher education. To be able to do scientific research in the late 20th century demands extensive preparation in mathematics, statistics, discipline-specific research methods, and in the peculiar stylistic scholastics that has developed around reporting and publishing research (e.g. in psychology: Madigan, Johnson, & Linton, 1995). The assessment characteristics are like those of the disputation: reporting is public, and standards for good practice are explicit and objective.

Punishment or reward? Ranking and marking systems

A perennial problem in education is to keep the student's attention on his educational tasks. Punishment is traditionally used for this purpose, often taking the form of punishment for non-disciplinary behaviour. The heads of medieval schools and universities were entitled to punish their students, even for crimes committed outside the school. For the medieval student punishment was a daily routine. In the 11th century Egbert, a teacher in Liège, criticized the harsh punishment in the schools of his day, and 14th century Joan Cele was known to be mild in his punishments (Fortgens, 1956, p. 36; Frederiks, 1960, p. 56). The humanists propagated competition and reward instead of punishment to motivate students (Bot, 1955). Scaglione (1986, p. 13) sees a connection between the emergence of these new ideas and practices and the innovations of Joan Cele; he also points out that in the Renaissance there was an extraordinary eagerness to learn, in contrast to the periods before and after (op. cit., p. 93). The influence of the humanists led to a system of prizes for the best students of the class that dominated Western education until well into the 19th century.

In order to be able to reward the best student one should know who he is, and one should have some rules to rank students for this purpose. The prize mechanism led to bookkeeping of points or notae throughout the academic (half) year, points being earned by good behaviour, or lost by making academic mistakes as well as by bad behaviour. The prize system is a driving force behind the development of systems of points and 19th century marking systems. The schools of the Brethren in the late middle ages already had an elaborate system of ranking students according to merit, examinations being used to determine their ranking. Students could challenge the rank given to them, in which case a contest between the challenger and the next better ranked student was held (Codina Mir, 1968, p. 173). Haskins (1923, p. 74) gives the example, from a 15th century student manual, of the daily disputation held by the master with his own pupils, where a prize as well as a symbolic punishment (asinus) was given for keeps until the following dispute; the same practice existed in 1559 in Calvin's Academy of Geneva. There was then already a practice of keeping a record of earned points or notae. “Classes were divided into decuriae not by age or social rank but by merit and achievement. The decurio supervised all work, and punishment for intellectual sluggishness could take the typical form of nota asini ‘the ass's mark’ or nota sermonis soloecismi, ‘the mark of bad Latin’” (Scaglione, 1986, p. 47). Centuries earlier, in the Muslim world, the same practice existed of ordering of pupils according to merit (Makdisi, 1981, p. 81, 91).

In Jesuit schools competition and ranking by academic merit were the core of the educational program. “The Jesuits, as educators general before modern times, did not formally grade students’ homework or even tests, but by their results they listed the students publicly in order of merit” (Scaglione, 1986, p. 74). From the 17th century lists have survived where every student at the end of the school year was graded according to his achievements and capacities (Compère, 1985, p. 83).

There have always been objections to the prize system. In the middle ages Italian parents objected to the leniency of the system: they preferred punishments. A frequently stated objection was that the many students who would never be able to earn a first or second prize were in fact neglected by this system of rewards. Also there were objections against certain moral problems in the wake of the competition for prizes: fraud, malicious delight, stress, and lying.

In England, during the latter part of the 18th and the first half of the 19th century, the university climate grew competitive, written examinations replacing the orals, and candidates being ranked according to achievement on lists of ‘honours’ candidates that were made public. Low achieving candidates could hide their shame by taking a ‘pass’, in which case they were not ranked and their names were not made public. At Cambridge the participants in the Mathematical Tripos were until 1910 ranked according to achievement, the best achievement being honoured with the title of Senior Wrangler, the least one with a title as well as a man-sized attribute: the Wooden Spoon. Competitive examinations in Oxbridge, in the early 19th century, put the students under great pressure (Rothblatt, 1982). Competitive examinations were also known on the continent, where already in the 17th century at Leuven there was fierce competition between students from its four colleges, called pedagogies (Vanpaemel, 1986, p. 33). Leuven also had its own variant of Cambridge's classes, called lineas; the best candidate was called the primus and he was highly honoured. The great pressure on students that Rothblatt mentions manifested itself at Leuven already before 1675. The resemblance between the examinations in Leuven and Cambridge does seem to have escaped the attention of historians.

Exactly why and how ranking systems were in the 19th century replaced with marking systems is not known, but surely the 19th century belief in the power of measurement (Kula, 1986) must have been involved. Ranking of students was in the first half of the 19th century still the dominant practice in secondary education. According to Compère (1985, p. 83), before 1850 there was no marking system in use in France. In early 20th century Germany, class ranking was possibly still in general use: Stern (1920) compared the scores on his new intelligence test with the rank in class, not the marks obtained. In the Netherlands the gymnasium of Groningen was probably the last school to substitute a marking system for the system of notae and ranking lists, doing so only in 1901 (Van Herwerden, 1947, p. 41). For the United States the history of grading systems in higher education is described by Smallwood (1935). Notwithstanding the replacement of the supposedly old-fashioned ranking system with the marking system, high marks were still as scarce a good as the first or second place in the class order of merit, because they were artificially made scarce (Deutsch, 1979, p. 393). In England the first case of marking examination papers is found in the Mathematical Tripos of 1836: “Earlier examiners and moderators tended to rely on impression” (Rothblatt, 1982, p. 14).

In the ranking system rank was determined by the summed scores (= notae) of all the students in the form. For that purpose notebooks were kept; in Groningen, for example, every student had a notebook wherein all notae were jotted down, not only his own, but also those of all other students (Van Herwerden, 1947, p. 41; Rudolph, 1977, p. 147, for a parallel at Harvard). The notebooks in Western education resemble the Books of Merit and Demerit in China, in the 16th and 17th centuries (Brokaw, 1991), but there is probably no link between the two systems. There must be some kind of relation, however, between ranking systems and their replacements, the marking systems that are still used all over the world; knowing that relation might shed some light on the reasons for adopting marking systems.

A short description of the emergence of the marking system in England is given by Rothblatt (1993, p. 44); competitive examinations in Oxbridge demanded objective assessment, and credible objectivity demanded the curriculum to be narrowed so as to be able to assess by using marks. This is an important clue, that marking served purposes of ranking, especially to legitimize the judgments being made of the examination papers, and that curricular content was adjusted to make this kind of assessment possible. In France the marking system seems to have evolved from the ranking system: Chervel (1993, p. 136 ff.) shows how juries for the French concours d'agrégation gradually changed a complex ranking procedure into a marking system. Instead of simply ranking the candidates from the worst (number one) to the best achiever (equal to the number of candidates), candidates came to be ranked on a fixed range from one (worst) to ten (best), allowing ties, or breaking ties by using halves. The change was made complete by not using the extreme numbers when the impression was that candidates were not good or bad enough to ‘deserve’ them. Marking systems differ from country to country, while the basic idea underlying them is the same everywhere in the Western world: the system of ranking stripped of its prizes, and pseudo-objectified by evaluating achievement directly on a marking scale. With hindsight, the problem in the new marking systems is the lack of rules or standards that could make the translation from the number of errors to the assigned grade an objective one.

Competition and the state

Modern examinations were formed in the critical period of the late 18th and early 19th century, this formation having much to do with the rise of modern states in Europe. In fact it was state influence that was the crucial factor in most countries, England being a special case because of the autonomous nascence of Oxbridge competitive examinations, and the U.S.A. not yet participating in this process of state formation.

University enrollment in the 17th and 18th century was low, and in many countries examinations no longer existed, or what was called an examination was farcical. “All through the 18th century, the examination for the B.A. had been a purely formal ritual of answering standard questions known in advance, and reading a "wall lecture," so called because the examiners would generally leave during the reading of the lecture. It was purely a formal requirement that the lecture be read and the examiners were not required to judge its quality” (Engel, 1974, p. 307). “For most of the 18th century undergraduates and collegiate fellows were bored” (Rothblatt, 1974, p. 247). In continental Europe the general trend in the 17th and especially the 18th century was that the state tried to get a hold on the universities and their examinations in order to control the numbers and qualities of its civil servants (Frijhoff, 1992). Where earlier one's family, wealth and relations were decisive in obtaining attractive government positions, now merit was becoming the prime criterion. This did not mean that other factors now became unimportant, or that elite positions were threatened by newcomers (Fischer & Lundgreen, 1975). Nor did the importance of merit mean that positions were now in fact open to all the talented: the costs involved in reaching competitive positions in education were so high that only the established elites and wealthy merchants could bear them, as had also been the case in the middle ages (Schwinges, 1986, p. 5). Only the 20th century would see the combination of merit and more equal opportunity.

England

The development of ‘modern’ examinations in England begins already in the first half of the 18th century with the institution of the Senate House examination at Cambridge, later to become the Mathematical Tripos (Gascoigne, 1984). The why and how of this development is unknown, but Rothblatt (1974) presents many relevant facts and interesting speculations. Roach (1971, p. 12) affirms the decisive role the English university examinations played as a model for the civil service examinations that were established in the middle of the 19th century. The pervasive influence of the university examinations is described by Rothblatt (1982, p. 15): “The Oxbridge model was followed in the schools, in military academies, in the system of local examinations and in the various branches of civil services, excepting the Department of Education and the Foreign Office. Different career phases became linked together by the same examinations (...).”

France

Present-day France knows the educational contest, the concours, for entrance to prestigious institutions and colleges; this tradition has its origin in a bequest of Louis Legrand, who started a yearly contest between 10 Parisian colleges in 1747 (Palmer, 1985, p. 24). In the later 18th century more examinations began to be used, and in a more stringent manner, for recruitment to technical institutions for the army (Ecole du Génie) and the government (Ecole des Ponts et Chaussées) and, after the revolution, to the Ecole Polytechnique, an institution that was widely imitated in other European countries. The whole point of the concours is that admission to a grande école, for example, will practically guarantee a prestigious job. In France it was the government that made examinations, for the first time in French history, decisive for many a state career; for this purpose it instituted examinations that did not exist before in this form.

Prussia

Prussian rulers in the 18th century built the most efficient bureaucracy of Europe. They instituted the earliest civil service examinations with the intention of breaking the monopoly of the aristocracy on high government positions (Prahl, 1974, p. 300). In the 18th century a course preparing for government jobs was instituted next to the traditional faculties of theology, law, and medicine. To regulate numbers, restrictions were introduced, also for the other faculties, in the form of a final examination of the Gymnasium: the Abitur. In the 19th century students in government tracks and of limited means, the so-called Brotstudenten, were cramming for their state examinations; this group was not sold on the Humboldtian ideal of the university. Growing numbers of students in the 19th century led to the bureaucratization of the state examinations themselves as well, strengthening the natural tendency of Brotstudenten to cram for their exams (McClelland, 1980). In these strong developments taking place in the 18th and 19th century the form and function of assessment in Germany were definitively set.

The characteristic development in the 18th and 19th century is that assessment became a serious matter. No longer was it only a question of honour to win the prize; now one's future career depended on it. No wonder that competitive examinations were going to dominate the educational scene: assessment now served many other lords and interests besides the transmission of cultural heritage. Assessment no longer served any didactic purposes; instead it dictated them in the form of the necessity of cramming for narrowly defined examinations. Rothblatt (1982) studied the stress that Oxbridge students experienced in their years of study early in the 19th century. From now on, for most students only what would ultimately be tested counted.
Because so much now depends on the outcome of examinations, the pressure is in the direction of kinds of questions that do not divide assessors, and on procedures of counting errors or assigning marks that give the impression of exactness. Assessors now stand on the side of state interests or of the professional association, no longer on the side of the student like the medieval master did. Merit assessment has its price: an objectifying distance between assessors and assessed. Yet, the same meritocratic procedures, once in place, made it possible in the 20th century to really offer educational and career possibilities to the talented from all classes in modern society, even though in the eyes of some this may have been a mixed blessing (Ringer, 1979, voicing this feeling).

Chinese mandarin examinations a model for the West

Imperial Chinese examinations are the first known written examinations in history; they gave entry to the civil service and were very selective. They were held once every few years, in halls dedicated to these examinations. Examinations were thoroughly meritocratic, reflecting the Confucian philosophy on the place of merit in China's hierarchic society. Examinations had different forms and functions in different periods, as documented for the examinations of the Ming and Ch'ing dynasties, from the 14th century until 1905, by Ho (1962) and Miyazaki (1976). Miyazaki's title, ‘China's examination hell,’ adequately depicts the character of these examinations. The main characteristics of these examinations are the written form, the elaborate measures taken to ensure objective assessment, their literary content based on the Confucian classics, the possibility of open participation for all, chances of success that were more often than not only one in a hundred, unlimited opportunities to participate again even at an advanced age, and the frequency of once every three years. Examinations were the most important opportunity to become a civil servant, a function held in very high esteem and well paid. With the exception of the reign of the Khans, who abolished the examinations, examinations played a crucial role in the stability of the empire, curtailing the power of the aristocracy and the military, and legitimizing the favoured position of civil servants.

Changes in the examination ‘culture’ in Europe between the early 18th and the late 19th century were manifold, and in the direction of the chief characteristics of Imperial Chinese examinations and bureaucracy: from oral to written examinations, from inconsequential examinations to explicit selection for civil service, from formal ceremonies to competitive examinations, from small numbers to numbers of participants many times higher than the numbers of available places. The resemblance between European developments during the Enlightenment and the situation in 18th-century China, a country admired by many European intellectuals, suggests some influence of the Chinese model. During the 18th and 19th century many factors influenced the development towards competitive examinations in Europe, among them the achievement of free trade, a principle that also could be of use in government and education. Among the numerous factors mentioned in the literature, the availability of the model of Chinese civil service examinations deserves special mention. It was widely known in Europe, and examinations modeled after this Chinese examination format were propagated by, for example, Adam Smith in his Wealth of Nations; see Têng (1943) and Guy (1963) for details on the way the Chinese model influenced European thinking on examinations and their societal role. In their turn, European examinations influenced developments in Japan; its Meiji government instituted meritocratic civil service examinations after the Chinese model, with strong Prussian influence (Spaulding, 1967; Rohlen, 1983, p. 61).

That the Chinese model might have served as a kind of magnet for developments in Western Europe should strengthen our reflective mood regarding the dominant presence of examinations in our daily life. The Chinese civil service examinations were just what the name suggests: a means for selection of civil service personnel, not an educational system. Imperial China never developed an adequate educational system, although in the Sung period a serious effort was made. The suggestion from the Chinese experience is that a strong examination system threatens the quality and even the existence of the educational system. Selection is not a productive process, for it does not of itself produce qualifications; a society that takes the productivity of its educational system seriously should keep education and assessment and selection in proper balance.

Discussion

This search for possible roots of assessment, superficial as it of necessity must be, nevertheless shows some significant and maybe quite unsuspected facts. The first observation is that, indeed, before the beginning of the 20th century assessment had already developed into the forms and procedures that still characterize it today. This underscores that our ‘assessment culture’ is, for better or worse, the legacy of societies that have long since gone. Another conclusion from this historical exercise is that the history of ‘educational measurement,’ going back to Galton and Binet, is surely not the history of assessment. Assessment itself was seen to be a complex concept that could be analyzed in terms of its content, its context of the graded curriculum, its descent from medieval university examinations, its instrumental quality to motivate students, its uses in (societal) selection as an instrument of the state, and as strengthened in its meritocratic character by the example of China's mandarin examinations. Still, some aspects had to be left out, such as how medieval masters and students used their time and what their attitude towards work was (Van den Hoven, 1996), or the period immediately before the rise of the universities (Jaeger, 1994).

This article might give rise to more questions than it answers, in which case it would fulfil its intention to stimulate reflection on assessment. The historical facts in this article concern educational systems primarily serving the upper classes of society, while education in the 20th century is mass education, even extending to mass higher education. Why then would knowledge of the roots of assessment be relevant for understanding current assessment practice? The fascinating observation is that assessment procedures handed down by tradition were in this century rather uncritically adopted in mass education, possibly leading to major inefficiencies in education and for too many students a lack of quality of school life.

Note. The research for this paper was partly subsidized by the Netherlands Foundation for Educational Research (SVO) in The Hague, grant number 94707.

References

Ariès, Ph. (1960). L'enfant et la vie familiale sous l'ancien régime. Paris: Plon.

Berkey, J. (1992). The transmission of knowledge in medieval Cairo. A social history of islamic education. Princeton: Princeton University Press.

Bolgar, R. R. (1954). The classical heritage & its beneficiaries. Cambridge, at the University Press.

Bot, P. N. M. (1955). Humanisme en onderwijs in Nederland. Utrecht: Het Spectrum.

Brokaw, C. J. (1991). The ledgers of merit and demerit. Social change and moral order in late imperial China. Princeton: Princeton University Press.

Bullough, V. L. (1978). Achievement, professionalization, and the university. In J. IJsewijn, & J. Paquet (Eds.). The universities in the late middle ages (p. 497-510). Leuven, at the University Press.

Chervel, A. (1993). Histoire de l'agrégation. Contribution à l'histoire de la culture scolaire. Paris: INRP Editions Kime.

Codina Mir, G. (1968). Aux sources de la pédagogie des Jésuites; le ‘Modus Parisiensis.’ Roma: Institutum Historicum S.I. https://archive.org/details/bhsi28

Compère, M. M. (1985). Du collège au lycée (1500-1850). Généalogie de l'enseignement secondaire français. Paris: Gallimard/Julliard.

Deutsch, M. (1979). Education and distributive justice: some reflections on grading systems. American Psychologist, 34, 379-401.

Engel, A. (1974). Emerging concepts of the academic profession at Oxford 1800-1854. In Stone, L. (Ed.). The university in society. Vol I Oxford and Cambridge from the 14th to the early 19th century (p. 305-351). Princeton: Princeton University Press.

Fischer, W., & Lundgreen, P. (1975). The recruitment of administrative personnel. In Tilly, C. (Ed.). The formation of national states in western Europe (p. 456-561). Princeton: Princeton University Press.

Foden, F. (1989). The examiner. James Booth and the origins of common examinations. Leeds: School of Continuing Education.

Fortgens, H. W. (1956). Meesters, scholieren en grammatica; uit het middeleeuwse schoolleven. Zwolle: Tjeenk Willink.

Frederiks, J. (1960). Ontstaan en ontwikkeling van het Zwolse schoolwezen tot omstreeks 1700. Een historische studie. Zwolle: Tijl.

Frijhoff, W. (1992). Universities: 1500-1900. In Clark, B. R., & Neave, G. R. (Eds.). The encyclopedia of higher education (II, p. 1251-1259). Oxford: Pergamon Press.

Gascoigne, J. (1984). Mathematics and meritocracy: the emergence of the Cambridge Mathematical Tripos. Social Studies of Science, 14, 547-584.

Guy, B. (1963). The Chinese examination system and France, 1569-1847. In Besterman, T. (Ed.). Studies on Voltaire and the eighteenth century, vol. 25, 741-778. Geneva: Institut et Musée Voltaire.

Hanson, F. A. (1993). Testing testing. Social consequences of the examined life. Berkeley: University of California Press.

Haskins, Ch. H. (1923/1957). The rise of the universities. London: Cornell University Press.

Ho, P. T. (1962). The ladder of success in imperial China. Aspects of social mobility, 1368-1911. New York: Columbia University Press.

Hu, C. T. (1984). The historical background: examinations and control in pre-modern China. Comparative Education, 20, 7-26.

Ingenkamp, K. (1972). Zur Problematik der Jahrgangsklasse. Weinheim: Beltz.

Jaeger, C. S. (1994). The envy of angels. Cathedral schools and social ideals in Medieval Europe, 950-1200. Philadelphia: University of Pennsylvania Press.

Kretzmann, N., & Stump, E. (Eds.) (1988). The Cambridge translations of medieval philosophical texts. Volume one: logic and the philosophy of language. Cambridge: Cambridge University Press.

Kretzmann, N., Kenny, A., & Pinborg, J. (Eds.) (1982). The Cambridge History of Later Medieval Philosophy. From the rediscovery of Aristotle to the disintegration of scholasticism 1100-1600. Cambridge: Cambridge University Press. [later edition 1988]

Kula, W. (1986). Measures and men. Princeton: Princeton University Press.

Lawn, B. (1993). The rise & decline of the scholastic ‘quaestio disputata’ with special emphasis on its use in the teaching of medicine and science. Leiden: Brill.

Lewry, O. (1982). Thirteenth-century examination compendia from the faculty of arts. In Les genres littéraires dans les sources théologiques et philosophiques médiévales (p. 101-116). Louvain-la-Neuve.

Lim, R. (1995). Public disputation, power, and social order in late antiquity. Berkeley: University of California Press.

Madigan, R., Johnson, S., & Linton, P. (1995). The language of psychology: APA style as epistemology. American Psychologist, 50, 428-436.

Makdisi, G. (1981). The rise of colleges: institutions of learning in Islam and the west. Edinburgh: Edinburgh University Press. (See also Abdul Haq Compier, 2011, How Europe came to forget about its Arabic heritage.)

McClelland, Ch. E. (1980). State, society, and university in Germany 1700-1914. Cambridge: Cambridge University Press.

McDermott, T. (Ed.) (1993). Thomas Aquinas, Selected philosophical writings. Oxford: Oxford University Press.

Miyazaki, I. (1976). China's examination hell. New York: Weatherhill.

Moraw, P. (1992). Careers of graduates. In H. de Ridder-Symoens (Ed.). A history of the university in Europe. Volume I, Universities in the middle ages (p. 244-279). Cambridge: Cambridge University Press.

Palmer, R. R. (1985). The improvement of humanity. Education and the French revolution. Princeton: Princeton University Press.

Paulsen, F. (1921/1960). Geschichte des gelehrten Unterrichts auf den deutschen Schulen und Universitäten vom Ausgang des Mittelalters bis zur Gegenwart. Vol. II. Berlin.

Perreiah, A. R. (1984). Logic examinations in Padua circa 1400. History of Education, 13, 85-103.

Post, R. R. (1954). Scholen en onderwijs in Nederland gedurende de middeleeuwen. Utrecht: Het Spectrum.

Prahl, H. W. (1974). Abschlussprüfungen und Graden. Sozialhistorische und ideologiekritische Untersuchungen zur akademischen Initiationskultur. Dissertation Universität Kiel.

Rashdall, H. (1895). The universities of Europe in the middle ages. Edited by F. M. Powicke and A. B. Emden (1936). Oxford: at the Clarendon Press.

Riché, P. (1989). Ecoles et enseignement dans le Haut Moyen Age. Fin du Ve siècle - milieu du XIe siècle. Paris: Picard.

Ringer, F. (1979). Education and society in modern Europe. Bloomington: Indiana University Press.

Roach, J. (1971). Public examinations in England 1850-1900. Cambridge: Cambridge University Press.

Rohlen, T. P. (1983). Japan's high schools. Berkeley: University of California Press.

Rothblatt, S. (1974). The student sub-culture and the examination system in early 19th century Oxbridge. In Stone, L. (Ed.). The university in society. Vol I Oxford and Cambridge from the 14th to the early 19th century (p. 247-303). Princeton: Princeton University Press.

Rothblatt, S. (1982). Failure in early nineteenth century Oxford and Cambridge. History of Education, 11, 1-21.

Rothblatt, S. (1993). The limbs of Osiris: liberal education in the English-speaking world. In Rothblatt, S., & Wittrock, B. (Eds.). The European and American university since 1800. Historical and sociological essays (p. 19-73). Cambridge: Cambridge University Press.

Rudolph, F. (1977). Curriculum. A history of the American undergraduate course of study since 1636. San Francisco: Jossey Bass.

Scaglione, A. D. (1986). The liberal arts and the Jesuit college system. Amsterdam: Benjamins.

Schotel, G. D. J. (1875). De academie te Leiden in de 16e, 17e en 18e eeuw. Haarlem: Kruseman & Tjeenk Willink.

Schwinges, R. C. (1986). Deutsche Universitätsbesucher im 14. und 15. Jahrhundert: Studien zur Sozialgeschichte des alten Reiches. Stuttgart: Steiner.

Schwinges, R. C. (1992). Student education, student life. In De Ridder-Symoens, H. (Ed.). A history of the university in Europe. Volume I, Universities in the middle ages (p. 195-243). Cambridge: Cambridge University Press.

Smallwood, M. L. (1935). An historical study of examinations and grading systems in early American universities. Cambridge: Harvard University Press.

Spaulding, R. M. (1967). Imperial Japan's higher civil service examinations. Princeton: Princeton University Press.

Stern, W. (1920). Die Intelligenz der Kinder und Jugendlichen und die Methoden ihrer Untersuchung. Leipzig: Barth.

Têng, S. (1943). Chinese influence on the western examination system. Harvard Journal of Asiatic Studies, 7, 267-312.

Thorndike, L. (1944). University records and life in the middle ages. New York: Columbia University Press.

Van den Hoven, B. (1996). Work in ancient and medieval thought. Amsterdam: Gieben.

Van Herwerden, P. J. (1947). Gedenkboek van het Stedelijk Gymnasium te Groningen. Groningen: Wolters.

Vanpaemel, G. (1986). Echo's van een wetenschappelijke revolutie. De mechanistische natuurwetenschap aan de Leuvense Artesfaculteit (1650-1797). Verhandelingen van de Koninklijke Academie voor Wetenschappen, Letteren en Schone Kunsten van België, Klasse der Wetenschappen, Jaargang 48, Nr. 173. Brussel: Paleis der Academiën.

Verger, J. (1992). Patterns. In De Ridder-Symoens, H. (Ed.). A history of the university in Europe. Volume I, Universities in the middle ages (p. 35-74). Cambridge: Cambridge University Press.

Webber, C. (1989). The mandarin mentality: civil service and university admissions testing in Europe and Asia. In Gifford, R. (Ed.). Test policy and the politics of opportunity allocation: the workplace and the law (p. 33-60). Dordrecht: Kluwer.

Weijers, O. (1987). Terminologie des universités au XIIIe siècle. Roma: Edizione dell’ Ateneo.

Weijers, O. (1995). Les règles d'examen dans les universités médiévales. In Hoenen, M. J. F. M., Schneider, J. H. J., & Wieland, G. (Eds.). Philosophy & learning. Universities in the middle ages. (p. 201-223). Leiden: Brill.

Yates, F. A. (1966). The art of memory. London: Routledge & Kegan Paul.

Correspondence

Geert Vanpaemel, June 25 1996 (from an email, in Dutch):

I am not aware of any special study on the system of examinations in Leuven. That its competitive system is so unique I did not know. As I have not researched the period before 1650, I do not know where it originates from. It existed already in the beginning of the sixteenth century and has probably been copied from Paris.

There is a licentiate thesis (licentieverhandeling) on the organisation of the Artes department in the eighteenth century by Cleenewerck de Clayencour; it is unpublished, but a typescript should be available at the Universiteitsarchief, Mgr Ladeuzeplein 21, B-3000 Leuven. The following publications might be relevant:
E. Reusens (1867). Statuts primitifs de la Faculté des Arts. Bulletin de la Commission Royale d'Histoire, 3, 9, 151-183.
J. Paquet (1970). Statuts de la faculté des Arts de Louvain 1567-1568?, Bulletin de la Commission Royale d'Histoire, 136, 179-271.
P. F. X. de Ram (1861). Codex veterum statutorum Academiae Lovaniensis. In J. Molanus: Historiae Lovaniensis Libri XIV, Bruxelles, 2, 944-979 and 1089-1178.

After being invited to write the article for Studies in Educational Evaluation I regrettably did not have the time to follow up the leads Geert Vanpaemel provided. In May 1997 I obtained de Vocht's (1951) four-volume history of the Collegium Trilingue Lovaniense, which could shed some light on the early sixteenth-century situation Geert Vanpaemel hinted at. I have not yet found the opportunity to browse its more than 2000 pages for information on the system of examination used. A relative of a forefather of my grandchildren was Maarten van Dorp (Martinus Dorpius), a professor at Leuven in the college ‘De Lelie’, who died in 1525. He probably got this position on the basis of excellent examination results. ‘De Universiteit te Leuven, 1425-1975’ mentions the following on the ranking of students within their college:
“Aan het eind van deze reeks examens werden de studenten gerangschikt—locatio heette dat—volgens de behaalde uitslagen. De rangschikking gebeurde volgens vier categorieën of ordines; eerste rang de rigorosi. tweede de transibilies, derde de gratiosi capaces tamen gratiae, afgewezen werden de gratiosi seu refutabiles. Nummer één in de rangschikking werd tot primus uitgeroepen en in triomf door de stad gevoerd.” (p. 90).

[After the close of the series of exams the students were ranked (locatio) according to the results obtained. The ranking followed four categories or ordines: first rank the rigorosi, second the transibilies, third the gratiosi capaces tamen gratiae, while the gratiosi seu refutabiles were failed. The number one in the rank order was proclaimed primus and was carried through the city in triumph.]

The last sentence suggests that the ‘winner’ was only one person, not the group of four winners from the four colleges. This secondary source is not clear on this point. The relative numbers of students passed and students failed are not known from the sources. This much is known, however: instruction at the Artes department was directed at high standards that only the best students were able to meet. Even with this small amount of information on the examinations in the sixteenth century it is evident that there must have been fierce competition between students, with many students dropping out already at the early stages of their study.

Bots, H., I. Matthey en M. Meyer (1979). Noordbrabantse studenten 1550-1750. Tilburg: Stichting Zuidelijk Historisch Contact.
An early Brabant student to win the Leuven concours (see de Vocht, mentioned above), having matriculated as early as 1555, is Rogerus (Rutger) Hessels, alias Alardi (Alarts) (Bots et al., 1979, p. 49, 363, #2148):

Macharen. Matriculated at Leuven in January 1555 (pedagogy Het Varken, pauper). Graduated as A.L. on 16-3-1557, ranking first (primus) of 173 candidates. Mentioned in 1558 as a student of theology holding a benefice at Macharen. Appears on a list of bursars of the Standonck College from the years 1559-1562. The compiler of that list characterized Hessels as "a talented man, friendliness itself, but too fond of company." Was parish priest at Macharen in 1571. In that year became dean of the district of Oss. From ca. 1578 until his death parish priest at Grave and dean of the district of Cuyk. Died ca. 1596. [for sources see Bots et al., p. 363]

"In the 15th and 16th centuries the complete Artes course at Leuven took 2 1/2 years, later 2 years. The two-year course consisted of 9 months of logic, 8 months of physics, and 4 months of metaphysics and ethics, followed by 3 months for revising the material covered. Four months after the start of his studies the Arts student had to give his first proof of competence, the actus determinantiae. The disputations for the title of baccalaureus artium took place at the beginning of the second year. The Artes course was concluded with the licentiate examination in the form of a concours, in which all candidates took part. It was preceded by an examination in each of the four pedagogies, aimed at preselecting the best students, the lineales, who would compete with each other in a second round. The three students from each pedagogy who scored highest in this preliminary examination, called the calamus, qualified for the first line. Those who finished 4th to 6th in the calamus ended up in the second line. Numbers 7, 8 and 9 formed the third line. Students who failed to place in one of the lines were designated postlineales. The results in the final concours determined the position within the lines as well as the ranking of the postlineales. The best student of the first line was proclaimed primus universitatis. The primus was the object of exuberant festivities, both in Leuven and in his place of origin."

Among the Brabant students are the following primi:
2148 Rogerus Hessels (for details see the entry above),
2264 Lambertus Hoex (Houckx), primus 17-11-1726,
3348 Petrus de Louw, Den Bosch, primus Nov. 1588,
3736 Laurentius Nagelmaeker alias Bacx alias Van Westerhoven, primus 18-2-1563,
5546 Lambertus Vincent, Grave, primus 12-11-1648,
5575 Godefridus van Vlierden, Den Bosch, primus 18-2-1574,
5747 Johannes van den Warck (Waerck), Breda, primus 27-11-1590,
5915 Goswinus Witte, Hilvarenbeek, primus 14-2-1576,
and in Douai: 1485 Simon Fierlands, born ca. 1602, Den Bosch

Bots p. 46-47
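
As a reading aid, the two-round Leuven procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reconstruction from the sources: the function names, the numeric scores (higher taken to be better), and the example data are all hypothetical, and it is assumed that every student in an earlier line is placed above every student in a later line. The line size is a parameter (three in the description above); the example uses a line size of one and two pedagogies only to keep the data small.

```python
def assign_lines(calamus_scores, line_size=3, n_lines=3):
    """Place each student in line 1..n_lines, or n_lines + 1 for the postlineales.

    calamus_scores: {pedagogy: {student: score}}; a higher score is assumed better.
    The preselection is done separately within each pedagogy.
    """
    line_of = {}
    for scores in calamus_scores.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for place, student in enumerate(ordered):
            line_of[student] = min(place // line_size + 1, n_lines + 1)
    return line_of


def final_order(line_of, concours_scores):
    """Rank by line first (assumption), then by concours score within each line."""
    return sorted(line_of, key=lambda s: (line_of[s], -concours_scores[s]))


# Hypothetical data: two pedagogies, four students each.
calamus = {
    "P1": {"a": 90, "b": 80, "c": 70, "d": 60},
    "P2": {"e": 95, "f": 85, "g": 75, "h": 65},
}
concours = {"a": 50, "b": 99, "c": 40, "d": 70,
            "e": 60, "f": 30, "g": 45, "h": 80}

lines = assign_lines(calamus, line_size=1)
order = final_order(lines, concours)
primus = order[0]
print(order)   # ['e', 'a', 'b', 'f', 'g', 'c', 'h', 'd']
print(primus)  # e
```

In this made-up example student b has the best result in the final concours but cannot become primus, because the preselection placed him in the second line; under the stated assumptions the first round constrains the final rank order as much as the final round does.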

Previous work

Wilbrink, B. (1995). What its historical roots tell us about assessment in higher education today. 6th European Conference for Research on Learning and Instruction, Nijmegen. Paper: author.

Wilbrink, B. (1995). Leren waarderen: de geschiedenis. Amsterdam: SCO-Kohnstamm Instituut (draft). (SVO project 94707)

Wilbrink, B. (1995). Leren waarderen: de geschiedenis. Version with extensive notes. (SVO project 94707)

Related issue: if grading really is a form of ranking, then what is a GPA?

This article has articulated how grading really is a form of ranking, as it was developed on the basis of the straightforward ranking practiced in schools throughout Western Europe in modern times. Once in place, a grading system invites one to regard grades as quantities that may be added, averaged, subtracted, etcetera. A GPA seems a natural way to combine grades, no matter how the individual grades were obtained: at what time of year, on what kind of exercise, in which discipline. The next step then is to use the GPA in decisions on admissions, tracking, and selection. Scores on tests, such as the American SAT, may be standardized, yet their fundamental character is that of ranks also. What has changed since the eighteenth or nineteenth century, then, is the complexity of the rankings involved. In former centuries the ranking was based on, for example, the number of errors the students had made in their work. In more recent times, the GPA is a kind of ranking of rankings, possibly quite another kind of procedure than the old way of ranking pupils.

In so far as grading may be regarded as (a form of) ranking, it is possible to analyze its inner workings precisely. This kind of analysis uses the famous Impossibility Theorem by Arrow (1963), a result in social choice theory concerning the impossibility of consistently rank-ordering a set of choice alternatives. The bite is this: inconsistent ranking of individuals definitely is unfair. The analysis of assessment rankings in terms of Arrow's result is presented in Vassiloglou and French (1982). What is unfair about ranking on the basis of ranks is, for example, that the resulting rank order depends on who else is being ranked together with you. The authors present an example from the literature, a ranking of five candidates on the basis of five subrankings, and show that deleting one candidate may change the ranking of the other four relative to each other. Now imagine these candidates applying for admission to Harvard University ... The SAT results etcetera might validly indicate merit, yet in the details of the selective process admissions will also depend on trivial circumstances.
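
The dependence of a rank order on who else is being ranked can be illustrated with a small Python sketch. It is not the example given by Vassiloglou and French (1982); the three candidates, the five subrankings, and the choice of summed rank positions as the combination rule are hypothetical, chosen only to make the effect visible.

```python
def rank_sum_order(subrankings, candidates):
    """Order candidates by the sum of their positions over all subrankings.

    Each subranking lists candidates from best to worst; candidates not in
    `candidates` are skipped, so the others move up. A lower total is better.
    """
    totals = {c: 0 for c in candidates}
    for ranking in subrankings:
        present = [c for c in ranking if c in totals]
        for position, c in enumerate(present, start=1):
            totals[c] += position
    order = sorted(candidates, key=lambda c: (totals[c], c))
    return order, totals


# Hypothetical data: five subrankings (e.g. five subjects) of three candidates.
subrankings = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2

order_all, totals_all = rank_sum_order(subrankings, ["A", "B", "C"])
order_without_c, totals_without_c = rank_sum_order(subrankings, ["A", "B"])

print(order_all, totals_all)              # ['B', 'A', 'C'] {'A': 9, 'B': 8, 'C': 13}
print(order_without_c, totals_without_c)  # ['A', 'B'] {'A': 7, 'B': 8}
```

With candidate C included, B is ranked above A; with C deleted, A is ranked above B, although nothing in the relative performance of A and B has changed.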

There are at least two possibilities to escape the unfairness of ranking of rankings: the first is to get rid of the problem itself by replacing simple ranking of performances with assessment of the strengths of performances (French and Vassiloglou, 1986), the second is to inform candidates about the workings of the ranking method that will be used in the coming examination (De Groot, 1970; Van Naerssen, 1970).

Amy N. Langville & Carl D. Meyer (2012). Who's #1? The Science of Rating and Ranking. Princeton University Press. site

Chapter One: Introduction to Ranking free pdf
This theorem of Arrow’s and the accompanying dissertation from 1951 were judged so valuable that in 1972 Ken Arrow was awarded the Nobel Prize in Economics. While Arrow’s four criteria seem obvious or self-evident, his result certainly is not. He proves that it is impossible for any voting system to satisfy all four common sense criteria simultaneously. Of course, this includes all existing voting systems as well as any clever new systems that have yet to be proposed. As a result, the Impossibility Theorem forces us to have realistic expectations about our voting systems, and this includes the ranking systems presented in this book. Later we also argue that some of Arrow’s requirements are less pertinent in certain ranking settings and thus, violating an Arrow criterion carries little or no implications in such settings.

Simon French and Marilena Vassiloglou (1986). Strength of performance and examination assessment. British Journal of Mathematical and Statistical Psychology, 39, 1-14. abstract

About relative and absolute strength of performance, in an attempt to escape the Impossibility Theorem of Arrow (see Vassiloglou and French, 1982)
Simon French (1989). Statistical and decision theoretic aspects of examination assessment. Trabajos de Estadistica, 4 (1), 33-66. abstract of pdf

A. D. de Groot (1970). Some badly needed non-statistical concepts in applied psychometrics. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 26, 360-376. Didakometrisch en Psychometrisch Onderzoek, juni 1970. [Article in English. Partly available in html]

R. F. van Naerssen (1970). Over optimaal studeren en tentamens combineren. Public lecture. html [Examination model. English abstract available]

Marilena Vassiloglou and Simon French (1982). Arrow's theorem and examination assessment. British Journal of Mathematical and Statistical Psychology, 35, 183-192. https://doi.org/10.1111/j.2044-8317.1982.tb00651.x abstract

Other, recent, or recently found, publications on the subject

Rita Copeland & Ineke Sluiter (2009). Medieval grammar & rhetoric. Language arts and literary theory, AD 300-1475. Oxford University Press. Leiden - contents

Olga Weijers (1987). Terminologie des Universités au XIIIe siècle. Edizione dell' Ateneo, casella postale 7216 - 00100 Roma

Olga Weijers (1995). La ‘disputatio’ à la Faculté des arts de Paris (1200 - 1350 environ). Esquisse d’une typologie. Turnhout: Brepols.

Olga Weijers (2002). La ‘disputatio’ dans les Facultés des arts au moyen âge. Brepols. [not yet consulted]

Olga Weijers (2009). Queritur utrum: recherches sur la 'disputatio' dans les universités médiévales. UBL 3467 E 16 [not yet studied]

Olga Weijers (2011). Études sur la Faculté des arts dans les universités médiévales. Recueil d’articles. info [not in UB Leiden] [not yet seen]

Olga Weijers (1996). Le maniement du savoir. Pratiques intellectuelles à l'époque des premières universités (XIIIe-XIVe siècles). Turnhout: Brepols. info & annas-archive

Le programme d’enseignement à la Faculté des arts: règles et réalité
Les auteurs de base et les manuels
Les cours: méthodes et pratiques
La méthode de la ‘questio’
La ‘disputatio’: méthode d’enseignement et de recherche
Les exercices et les ‘sophismata’
L’élaboration de disciplines systématiques
Les examens et les cérémonies: règles et pratiques
La langue: instrument et objet d’enseignement
L’oral et l’écrit dans les universités médiévales
Les dictionnaires, érudits et pratiques
Les répertoires et les index: une mentalité nouvelle
Les classifications du savoir
Mise en page des textes universitaires; les images et les diagrammes

Jan Spoelder (2000). Prijsboeken op de Latijnse school. Een studie naar het verschijnsel prijsuitreiking en prijsboek op de Latijnse scholen in de Noordelijke Nederlanden (ca. 1585-1876), met een repertorium van wapenstempels. With a summary in English. Doctoral dissertation, Katholieke Universiteit Nijmegen. Amsterdam: APA-Holland University Press. summary, exhibition [seen but not read; Burgersdijk & Niermans auction, May 2009]

Among others: De pedagogiek van de wedijver in historisch perspectief
Prijsboeken op de Latijnse school, het begin van een traditie
De gang van zaken bij promotie en prijsuitreiking
De prijsuitreiking in Europees perspectief en de laatste fase van haar geschiedenis in de Nederlandse contekst

Heikki Lempa (2006). Patriarchalism and Meritocracy: Evaluating Students in Late Eighteenth-Century Schnepfenthal. Paedagogica Historica, 42, 727-749. abstract

"This study probes the ways in which a meritocratic system of student evaluation emerged in German educational institutions in the late eighteenth century."
p. 736: "In fact, the Meritentafel was the final point, the public representation of a complex process of assessment. The system was constantly changed and fine-tuned but, as we know from the first two decades of the institute, it had the following four components. The first step was the collection of data and observations of students. Among the teachers [of the Schnepfenthal Institute], GutsMuths was known for carrying a notebook that he used to record students’ achievements in sports. We have more precise information about the second step, the daily assessments that took place during the last hours of the day. For their daily achievements, students received a mark (Marke) or lost one if their performance was lower than expected. For exceptional performance, students could also receive major recognition, a ticket (Billett). The third step was the senate that convened every week after the Sunday service to assess students’ weekly performance. Including the whole community the senate awarded a ticket or, as often happened, several tickets if the student passed scrutiny and could present all the marks received over the week. All these tickets were displayed with a thumbtack placed on the Meritentafel after the name of the student so that the whole community could follow and control the student's development. After collecting fifty thumbtacks, the student was entitled to be named Knight of Diligence (Orden des Fleisses)."
Wow, fantastic. Almost the same Meritentafel was used in Dessau some years earlier (see description on p. 737)
This particular assessment system was highly controversial, however, and was discontinued in Dessau. The controversy was about its intrusion into the traditional domain of the aristocracy.
This GutsMuths singlehandedly devised a kind of objective assessment system in sports and gymnastics, elaborately described by Lempa. p. 740: "For him ambition was not the privilege of a nobleman but a universal characteristic of a man, any man. Honor had become identical with merit, with the measurability of achievements and outcomes, but it also had become the very definition of masculinity."

Paul Black (2001). Dreams, Strategies and Systems: portraits of assessment past, present and future. Assessment in Education: Principles, Policy & Practice, 8, 65-85.

abstract Systems of testing and assessment are shaped in part by personalities and institutions who pursue research insights and technical innovations. Out of these they fashion ‘dreams’ which drive their efforts to improve these systems. This paper develops this perspective, whilst acknowledging that it overlaps with and complements analyses of assessment systems from social and cultural perspectives. Four different examples are considered. Two from past and current history are the growth, from an origin in IQ testing, of standardised multiple choice tests and the dream of raising standards by external testing. The other two, nascent with their influence yet to be determined, are the dream of improvement by formative assessment and the dream that recent developments in psychology can provide a basis for new and improved assessment practices.
[Do I have a copy available yet?]

Thomas Sullivan (2000). Merit ranking and career patterns: The Parisian faculty of theology in the late Middle Ages. In William J. Courtenay and Jürgen Miethke: Universities & schooling in Medieval society (pp. 127-163) Leiden: Brill.

Merit ranking as a phenomenon in the medieval university, according to Sullivan, ‘is little known and even less considered.’ This is somewhat surprising, because it should be generally known that, and how, the merit order was quite important at the University of Leuven (see for example Bots, as cited in this webpage).
In the ‘Blanchard affair,’ in which the Parisian chancellor was removed from his position in 1386, one of the grievances held against him was his abuse of the chancellor's prerogative to depart from the merit order as determined by the masters themselves. This fact itself indicates that the order of merit was important.
Sullivan uses available archival documents to study the question of what typically distinguished the first from the last in each merit order. It is clear that the merit order was not necessarily one according to status or prestige, but was correlated with later academic positions and achievements. Again, there is no information on whether and how a high merit rank would open doors to particular positions, at least for the secular clerics (for clerics they all were, as the faculty involved is that of theology).
Sullivan apparently does not have data on typical academic achievements, such as winning disputes, that would result in high placements in the merit order. He ends the article citing Hastings Rashdall (volume 1, p. 481 in the 1936 edition of The universities in Europe in the Middle Ages), that:
"the only questions were whether [the candidate] had duly performed all the residence, exercises, and acts required by the statutes, and whether the reputation he had acquired during his university course for ability, character, and orthodoxy was such as to entitle him to the license."

June Barrow-Green (1999). 'A corrective to the spirit of too exclusively pure mathematics': Robert Smith (1689-1768) and his prizes at the Cambridge University. Annals of Science, 56, 271-316. abstract

"The Smith's Prize competition was established in Cambridge in 1768 by the will of Robert Smith (1689-1768). By fostering an interest in the study of applied mathematics, the competition contributed towards the success in mathematical physics that was to become the hallmark of Cambridge mathematics during the second half of the nineteenth century." (from the abstract)
The Smith competition was an examination that could be taken shortly after the Mathematical Tripos. The citation above indicates its importance, and shows it to be an instance of an examination having important external effects. While Cambridge is famous for its Tripos examination, for mathematicians the Smith's Prize was more important. This stems from the difference in character between the two examinations, at least after 1883: "the distinction between the capacity for examination work and the capacity to write an original thesis." (p. 301) Before that time, as to the questions set, "unlike Tripos questions, they were often geared towards evincing an original or creative, as opposed to a rote-learning, approach." (p. 282)
Quite remarkable is the number of examinees: typically only a little more than two, sometimes even only two. The number of distinguished positions (prizes) was only two, therefore most of the time only those who had a reasonable chance of winning a prize participated. Although the two examinations were very different in character, most of the time the senior wrangler in the Tripos also won the first prize in the Smith competition.

R. J. Mislevy (1993). Foundations of a new test theory. In N. Frederiksen, R. J. Mislevy and I. I. Bejar Test theory for a new generation of tests. Hillsdale, NJ: Erlbaum.

p. 19: "It is only a slight exaggeration to describe the test theory that dominates educational measurement today as the application of 20th century statistics to 19th century psychology."
Robert J. Mislevy (1994). Test theory reconceived. National Center for Research on Evaluation, Standards, and Student testing (CRESST) download.

Hoi K. Suen and Lan Yu (2006). Chronic consequences of high-stakes testing? Lessons from the Chinese Civil Service Exam. Comparative Education Review, 50, 46-65. http://mentalpolyphonics.com/wp-content/uploads/2007/02/suen06.pdf [dead link? 2-2009]

No summary or abstract. Let us instead take the very last words of the article: ... minimize the stakes of tests.
The authors use the Chinese experience of many centuries to warn today's politicians and psychometricians that unintended and detrimental side effects of high-stakes testing will not go away by designing clever countermeasures.
Visit the site of Hoi Suen for more papers on high-stakes testing.

David R. Hubin (1988). The history of the SAT. Submitted as an American History Ph.D. dissertation in 1988 to the University of Oregon.

Each chapter available for download on his website html.

Engelhard, G., Jr. (Ed.) (1997). Special Issue: History of Modern Psychometrics. Educational measurement: Issues and Practice. Volume 16 # 4 winter 1997.

Among others: Traub: Classical test theory in historical perspective; Brennan: A perspective on the history of generalizability theory; Bock: A brief history of item response theory; Wright: A history of social science measurement.

Judges, A.N. (1969). The evolution of examinations. In Lauwerys, J. A., & Scanlon, D. G. (Eds.) (1969). Examinations. The World Yearbook of Education. London. pp. 17-31.

Kleinschmidt, H. (2000). Understanding the Middle Ages. Woodbridge: The Boydell Press.

About coming to understand important concepts and themes in history. A sort of methodology for the study of history on a conceptual or thematic basis, the kind of approach followed by the Assessment article. Referring to Kleinschmidt solves my problem in defending the particular approach I have chosen in my study.

George F. Madaus and Thomas Kellaghan (1993). Testing as a Mechanism of Public Policy: A Brief History and Description. Measurement and Evaluation in Counseling and Development, 26, April, 6-10. [I have yet to look this one up; does anybody have a digital copy?]

abstract ERIC Author/NB Examines proposals to establish national test/testing system as administrative, policy mechanism to foster good teaching and learning, monitor student progress, help in college admission and employment decisions, and monitor progress toward national goals. Notes that history of testing as administrative mechanism suggests that linking of important rewards or sanctions to test performance is key element.

George F. Madaus & Laura M. O'Dwyer (1999). A Short History Of Performance Assessment. Lessons learned. Phi Delta Kappan, May. [Full article available as a file (on the Encyclopaedia Britannica site?).] read online JSTOR [Probably also draws on material in my 1997 article (note 418), but takes a quite distinctive approach to the history of assessment. Good piece.]

George F. Madaus (1988). The influence of testing on the curriculum. In Laurel N. Tanner (Ed.) (1988). Critical issues in Curriculum (83-121). NSSE. [immediately followed by: Daniel Tanner (1988). The textbook controversies, pp. 122-147.] [feedforward, backwash, washback] paywalled

Marguerite M. Clarke, George F. Madaus, Catherine L. Horn and Miguel A. Ramos (2000). Retrospective on educational testing and assessment in the 20th century. Journal of Curriculum Studies, 32, 159-181. 10.1080/002202700182691 abstract

George F. Madaus (1993). A National Testing System: Manna From Above? An Historical/Technological Perspective. Educational Assessment 1(1): 9-26 DOI: 10.1207/s15326977ea0101_2 abstract

Joan L. Richards (1988). Mathematical visions. The pursuit of geometry in Victorian England. Academic Press.

Extracts and annotations in a dedicated page here
Among other things, Richards writes some important history of the Cambridge Mathematical Tripos, easily the most spectacular European examination in the nineteenth century. An interesting event is the abolition of its extreme ranking procedure, only in 1907. Another point of transition from ranking to the use of grades.

Searby, Peter (1997). A history of the University of Cambridge. Volume III, 1750-1870. Cambridge: Cambridge University Press. Atheneum, 19-11-97.
Covers the period of development of the competitive Tripos examinations, giving many details concerning the examinations and developments over time.

Vocht, H. de (1951-1955). History of the foundation and the rise of the Collegium Trilingue Lovaniense 1517-1550. 4 parts. Louvain: Bibliothèque de l'Université, Bureau de Recueil.
The early history of a famous Leuven college, not one of the four colleges participating in the competitive Leuven examinations for artes students. De Vocht gives many details on the ranking of many persons in the Promotion to Master of Arts, also in the fifteenth century. He uses E. H. J. Reusens (1869). Promotions de la Faculté des Arts de l'Université de Louvain, 1428-1797 (1st part, 1428-1568); the Leuven manuscript Promotiones in Facultate Artium Universitatis Lovaniensis ab anno 1500 ad annum 1659; and, for example, Extracts from the Sextus Liber Actorum Facultatis Artium (itself now lost; the first (1427-1441), second (1441-1447) and fifth (1508-1511) have (partly) survived).

Ginette Delandshere (2001). Implicit Theories, Unexamined Assumptions and the Status Quo of Educational Assessment. Assessment in Education: Principles, Policy & Practice, 8, 113-133.

abstract scihub pdf

Cameron Graham and Dean Neu (2004). Standardized testing and the construction of governable persons. Journal of Curriculum Studies, 36, 295-319.

Abstract

W. Todd Rogers and Donald A. Klinger (NCME, 2007). Purposes of and Issues with the Provincial Testing Programs in Alberta. pdf

The authors use history to reflect on the purpose etcetera of large scale assessment programs.

Johann Georg Prinz von Hohenzollern and Max Liedtke (Eds.) (1991). Schülerbeurteilungen und Schulzeugnisse. Bad Heilbrunn/Obb.: Julius Klinkhardt. ISBN 378150655X.

Among others: Hans-Werner Fischer-Elfert: "Das Ohr eines Knaben sitzt auf seinem Rücken, er hört nur, wenn man ihn schlägt." Schülerbeurteilungen im Alten Ägypten / Alfons Rösger: Zur Schülerbeurteilung in der Antike / Hellenistische Schulwettbewerbe / Bernhard Ebneth: Schulprüfungen in Spätmittelalter und Frühneuzeit an einem Beispiel: Die Beurteilung der chorales am Neuen Spital in Nürnberg / Rudolf W. Keck: Zensieren und Zertieren: Zur Kontroll- und Gratifikationspraxis der katholischen Pädagogik im jesuitischen Einflussbereich / Marianne Doerfel: Schülerbeurteilungen in der ‘Pietistenschule’ Neustadt/Aisch im 18. Jahrhundert / Hubert Buchinger: Zur Geschichte von Zensuren und Zeugnissen in der bayerischen Realschule

Laura Meilink-Hoedemaker (undated). Sollicitaties in Delft en Den Haag, 18, 19 en 20 juli 1741 doc. "Brochure (ISBN 90-75806-13-2).

The brochure has 12 pages and is still for sale. Transfer 2 euros to account number 9125297 in the name of L.J. Meilink-Hoedemaker."
Introduction and research question. In the eighteenth century the post of town organist and carillonneur was a coveted position. The musicians were employed by the municipal and church authorities, and the salary could be called good. When such a post fell vacant, the burgomasters and churchwardens jointly invited the candidates to a trial performance. In July 1741 such procedures took place in Delft and The Hague on two consecutive days. The examinations took place on three consecutive days, namely on 18 July 1741 in Delft and on 19 and 20 July 1741 in The Hague. It is not known whether there was any consultation in setting these dates.
Reports of both events have been preserved. Neither in Delft nor in The Hague are the names of the members of the selection committee known, but all the more is known about the participants.
In publications on these examinations, attention has so far been focused on the procedure and on the winners of the competitive examination. This web publication puts both examinations in the spotlight again, but now attention is directed at the rejected applicants. In total, thirty candidates were interested in these vacancies. They came from twenty different places of residence. There were twenty participants in The Hague and twenty-one in Delft. Eleven candidates appeared in both cities. What became of them?

Janet Delve (2003). The College of Preceptors and the Educational Times: Changes for British mathematics education in the mid-nineteenth century. Historia Mathematica 30, 140-172. pdf

abstract Founded in Britain in 1846 to standardize the teaching profession, the College of Preceptors is little known today. The College was closely linked to the Educational Times (hereafter ET), a journal of "Education, Science and Literature " launched in 1847. This paper examines in detail a sample of College examinations, articles on mathematics education, and reviews of mathematics textbooks that appeared in the ET. Key figures in the mathematical discussion were William Whewell, Augustus De Morgan, and Thomas Tate. The paper shows how the discourse on mathematics education led to the introduction of entrance examinations for Oxford and Cambridge Universities.

Andrew Warwick (2003). Masters of Theory: Cambridge and the Rise of Mathematical Physics. University of Chicago Press.

Reviewed by Kathryn M. Olesko, Training for the tripos. American Scientist May-June 2004 html
- "Yet although copies of the Tripos examinations themselves are abundantly available, the answers—the real evidence for knowledge-in-the-making—are less so. Warwick successfully reconstructs the content and more so the experience of working through those answers by triangulating from sources that touch on the learning experience: diaries, correspondence, autobiographies and textbooks. Likewise, although Warwick has only two sets of coaching notes (Stephen Parkinson's from the 1850s and Robert Webb's from the 1890s), his wide-ranging knowledge of Cambridge culture permits him to extrapolate general observations about the coaching and training experience as a whole from his sources."
Reviewed by Ivor Grattan-Guinness (2004). The Tripos in a Century of Mathematical Physics at Cambridge University. SIAM News, 37, Number 4,
- "The many illustrations include not only portraits of some Wranglers (famous later or not) but also pertinent lecture notes, research publications, and Tripos examination questions and even candidates’ attempts to answer them. This last, particularly welcome, source is one result of the author's extensive use of manuscript collections."

More material on the mathematical tripos in the Wikipedia html

In the Elibron Classic series a lot of nineteenth century books on mathematics as well as on the examinations themselves have been republished recently. See here. For example:

William Walton, Charles Mackenzie. Solutions of the Problems and Riders Proposed in the Senate-House Examination for 1854. By the Moderators and Examiners. With an Appendix, Containing the Examination Papers in Full.
Elibron Classics, 2002, 238 pages.
ISBN 140216131X paperback
ISBN 1402128185 hardcover
Replica of 1854 edition by Macmillan and Co., Cambridge.

Lynn Thorndike (1940). Elementary and Secondary Education in the Middle Ages. Speculum, 15, 400-408. pdf JStor

In this paper I wish to uphold the thesis that in the period of developed mediaeval culture elementary and even secondary education was fairly widespread and general.
Villani tells us in his Chronicle that in Florence in 1283 there were between eight and ten thousand boys and girls learning to read, while six abacus schools (for training in reckoning preparatory to a business career) had between one thousand and twelve hundred attending, and four great or high schools for grammar and logic had from 550 to 600 pupils.
Recently I have been reading the rotograph of an anonymous and, I believe, hitherto unnoticed treatise on education in a Latin manuscript at the Vatican. Its author would have the boy begin the study of grammar at the age of seven in the springtime of the year and continue it as his chief study, with some music and arithmetic on the side, until the end of his fourteenth year, 'when the light of reason begins to shine.' The next septennium until the age of twenty-one would then be occupied with logic, rhetoric and an introduction to astronomy, and the third period of seven years to twenty-eight with natural science, metaphysics and Euclid, after which in subsequent years might come law or theology. In the case of the boys from seven to fourteen our anonymous author is solicitous to protect their tender limbs and susceptibility to cold and heat. He notes that in many northern regions two different classrooms are provided for summer and winter. Those with physical defects or contagious diseases should not be admitted. The complexion of the individual pupil should be carefully considered and one of the sanguine temperament treated in an entirely different fashion from one given to melancholy. Their relative capacity for learning should also be marked early, since some are bright, some even brighter, and some exceedingly bright, while others are dull, others duller yet, and others so stupid that the teacher despairs of them. All, however, should have a recess from study or a play-hour for sports and games, in order to raise their spirits, stir their blood, and recreate their minds.

Christopher Stray (2001). The Shift from Oral to Written Examination: Cambridge and Oxford 1700—1900. Assessment in Education: Principles, Policy and Practice, 8, 33-50

academia.edu
possibly the same article, or a revised version, reprinted in History of Universities, vol 20; PART 2, 76-130

Lee S. Shulman (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15 #2, 4-14. http://www.fisica.uniud.it/URDF/masterDidSciUD/materiali/pdf/Shulman_1986.pdf

Part of the article treats the subject historically.

Schoengen, M. (1898). Die Schule von Zwolle von ihren Anfängen bis zur Einführung der Reformation (1582). I. Von den Anfängen bis zu dem Auftreten des Humanismus. Freiburg (Schweiz). html

links

Pictura Paedagogica Online, the digital image archive on the history of education. "The collection currently comprises several tens of thousands of book illustrations from the period from the Middle Ages to 1850, as well as historical postcards from the period 1870 to 1933." site

The Encyclopedia Britannica 1911 on examinations: http://encyclopedia.jrank.org/EUD_FAT/EXAMINATIONS.html

Francis Galton writes on the mathematical tripos in his Hereditary genius, see html [In fact, the whole book is made available there] Especially watch the fantastic figures in the table on page 19 here.

to be researched

In a way, the article itself is a small catalogue of themes deserving further research.

May, 2006. A crucial historical development in assessment is where, next to assessment as an integral part of the instructional process, a kind of functional assessment develops, in connection with the universal licence to teach everywhere. It would be a nice thing to have a thorough understanding of what is happening here, and to connect it with other developments in culture and society, especially the end of feudalism and the rise of more or less autonomous cities.
From this moment in history on, assessment is a schizophrenic thing, a beast with two souls, the two souls fighting each other in all possible ways, involving many actors on all levels in society.
Functionalism itself is not the problem, remember Charles the Great establishing schools because of the need for clerks able to read and write, to say the least. It is the functionalism of this functionalism—the diploma disease—that introduces stresses in education.

Peter K. Bol (1997). Examinations and orthodoxies 1070 and 1313 compared. In Theodore Huters, R. Bin Wong & Pauline Yu (Eds): Culture & state in Chinese history. Conventions, accommodations, and critiques. (29-57) Stanford University Press.

Pierre Bourdieu et Jean-Claude Passeron (1970). La reproduction. Éléments pour une théorie du système d'enseignement. Paris: Les Éditions de Minuit.

wiki
ch. 1: Capital culturel et communication pédagogique (Inégalités devant la sélection et inégalités de sélection - De la logique du système à la logique de ses transformations)
ch. 2: Tradition lettrée et conservation sociale
ch. 3: Elimination et sélection (L'Examen dans la structure et l'histoire du système d'enseignement - Examen et élimination sans examen - Sélection technique et sélection sociale)
ch. 4: La dépendance par l'indépendance

J. McK. Cattell (1890). Mental tests and measurements. Mind, 15, 373-381. html

first paragraph "Psychology cannot attain the certainty and exactness of the physical sciences, unless it rests on a foundation of experiment and measurement. A step in this direction could be made by applying a series of mental tests and measurements to a large number of individuals. The results would be of considerable scientific value in discovering the constancy of mental processes, their interdependence, and their variation under different circumstances. Individuals, besides, would find their tests interesting, and, perhaps, useful in regard to training, mode of life or indication of disease. The scientific and practical value of such tests would be much increased should a uniform system be adopted, so that determinations made at different times and places could be compared and combined. With a view to obtaining agreement among those interested, I venture to suggest the following series of tests and measurements, together with methods of making them."
The site Classics in the history of psychology contains more full texts of key publications in psychological (and educational) measurement. The one by Cattell in a way marks the beginning of psychometrics as an industry.

Wainer, H. (1987). The first four millennia of mental testing; from ancient China to the computer age. Educational Testing Service Research report 87-34. 6 pages. [also: The Score, 13, 4-5, 11-13, April, 1990.] [I have seen neither. ETS does not offer an online copy. Does anyone care to send me a pdf copy?]

May, 2006. It is an intentional omission in the 1997 article to leave out developments in the 20th century, because they are mainly of another kind, connected with the explosion in educational participation, and processes of assessment automation.
An article on these 20th century developments will have to deal with the spectacular development of psychological testing techniques and the way especially American education almost immediately gets infected with these techniques and their accompanying philosophies. Early in the twentieth century the crucial and seemingly irreversible turn taken in assessment is the view that assessment is a kind of measurement, and that it is better served the less subjective the measurement devices are. This philosophy goes against the grain of extensive experience in education that actors in education are strongly influenced in their behavior by the kind of examinations etcetera that society (politics) confronts them with.

Ellen Condliffe Lagemann (2000). An elusive science: The troubling history of education research. University of Chicago Press. site

Agnes M. Lathe (1889). Written examinations—their abuse, and their use. Education; a monthly magazine devoted to the science, art, philosophy and literature of education, volume 9, 452-456. OCR of text Reprinted in John A. Laska and Tina Juarez (Eds) (1992). Grading and marking in American schools. Two centuries of debate. Springfield, Illinois: Thomas. contents

Daniel Starch (1916). Educational measurements. New York: Macmillan. 10Mb pdf

"If there are any products or by-products of education which are too subtle to be distinguished or judged as existing in greater or less amounts, or as having higher or lower quality, we may be suspicious of their actual existence. Any quality or ability of human nature that is detectable is also measurable. It remains only to discover more and more accurate means of measurement."
This is a spectacularly naive approach to assessment in education, one that also justifies the use of psychometric methods for assessment. Echoes of this harmful philosophy abound in the educational measurement literature; take for example a linguistically sophisticated one:
John R. Bormuth (1970). On the theory of achievement test items. Chicago: University of Chicago Press.
- p. 81-2: "It is difficult to overemphasize the importance to instruction of research which attempts to analyze the cognitive processes underlying responses to item types. It is commonly accepted that, in many subject-matter areas, the learning of the knowledge explicitly taught by the instructional programs is less valued as a learning outcome than learning the complex cognitive processes by which that and other knowledge is discovered, evaluated, organized, and applied. Achievement test items which can test these complex processes are useful not only for evaluating the student's achievement and the effectiveness of his instruction, but also for providing the instructional exercises which force him to practice those processes. But few of these benefits can be reaped until we identify exactly what it is that the different classes of items test."

The point to make now is the following. Early in the 20th century the testing virus escapes from the psychological laboratories and infects the educational establishment. The philosophy and technology of testing are wholeheartedly taken in by the educational community. Large investments, many people and institutions are involved. What is happening is a lock-in on an enormous scale: educational assessment has become educational measurement using standardized achievement tests, or at least teacher-made tests resembling such standardized tests.
What is a lock-in? The QWERTY-keyboard is an example of a lock-in on a particular keyboard layout, an early pragmatic layout that later on proves to be the final one because all attempts to introduce better layouts fail. The VHS video technology is another example of an inferior technology winning the market against the superior Video 2000 technology developed by Philips (later Philips developed cd-laser-technology, earning huge profits on the patent).

A corollary of the lock-in hypothesis concerning educational measurement is the following. In the course of the 20th century there has been a flurry of research on educational testing, resulting in more efficient testing formats, more refined standards of (achievement) testing, highly sophisticated statistical techniques used in developing and evaluating tests, online testing technology, etcetera. Progressively better techniques and tests have become available, while the founding philosophies of this educational measurement, if they ever existed in a form more elaborate than the passages quoted above, have stayed as inappropriate to the educational process as ever. It is highly exceptional for educational measurement specialists to rise to the issue and propose new approaches recognizing the way assessments in education truly function. Names: Popham (US), Van Naerssen (Netherlands, html). Themes: backlash, feedforward. Research:

H. Becker, B. Geer and E. C. Hughes (1968). Making the grade: the academic side of college life. John Wiley. site

James S. Coleman (1990). Foundations of social theory. Harvard University Press. site

See also Peter V. Marsden (2005). The sociology of James S. Coleman pdf

James S. Coleman (1-6-1994) (concept) What goes on in school: a student's perspective. paper 3/25/94; rev. 6/1/94. html

See numerous entries, articles etc. on this website www.benwilbrink.nl

Ansgar Allen (2012 online first). The examined life: On the formation of souls and schooling. American Educational Research Journal abstract

“ The purpose of this article is largely rhetorical. It seeks to demonstrate that even a relatively quick survey of the development of modern examination must cast doubt on contemporary efforts to ameliorate its effects. Specifically, it seeks to break down the current tendency in education to adjudicate between good and bad examining practices, between those examining techniques that are seen as oppressive, impersonal, and excessively mechanistic and those that are celebrated for their flexibility and attention to the needs of the child. It is argued that both summative and formative traditions in assessment help perpetuate in their respective techniques processes of subject formation that have as their object the construction of selves amenable to government.”

Dominique Julia (1994): Le choix des professeurs en France: vocation ou concours? 1700–1850. Paedagogica Historica: International Journal of the History of Education, 30, 175-205. abstract

The ‘agrégation’, installed at the end of the 18th century.

- Le choix des professeurs dans les congrégations enseignantes et à l'Université de Paris au XVIIIe siècle
- La création du concours d'agrégation à la Faculté des Arts de Paris en 1766
- L'agrégation dans la première moitié du XIXe siècle

Dominique Julia (Ed.) (1994). Aux sources de la compétence professionnelle: critères scolaires et classements sociaux dans les carrières intellectuelles en Europe, XVIIe - XIXe siècles. Paedagogica Historica, International Journal of the History of Education, 30 #1, 9-459. contents

Deals with meritocracy and its origins, but above all (from what I have seen of it at a quick glance) with how deceptive the appearance of meritocratic methods is in retrospect.

Among others:
Dominique Julia: Présentation. 9-11 preview
Willem Frijhoff: Inspiration, instruction, compétence? Questions autour de la sélection des pasteurs réformés aux Pays-Bas, XVIe-XVIIe siècles. 13-38. abstract
Antonio Vinao Frago: Les origines du corps professoral en Espagne: les Reales Estudios de San Isidro, 1770-1808. 119-174. abstract
Dominique Julia: Le choix des professeurs en France: vocation ou concours? 1700-1850. 175-206. abstract
Marina Roggero: Le métier de maître d’école. Problèmes et transformation dans les états italiens. 207-230. abstract
Gian Paolo Brizzi: Aux origines du système de mérite. Formation, recrutement et sélection des officiers de chancellerie de quelques grandes magistratures publiques italiennes, XVIIe-XVIIIe siècles. 249-266. abstract
Bernd Wunder: Les hauts fonctionnaires en Bade pendant la première moitié du XIXe siècle: recrutement, carrière et origine sociale. 267-280. abstract
Hannes Siegrist: Formal knowledge, public trust and state lawyers in Germany, Italy and Switzerland in the early 19th century. 325-340. abstract
Vincenzo Ferrone: Les mécanismes de formation des élites de la maison de Savoie. Recrutement et sélection dans les écoles militaires du Piémont au XVIIIe siècle. 341-370. abstract
Cornelis Disco: Making the grade in Dutch civil engineering, 1780-1920. 371-410. abstract
Antoine Picon: De l’ingénieur-artiste au technologue: procédures de sélection et notation des élèves à l’école des Ponts et Chaussées 1747-1851. 411-452. abstract
Dominique Julia: Conclusion. 453-460. preview

Rosalind Pritchard (2006). Trends in the restructuring of German universities. Comparative Education Review, 50, 90-112. abstract

Robert Nelson & Phillip Dawson (2014). A contribution to the history of assessment: how a conversation simulator redeems Socratic method. Assessment & Evaluation in Higher Education preprint

Toby E. Huff (1993). The rise of early modern science. Islam, China, and the West. Cambridge University Press. info

Chapter 5, on the Arabic heritage, draws on Makdisi. Of special interest: the section Madrasas: Islamic colleges. The method of disputation in Islam was directed at finding fault in the work of the master, not at developing new insights as was the case in the Western disputation method. Quite an interesting point.

As training designed to make students of law proficient as muftis and jurisconsults qualified to issue legal opinions, no doubt the system of disputations worked well in preserving the status quo and in constantly weeding out suspect opinions. It was even essential to Islam, according to Professor Makdisi, because the “method was part and parcel of the Islamic orthodox process for determining orthodoxy.”
p. 159-60, citing Makdisi (1974). The scholastic method in medieval education: An inquiry into its origin in law and theology. Speculum, 49, 649.

Jack Schneider & Ethan Hutt (2013). Making the grade: a history of the A-F marking scheme. Journal of Curriculum Studies. DOI: 10.1080/00220272.2013.790480 pdf

Early American grading systems owed much to the European model (focusing on constant competition, the awarding of prizes and rank order competition) and were largely used for pedagogical purposes. The introduction of mass compulsory schooling, however, changed things dramatically. Mass schooling placed the school at the centre of a society increasingly dominated by complex bureaucratic institutions, including the school system itself. Consequently, grading systems that had traditionally tended towards the local and the idiosyncratic, and which were designed for internal communication among teachers and families attached to a given school, became forms of external communication and organization as well. Increasingly, reformers saw grades as tools for system-building rather than as pedagogical devices: a common language for communication about learning outcomes.
If grades were to communicate beyond the school site, marking systems had to be made more 'legible', more universal and more standardized. Driven by policy elites and eager administrators, grading systems expanded, reproduced and evolved. Without a central authority to mandate standardization or facilitate communication, they advanced in fits and starts varying across the regions and levels of schooling. Yet by the turn of the twentieth century, teachers, administrators, parents, college admissions officers and employers were turning to grades for basic information about academic aptitude and accomplishment.

Susan M. Brookhart, Thomas R. Guskey, Alex J. Bowers, James H. McMillan, Jeffrey K. Smith, Lisa F. Smith, Michael T. Stevens, Megan E. Welsh (2016). A Century of Grading Research. Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86, 803-848. free access

Robert Nelson & Phillip Dawson (2017). Competition, education and assessment: connecting history with recent scholarship. Assessment & Evaluation in Higher Education, 42, 304-315. abstract

David Tyack and William Tobin (1994). The "Grammar" of Schooling: Why Has it Been so Hard to Change? American Educational Research Journal, 31, 453-479. open access

Richard J. Shavelson (2007). A Brief History of Student Learning Assessment. How We Got Where We Are and a Proposal for Where to Go Next. Association of American Colleges and Universities. pdf

The fall of 1949 saw a landmark in student learning assessment: in a shift from testing content to testing general reasoning, ETS introduced a GRE Aptitude Test with the kind of verbal and quantitative sections we see today. Then, in 1952, it introduced the now standard scale for reporting scores (the normal distribution with mean 500 and standard deviation 100). In 1954, ETS continued the shift away from content and toward general reasoning by replacing both the Profile Tests and the Tests of General Education with the “Area Tests,” which served as a means of assessing broad outcomes of the liberal arts. The Area Tests focused on academic majors in the social and natural sciences and the humanities. They emphasized reading comprehension, understanding, and interpretation, often providing requisite content knowledge “because of the differences among institutions with regard to curriculum and the differences among students with regard to specific course selection” (ETS 1966, 3).
Richard J. Shavelson (2007). A Brief History of Student Learning Assessment. How We Got Where We Are and a Proposal for Where to Go Next. Association of American Colleges and Universities.

Yuval Noah Harari (2017). Homo Deus. A brief history of tomorrow. Vintage. 9781784703036 review

Refers to my article (note 6 in chapter 4). That is quite extraordinary. Thanks Thijs Jansen for signalling the reference!

Cameron Graham & Dean Neu (2004). Standardized testing and the construction of governable persons. Curriculum Studies, 36, 295-319. PDF

Standardized testing has gone through periods of ascendancy and decline in education (Wilbrink 1997), and Alberta is shown to be no exception.

Andrew P. Huddleston & Elizabeth C. Rockwell (). Assessment for the Masses: A Historical Critique of High-Stakes Testing in Reading. Texas Journal of Literacy Education, 3 pdf

Karen R. Diller and Sue F. Phelps (2008). Learning Outcomes, Portfolios, and Rubrics, Oh My! Authentic Assessment of an Information Literacy Program. Libraries and the Academy, Vol. 8, No. 1, pp. 75–89. pdf

History and Literature Review
In the Middle Ages at the University of Paris students studied under a master who set daily exercises for the students to learn grammar and philosophy before being admitted to the higher schools of medicine or law. Masters put one student in competition with another as a learning exercise; and, in this way, assessment happened daily, ending with one student being praised and the other ridiculed. Ultimately, the master nominated his student for examinations, which were public events in which the candidate was questioned and required to deliver a lecture on a text given him only hours before. The examination was designed so that the student could demonstrate his skill at lecturing, the skill for which he was being trained. These were the first authentic assessments of student learning.2 In the following centuries, as the number of students to be assessed increased dramatically, and the need for written proof of accomplishment became more important, assessment became more about knowledge gain and less about application of that knowledge. Examinations grew, and more authentic methods of assessing a student’s abilities to apply knowledge became rare.
2. Ben Wilbrink, “Assessment in Historical Perspective,” Studies in Educational Evaluation 23, 1 (1997): 31–48

Annique Smeding (2013). Reducing the Socio-Economic Status Achievement Gap at University by Promoting Mastery-Oriented Assessment. Plos One open access & tweet

Usually, the competition-based selection process favors resources-endowed high-SES students [5], [6], and indeed historical analyses show that current assessment practices were originally developed with the purpose of serving high-status groups [7].
[7] Willbrink B (1997) Assessment in historical perspective. Stud Educ Eval 23: 31–48.

These three studies provide convergent support for a novel approach to the SES achievement gap by focusing on the meaning of assessment practices that are used at most universities, rather than on individual factors. Using different but complementary methods, the three studies demonstrated that a focus on mastery goals in the assessment process made it possible to reduce the SES achievement gap at University. For the first time, empirical data support the idea that low-SES students can perform as well as high-SES students if they are led to understand assessment as part of the learning process rather than as a way to compare students to each other and select the best of them. Particularly the third study, which utilized an experimental design, revealed that this could be achieved with interventions that rely upon simple, albeit theory-driven instructions. Moreover, the present studies contribute to the achievement goals literature by showing that a focus on learning-based mastery goals during assessment is particularly beneficial for low-SES students. Finally, our findings may also be understood in light of the social identity threat literature [30]. Indeed, the present research suggests that some of the structural characteristics of academic functioning in terms of assessment practices may favor (i.e., selection orientation) versus reduce (i.e., mastery orientation) social identity threat for educationally-stigmatized individuals (i.e., low-SES students). Future research may investigate whether some of the mechanisms accounting for threat effects on performance (e.g., stress responses, working memory impairment; [31]) are also relevant for explaining the present findings.

Most of the time, assessment at University is associated with normative grades, ranking, and selection, but is rarely used as a genuine tool for education [32]. As our results suggest, classical performance-oriented evaluations are certainly very useful and particularly efficient in serving the selection function and maintaining the status quo [33]–[35]. However, the present research showed that mastery-oriented evaluations are far more efficient in serving the educational function and make University a place where success does not depend upon one’s social status.
General Discussion

Darren Grant & William B. Green (2013). Grades as incentives. Empirical Economics, 44, 1563-1592 pdf

Educational assessment originated toward the end of the medieval period, in order to group students within schools on the basis of mastery. It spread across Europe throughout the Renaissance, as the state tried to improve the quality of its civil servants, who were increasingly selected on the basis of merit instead of social class (Wilbrink 1997).

Matthew Militello & Joseph B. Berger (2010 forthcoming). Understanding Educational Leadership in Northwest China. International Journal of Leadership in Education. concept

However, education must be studied first in its historical context in order to understand how a particular practice was a solution to problems and tasks as perceived by historical actors (Wilbrink 1997).

Anne M. Dean (1998). Defining and achieving university student success: Faculty and student perceptions. Thesis. pdf

As the educative process developed, the measure used to determine if knowledge and understanding had been attained became gradually more formal and standardized (Wilbrink, 1997).
Both because the population of students is growing and possibly because of a continuing response to established societal norms, the measures of success throughout the educational system also became more objective. In time, grades became the mark of evaluation that was preferred, both by teachers and by their students. Standardized examinations became the measure by which the masses of students were organized by intelligence level, subject proficiency, and preparedness for college education (Wilbrink, 1997).

Frédérique Autin,* Anatolia Batruch & Fabrizio Butera (2015). Social justice in education: how the function of selection in educational institutions predicts support for (non)egalitarian assessment practices. Front Psychol. 2015; 6: 707. open access

The last sentence in the quote does not follow from my research: competition for the best ranks had a strong meritocratic flavour, already in Leuven early in the 16th century, and possibly much earlier.

Some historical and sociological analyses have proposed that normative assessment through testing and competitive examinations is rooted in traditions, methods, conceptions of knowledge and standards that serve the dominant groups (Wilbrink, 1997; Delandshere, 2001; Leathwood, 2005; Carson, 2007). The rankings and competence certification produced by normative assessment would thus participate to maintain the pre-existing social order.

Eva L. Baker, Gregory K. W. K. Chung & Li Cai (2016). Assessment Gaze, Refraction, and Blur: The Course of Achievement Testing in the Past 100 Years. Review of Research in Education March 2016, Vol. 40, pp. 94–142 DOI: 10.3102/0091732X16679806 open access

Robert J. Gregory (2004). The history of psychological testing. Ch. 1. in Robert J. Gregory: Psychological Testing: History, Principles, and Applications. open access

Dylan Wiliam (2017). Learning and assessment: a long and winding road? Assessment in Education: Principles, Policy & Practice. Volume 24, 2017 - Issue 3: Assessment and Learning open access

Deutsch, M. (1979). Educational and distributive justice: some reflections on grading systems. American Psychologist, 34, 391-401. 10.1037/0003-066X.34.5.391 abstract

Geisinger (1982, p. 1147): "His views flow from a comparative approach to grading. He describes grades as artificially scarce rewards, which are allocated on the basis of ability, drive, and character to fill the societal purposes of motivating and socializing children. He argues that comparative grading probably developed to foster a belief in the competitive, meritocratic ideology needed to legitimize socioeconomic inequalities. In contrast, he discusses other value systems through which grades could be awarded and suggests that grades should serve society by helping students gradually make the transition from the family to the world of work."

R. J. Montgomery (1965). Examinations, An account of their evolution as administrative devices in England. London: Longmans Green.

Especially Chapter nine: Some reflections on the history of examinations. 242-270:
1. Examinations as instruments for a purpose
2. From expediency to principle
3. The shape of society
4. Some disadvantages
5. Other methods of examination
6. Who controls the system?
7. The right means of control

John Roach (1971). Public Examinations in England 1850-1900. Cambridge UP. https://doi.org/10.1017/CBO9780511896309 [University Library Leiden Closed Stack 3 1923 E 36] info

PART I - THE COMPETITIVE PRINCIPLE ESTABLISHED pp 1-2
1. Patronage and competition pp 3-34 https://doi.org/10.1017/CBO9780511896309.003
2. Middle-class education pp 35-55 https://doi.org/10.1017/CBO9780511896309.004
3. Examinations and schools – to 1857 pp 56-74 https://doi.org/10.1017/CBO9780511896309.005
  PART II - THE OXFORD AND CAMBRIDGE LOCALS AND NATIONAL EDUCATION, 1857–1900 pp 75-76
4. Beginnings, 1857–1860 pp 77-102 https://doi.org/10.1017/CBO9780511896309.006
5. The education of women pp 103-135 https://doi.org/10.1017/CBO9780511896309.007
6. Secondary schools and their studies pp 136-163 https://doi.org/10.1017/CBO9780511896309.008
7. The examiners and the examined pp 164-188 https://doi.org/10.1017/CBO9780511896309.009
  PART III - THE PUBLIC CONTEXT, 1855–1900 pp 189-190
8. The Civil Service Examinations: to 1870 pp 191-209 https://doi.org/10.1017/CBO9780511896309.010
9. The Civil Service Examinations: after 1870 pp 210-228 https://doi.org/10.1017/CBO9780511896309.011
10. School Examinations – from Taunton to Bryce pp 229-256 https://doi.org/10.1017/CBO9780511896309.012
11. Critics and criticisms pp 257-286 https://doi.org/10.1017/CBO9780511896309.013

John B. Carroll (1981): The measurement of intelligence pp 29-121 in: Robert J. Sternberg (Ed.) (1982). Handbook of human intelligence. Cambridge University Press. isbn 0521296870

Not having the slightest idea of the history of assessment before about 1900 is a serious problem for researchers like John Carroll, witness his comment (p. 32):

It might be interesting to trace the antecedents of the theory of individual differences in mental ability, and the methods of assessing these differences, in ancient, medieval, and early modern times; but there does not seem to be available any thoroughgoing study of these antecedents. From the works just cited, which include brief accounts of theories and trends prior to 1869, one gets the impression that the concept of individual differences in mental ability was very slow to develop, and the procedures for assessing such differences were very crude from the standpoint of current technology. (p. 32)

Brookhart, Susan M.; Guskey, Thomas R.; Bowers, Alex J.; McMillan, James H.; Smith, Jeffrey K.; Smith, Lisa F.; Stevens, Michael T.; and Welsh, Megan E., "A Century of Grading Research: Meaning and Value in the Most Common Educational Measure" (2016). Educational, School, and Counseling Psychology Faculty Publications. 2. https://uknowledge.uky.edu/edp_facpub/ concept

Maurice Crosland & Antonio Gálvez (1989). The emergence of research grants within the prize system of the French Academy of Sciences, 1795-1914. Social Studies of Science, 19, 71-99 https://doi.org/10.1177/030631289019001002. Reprinted in Maurice Crosland (1995). Studies in the culture of science in France and Britain since the enlightenment. Aldershot: Variorum. isbn 0860784983 abstract

Chris Stray (2012). Rank (dis)order in Cambridge 1753-1909: the Wooden Spoon. History of Universities, 26 (1), 163-201. academia.edu

Scott T. Meier (1994). The chronic crisis in psychological measurement and assessment. A historical survey. Academic Press. isbn 0124884407 info

Key.

John White (2006). Who needs examinations? A story of climbing ladders and dodging snakes academia.edu

John White (2011). The invention of the secondary curriculum. Palgrave. pdf academia.edu

Richard P. Phelps (2020). Down the Memory Hole: Evidence on Educational Testing [via https://twitter.com/RichardPPhelps/status/1451713734163017733]

Susan Embretson (2001). The second century of ability testing: some predictions and speculations. ETS brochure The ETS Policy Information Center is pleased to publish the seventh annual William H. Angoff Memorial Lecture, given at ETS on January 11, 2001, by Dr. Susan Embretson of the University of Kansas. open

Paul R. Deslandes (2002). Competitive examinations and the culture of masculinity in Oxbridge undergraduate life, 1850-1920. 10.1111/j.1748-5959.2002.tb00010.x extract

Weijers, O. (2005). Quelques observations sur les divers emplois du terme disputatio. Itinéraires de La Raison, 35–48. doi:10.1484/m.tema-eb.4.00194 scihub

Benjamin A. Elman (2000). A cultural history of civil examinations in late Imperial China. University of California Press 9780520215092 info

"In this multidimensional analysis, Benjamin A. Elman uses over a thousand newly available examination records from the Yuan, Ming, and Ch'ing dynasties, 1315-1904, to explore the social, political, and cultural dimensions of the civil examination system, one of the most important institutions in Chinese history. For over five hundred years, the most important positions within the dynastic government were usually filled through these difficult examinations, and every other year some one to two million people from all levels of society attempted them."

Robert J. Mislevy (2020). Statistical Theoreticians and Educational Assessment: Comments on Shelby Haberman’s NCME Career Contributions Award. DOI: 10.1111/jedm.12280 Journal of Educational Measurement scihub

This is an interesting exposition on assessment and education in the 20th century. What I find encouraging is that the first reference in this article is to Wilbrink (1997). And indeed, my history emphatically stops at 1900, because the 20th century is characterized by the use of statistical and psychological methods in educational assessment. Twitter thread: https://twitter.com/benwilbrink/status/1559139008327913472

Joseph C. M. Wachelder (1992). Universiteit tussen vorming en opleiding. De modernisering van de Nederlandse universiteiten in de negentiende eeuw. Hilversum: Verloren. isbn 9065503528 : open access

I have quickly read the summary. In the 19th century a transition takes place from a university of the estates, in which there was hardly any assessment, to a university oriented towards the transfer of knowledge, in which examinations came to play a role. So really a section that I could have added to my 1997 article? I should go back and reread Wachelder sometime, also because what happened at the universities must have had an impact on secondary education through the teachers at hbs and gymnasium, who after all were university-trained.

Brian E. Clauser, Michael B. Bunch (Eds.) (2021). The History of Educational Measurement. Key Advancements in Theory, Policy, and Practice. Routledge. info (The first chapter is in the preview: the very early history in the US) or read online

Ethan Hutt & Jack Schneider (2018). A History of Achievement Testing in the United States. Or: Explaining the Persistence of Inadequacy. Teachers College Record pdf

April 2025 \ contact ben at at at benwilbrink.nl

http://www.benwilbrink.nl/publicaties/97AssessmentStEE.htm http://goo.gl/PQd1t

Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities html