About

Welcome to the online Bank of Student Errors (BoSE). The website provides free access to a database of observed student errors originally compiled by the participants of the Cross-Institutional Testing Project in 1998. The data were collected by about 30 teacher colleagues across Hungary, both at higher education institutions (universities and colleges) and at secondary schools.

Purpose
This bank of student (learner) errors has been compiled for the sole purpose of aiding the design of use-of-English types of test. The bank will be most useful in the construction of proficiency types of test, such as in-house filter tests and screen tests. As will be seen below, many of our decisions were motivated by the wish to make the bank as widely accessible as possible.

Who is the bank for?
The bank is for use by colleagues whose responsibilities include the design of use-of-English types of test. The bank is a web-based database, which any visitor can search with the help of our search page. The bank is open to further submissions. To submit a new error sample, please navigate to the submit page. Note, however, that submitted samples will be accepted into the bank only after a group of moderators has approved them. If you are interested in becoming a moderator, please contact the BoSE development team.

Why the bank is useful: its rationale
Essentially, the bank is designed to increase the validity of the items that the test constructor designs. Quite often, test constructors build items from the insights, recollections and experience they have accumulated over the years, all of which are also filtered through their training. Of course, there is nothing wrong with relying on one's own insights, recollections and experience in the construction of use-of-English items, but whether those kinds of knowledge are enough for the construction of valid multiple-choice and other item types is questionable. The limitation of that way of constructing items is that it sustains only an indirect connection between what students really do and what items the test constructor writes. Instead, the bank follows a data-driven approach, in which there is a more direct link between items and observed student language. Hence the possibility of constructing a more valid test.

As we know, in multiple-choice questions special attention must be paid to the quality of the distractors. The distractors must do their job, i.e. they must distract. Similarly, the test constructor must be sure that any other restricted-response item type (true/false, spot the error, etc.) will work as intended. The rationale of this bank is that if the test constructor works from real observed error data, the chances of producing valid items will be higher. The experience of the editors strongly suggests that, even with a lot of experience, it is difficult to anticipate correctly how a particular item will work in practice, or whether a particular distractor sounds at all realistic to the test taker. Indeed, an inspection of the present database might reveal student errors that the user would never have dreamt up while constructing a test.

Why raw material and not ready-made items?
The database includes student errors as they were observed. The items in the bank have not been changed in any way. The rationale for the bank called for a bank of testing 'raw material', not ready-made items. The reasons for this were both professional and financial. The feasibility of ready-made item banks is questionable in a higher education context characterised by a large variety of item types, and the construction of ready-made item banks is extremely expensive. Thus, a point had to be found in the test construction process at which the variety of item types used by different institutions is not yet relevant and the exorbitant costs of developing item banks are not yet incurred. This point was found to be the stage of raw material for testing. Users will thus search the bank and, when they have found what they want, fashion it into the type of item they need. The database therefore answers two needs: the need for testing raw material that item writers can readily use, and the need for material that reflects real errors.

The classification system
The chief feature of the classification system is its eclecticism. It contains structuralist, notional and some functional categories. The user will also discover that, partly due to this eclecticism, the categories in the classification are not discrete; they overlap. It did not seem possible to delineate the categories, nor did it seem desirable to do so. Some linguists and purists may frown on such an approach, but once again, our main goal was to make the bank as widely accessible as possible. We wanted to make sure that the bank is usable by test constructors whatever their training in linguistics might have been, whatever beliefs and personal professional constructs they might have and whatever test construction matrix they may use. Thus, the classification of a given sample should not be taken as 'the right' classification, and for the very same reason there is nearly always more than one category label attached to any one sample. We also think that underlying professional issues, as well as their subsystems, may be better followed in this way. A sample categorised as VAL, for example, would also likely be labelled as PP, GER or INF, and anything coded as ART and/or PRO must also be categorised as REF. Once again, it should be remembered that the editors' aim was to make every sample 'findable', one way or another.

The categorisations represent the judgement of the editors as a collective body. By checking each other's classifications, we tried to ensure that samples of the same type are categorised in the same way. In this way, we have certainly managed to bring our thinking closer together and to build an editors' team, but inconsistencies may still remain in the bank.
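
The overlapping, multi-label scheme described above can be illustrated with a small sketch. This is not the bank's actual implementation: the sample IDs and the helper names `build_index` and `missing_ref` are hypothetical, and only the category codes (VAL, GER, ART, PRO, REF) and the rule that ART or PRO implies REF come from the description above.

```python
from collections import defaultdict

# Hypothetical samples: each sample ID maps to its set of category labels.
samples = {
    1: {"VAL", "GER"},
    2: {"ART", "REF"},
    3: {"PRO", "REF"},
    4: {"ART"},  # deliberately inconsistent: ART without REF
}


def build_index(samples):
    """Map every category label to the set of sample IDs that carry it."""
    index = defaultdict(set)
    for sample_id, labels in samples.items():
        for label in labels:
            index[label].add(sample_id)
    return index


def missing_ref(samples):
    """Return IDs of samples coded ART and/or PRO that lack the REF label."""
    return sorted(
        sample_id
        for sample_id, labels in samples.items()
        if labels & {"ART", "PRO"} and "REF" not in labels
    )


index = build_index(samples)
inconsistent = missing_ref(samples)
```

Because a sample is indexed under each of its labels, it can be found from any of them, which is the sense in which every sample is meant to be 'findable'. A check such as `missing_ref` is one way the kind of inconsistency mentioned above might be detected.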

What information does the bank contain?
There are 3044 recorded student language samples in the bank. Apart from the sample itself, each row (data record) includes information on:
:: whether the sample recorded was oral or written,
:: what geographical location it was observed in,
:: whether the sample was observed in a secondary or higher education institution,
:: which error categories the editors assigned to it.
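
For readers who think in terms of data structures, the record fields listed above could be modelled roughly as follows. This is a minimal sketch under assumptions: the class name `ErrorSample`, the field names, the `search` helper and the two example records (including their text and location) are invented for illustration and do not describe the bank's real schema or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSample:
    text: str              # the sample as observed, unedited
    mode: str              # "oral" or "written"
    location: str          # where the sample was observed
    institution: str       # "secondary" or "higher"
    categories: frozenset  # one or more category labels, e.g. {"ART", "REF"}


def search(records, *, mode=None, institution=None, category=None):
    """Return the records matching every criterion that is not None."""
    results = []
    for record in records:
        if mode is not None and record.mode != mode:
            continue
        if institution is not None and record.institution != institution:
            continue
        if category is not None and category not in record.categories:
            continue
        results.append(record)
    return results


# Invented placeholder records, not taken from the bank.
records = [
    ErrorSample("<sample A>", "written", "Budapest", "higher",
                frozenset({"ART", "REF"})),
    ErrorSample("<sample B>", "oral", "Budapest", "secondary",
                frozenset({"VAL", "INF"})),
]

written_ref = search(records, mode="written", category="REF")
```

Storing the categories as a set reflects the fact that a sample usually carries more than one label, so a search on any one of them will find it.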


Thank you for using our service.