More on the Validity and Reliability of C-test Scores: A Meta-Analysis of C-test Studies
Hundreds of C-test studies have been published since Klein-Braley’s (1981) dissertation work in Duisburg, Germany (Grotjahn, 2016). C-tests are popular because many claim they are easy to develop, administer, and score. Although C-tests are widely used, C-test studies vary in crucial ways: C-test scores are used to support different decisions, C-test users interpret scores as measuring different language constructs, and researchers construct and develop C-tests in many ways. Variations across C-test studies can pose unique challenges for C-test use.
I report the results of a random-effects meta-analysis of C-test study information. I collected information about the types of decisions C-test scores are used to support; correlation coefficients and score reliabilities, to shed light on what C-test scores measure; and the steps C-test users took to construct and develop their C-tests. Studies were retrieved from five major search channels. Inclusion–exclusion decisions were made during eligibility and prescreening stages. The main study-coding phase involved five qualified coders, rigorous coder-training procedures (see Stock, 1994), the double coding of all studies, use of a FileMaker Pro 17 coding form, and an assessment of coder reliability. In addition to information about descriptive statistics, correlational analyses, and score-reliability estimates, the coding team gathered information about study features, language setting, participants, how C-test scores were used, and how C-tests were constructed and developed; variables in these categories were the basis for subgroup analyses. Score reliabilities were corrected for measurement artifacts, and correlation coefficients were corrected for attenuation before analysis.
Following study screening, 239 studies were included in the dataset. Results show that, when study effects are grouped by criterion construct, C-test scores correlate most strongly with scores on other general language proficiency tests (r = .94 [.87; .97]). Too few study effects were available to examine the relationship between decisions made on the basis of C-test scores and the magnitude of correlation coefficients. Key findings regarding C-test construction and development steps are that reliabilities are higher when users explore alternate deletion rules, use a series of dashes to indicate deleted letters or syllables, use alternate-answer scoring schemes, analyze scores with factor models, and order C-test texts randomly. There was little evidence of bias in study results.
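The two analytic steps named in the abstract, correcting correlations for attenuation and pooling them under a random-effects model, can be illustrated with a minimal Python sketch. The abstract does not state which estimator or transformation was used, so the choices here (Spearman's classical correction, Fisher's z transformation, and the DerSimonian-Laird between-study variance estimator) are assumptions made for illustration only, as are the function names. Any input values are hypothetical and are not taken from the study.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y).

    rel_x and rel_y are the score reliabilities of the two measures.
    Corrected values can exceed 1 when reliabilities are low.
    """
    return r_xy / math.sqrt(rel_x * rel_y)


def random_effects_pool(rs, ns):
    """Pool correlations (assumed estimator: DerSimonian-Laird on Fisher's z).

    rs: study correlations, each strictly between -1 and 1
    ns: study sample sizes, each greater than 3
    Returns (pooled r, lower 95% bound, upper 95% bound).
    """
    zs = [math.atanh(r) for r in rs]      # Fisher's z transformation
    vs = [1.0 / (n - 3) for n in ns]      # sampling variance of each z
    ws = [1.0 / v for v in vs]            # fixed-effect weights

    z_fixed = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_fixed) ** 2 for w, z in zip(ws, zs))
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)

    # Between-study variance, truncated at zero; zero when only one study.
    tau2 = max(0.0, (q - (len(rs) - 1)) / c) if c > 0 else 0.0

    ws_re = [1.0 / (v + tau2) for v in vs]  # random-effects weights
    z_re = sum(w * z for w, z in zip(ws_re, zs)) / sum(ws_re)
    se = math.sqrt(1.0 / sum(ws_re))

    # Back-transform the pooled z and its interval to the r metric.
    return (
        math.tanh(z_re),
        math.tanh(z_re - 1.96 * se),
        math.tanh(z_re + 1.96 * se),
    )
```

For instance, `disattenuate(0.72, 0.90, 0.80)` divides an observed correlation of .72 by the square root of the product of the two reliabilities, and `random_effects_pool` accepts lists of correlations and sample sizes of equal length.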