Abstract:
Test items of public examinations are expected to be valid and reliable when sound
frameworks are used. In addition, the scoring of the items must be guided by an objective
theoretical framework. Studies have shown that Mathematics Constructed-Response Test
Items (MCRTI) of the West African Examinations Council (WAEC) and National
Examinations Council (NECO) are scored using the Classical Test Theory (CTT) framework,
which has been adjudged subjective. This study was, therefore, designed to score and
compare students’ ability in WAEC and NECO MCRTI under CTT and Item Response
Theory (IRT). The dimensionality, equivalence of scores and differential item functioning
(between males and females) of WAEC and NECO MCRTI under CTT and IRT were also
examined.
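Dimensionality is one of the properties the study examined, and Parallel Analysis is among the CTT-side analyses it names. The abstract does not report the correlation type, number of simulations, or retention criterion used, so the following is only a minimal sketch of Horn's parallel analysis under stated assumptions: item scores are treated as continuous (Pearson correlations; polychoric correlations are a common alternative for ordinal item scores), the criterion is the mean eigenvalue from random normal data, and eigenvalues come from a small pure-Python Jacobi routine to keep the example dependency-free. The helper `simulate_two_factor` and all parameter values are hypothetical demo choices.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (a list of rows)."""
    n_obs, n_var = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_obs for j in range(n_var)]
    centered = [[row[j] - means[j] for j in range(n_var)] for row in data]
    # Root sum of squares per column; the (n - 1) factors cancel in r.
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(n_var)]
    return [[sum(r[i] * r[j] for r in centered) / (norms[i] * norms[j])
             for j in range(n_var)] for i in range(n_var)]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(data, n_sims=50, seed=0):
    """Horn's parallel analysis using the mean random-data eigenvalue as the criterion."""
    rng = random.Random(seed)
    n_obs, n_var = len(data), len(data[0])
    observed = jacobi_eigenvalues(correlation_matrix(data))
    mean_random = [0.0] * n_var
    for _ in range(n_sims):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n_var)] for _ in range(n_obs)]
        for k, ev in enumerate(jacobi_eigenvalues(correlation_matrix(noise))):
            mean_random[k] += ev / n_sims
    # Retain leading factors while the observed eigenvalue beats the random one.
    n_factors = 0
    for obs, rnd in zip(observed, mean_random):
        if obs <= rnd:
            break
        n_factors += 1
    return n_factors, observed, mean_random


def simulate_two_factor(n_obs, seed=0):
    """Hypothetical demo data: items 1-3 load on one factor, items 4-6 on another."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_obs):
        f1, f2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append([0.8 * f1 + 0.6 * rng.gauss(0.0, 1.0) for _ in range(3)]
                    + [0.8 * f2 + 0.6 * rng.gauss(0.0, 1.0) for _ in range(3)])
    return rows
```

Calling `parallel_analysis(simulate_two_factor(300, seed=1), n_sims=20, seed=2)` should retain two factors for this clearly two-dimensional demo data. Using the 95th percentile of the random eigenvalues instead of the mean is a common, more conservative variant.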
The study was anchored on Classical Test Theory and Item Response Theory of measurement,
and a descriptive survey design was adopted. A counterbalancing procedure was employed in
administering the tests. One of the two educational zones in Ibadan was sampled. All five
Local Government Areas (LGAs) in the educational zone were used. Twenty-four co-educational
public Senior Secondary Schools (SSS) were randomly selected from the LGAs
using proportionate-to-size sampling. Two intact classes of Senior Secondary
School 3 (SSS3) in each school were used. In all, 1151 (565 males, 586 females) SSS3
students were sampled. The WAEC and NECO MCRTI (2013-2015) were used for data
collection. The reliability coefficients established were r = 0.72 for the WAEC MCRTI and
r = 0.71 for the NECO MCRTI. Data were analysed using Exploratory Factor Analysis and
Parallel Analysis (CTT models), the IRT Generalized Partial Credit Model (GPCM) and Graded
Response Model (GRM), and the correlated-samples t-test at the 0.05 level of significance.
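The correlated-samples t-test compares each student's score under two scoring methods, so it works on per-student differences. The sketch below is a minimal illustration of that statistic, assuming both score lists are already on a comparable metric (the abstract does not state how IRT ability estimates were transformed before comparison). With 1151 students it yields df = 1150, matching the reported t(1150). The function name `paired_t` and the four-student data are hypothetical; `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Correlated-samples (paired) t statistic and degrees of freedom.

    Position i must hold the same student's score under both scoring methods.
    Raises ZeroDivisionError if every difference is identical (zero variance).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean difference / standard error of the mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1
```

For hypothetical CTT scores `[30, 35, 40, 45]` and IRT-based scores `[32, 38, 41, 49]`, the differences are 2, 3, 1 and 4, so `paired_t` returns t = √15 ≈ 3.873 with df = 3.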
The two MCRTI were multidimensional. Under CTT, the WAEC MCRTI had four dimensions,
while the NECO MCRTI had three. Under IRT, the WAEC and NECO MCRTI had three
dimensions each. The WAEC MCRTI (GPCM = 0.24, GRM = -3.16) were easier
than the NECO MCRTI (GPCM = 2.10, GRM = 4.95). Students' mean score in the WAEC MCRTI
under CTT was lower (x̄ = 35.88, SD = 10.02) than under IRT (GPCM) (x̄ = 41.70, SD = 7.0).
The mean score in the NECO MCRTI under CTT was lower (x̄ = 33.49, SD = 12.39) than under
IRT (GPCM) (x̄ = 41.67, SD = 6.98). The mean differences were significant: t(1150) = 34.83
(WAEC) and t(1150) = 33.32 (NECO). Students' mean score in the WAEC MCRTI under CTT
was also lower (x̄ = 35.88, SD = 10.02) than under IRT (GRM) (x̄ = 41.70, SD = 7.04). The mean
score in the NECO MCRTI under CTT was lower (x̄ = 33.49, SD = 12.39) than under IRT (GRM)
(x̄ = 41.67, SD = 7.01). The mean differences were significant: t(1150) = 34.86 (WAEC) and
t(1150) = 35.04 (NECO). The adjusted scores under the CTT and IRT models were equal. Three
of the 15 WAEC MCRTI items exhibited DIF under CTT, while 14 exhibited DIF under IRT in
favour of males. None of the NECO MCRTI items exhibited DIF under CTT, while nine
exhibited DIF under IRT in favour of males.
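The abstract does not name the DIF procedures used under either framework. To illustrate the matched-comparison idea behind observed-score (CTT-side) DIF detection for polytomous items, here is a minimal sketch of the standardized mean difference (SMD) index: examinees are stratified on a matching score, and focal- and reference-group item means are compared within strata. This is an illustrative technique, not necessarily the study's method. Coding females as the focal group (`"F"`) and males as the reference group (`"M"`) is an assumption, and any flagging threshold would have to be chosen separately.

```python
from collections import defaultdict


def standardized_mean_difference(item_scores, matching_scores, groups,
                                 focal="F", reference="M"):
    """Standardized mean difference (SMD) DIF index for one polytomous item.

    Examinees are stratified on `matching_scores` (e.g. total test score). In each
    stratum the focal-group item mean is compared with the reference-group item mean,
    and the differences are weighted by the focal group's share of examinees.
    Negative values mean the focal group scores lower than matched reference examinees.
    """
    strata = defaultdict(lambda: {focal: [], reference: []})
    for item, match, group in zip(item_scores, matching_scores, groups):
        if group in (focal, reference):
            strata[match][group].append(item)
    # Only strata containing both groups allow a matched comparison.
    usable = [s for s in strata.values() if s[focal] and s[reference]]
    n_focal = sum(len(s[focal]) for s in usable)
    if n_focal == 0:
        raise ValueError("no stratum contains both groups")
    smd = 0.0
    for s in usable:
        f_mean = sum(s[focal]) / len(s[focal])
        r_mean = sum(s[reference]) / len(s[reference])
        smd += len(s[focal]) / n_focal * (f_mean - r_mean)
    return smd
```

If, at every matched total score, focal examinees average one score point less on the item than reference examinees, the SMD is -1.0; if matched groups perform identically, it is 0.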
Item Response Theory models were more effective than Classical Test Theory in scoring
constructed-response tests, equating scores and detecting differential item functioning. Public
examining bodies should score constructed-response test items using Item Response Theory
models.
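The recommendation rests on the two polytomous IRT models the study used, GPCM and GRM. As a minimal sketch of how such models relate a student's ability θ to item scores, the functions below compute category probabilities and the model-implied expected item score. They assume the common parameterisations without the 1.7 scaling constant. The item parameters shown are hypothetical, and the ability-estimation step (e.g. maximum likelihood or EAP) that operational IRT scoring also requires is not shown.

```python
import math


def grm_probs(theta, a, b):
    """Category probabilities 0..m under Samejima's Graded Response Model.

    `b` holds the m boundary locations, which must be strictly increasing.
    """
    # Cumulative probabilities P(X >= k | theta); P(X >= 0) = 1, P(X >= m + 1) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def gpcm_probs(theta, a, b):
    """Category probabilities 0..m under Muraki's Generalized Partial Credit Model.

    `b` holds the m step parameters, which need not be ordered.
    """
    # Exponent for category k is the running sum of a * (theta - b_v) for v <= k.
    exponents = [0.0]
    for bv in b:
        exponents.append(exponents[-1] + a * (theta - bv))
    top = max(exponents)  # shift before exp() to avoid overflow
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Model-implied expected item score: sum of k * P(X = k)."""
    return sum(k * p for k, p in enumerate(probs))
```

For a hypothetical item with a = 1.0 and b = [-1.0, 0.0, 1.0], both models give an expected score of 1.5 at θ = 0, because the thresholds are symmetric about zero. The models differ in construction: GRM takes differences of cumulative boundary curves, whereas GPCM models adjacent-category steps directly, which is why GPCM step parameters need not be ordered.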