Abstract: Research on item-level fit evaluation for cognitive diagnosis models (CDMs) has been scarce. According to the parsimony principle, goodness of fit must be balanced against model complexity. General CDMs require a larger sample size to be estimated reliably, and can lead to worse attribute classification accuracy than the appropriate reduced models when the sample size is small and the item quality is poor, which is typically the case in many empirical applications. The main purpose of this study was to systematically examine the statistical properties of four inferential item-fit statistics: S-X², the likelihood ratio (LR) test, the Wald (W) test, and the Lagrange multiplier (LM) test. To evaluate the performance of the statistics, a comprehensive set of factors (sample size, correlational structure, test length, item quality, and generating model) was systematically manipulated using Monte Carlo methods. Results show that the S-X² statistic has unacceptable power. Type I error and power comparisons favor the LR and W tests over the LM test. However, all the statistics are highly affected by item quality. With a few exceptions, their performance is acceptable only when item quality is high. In some cases, this effect can be ameliorated by increasing the sample size and test length. This implies that using the above statistics to assess item fit in practical settings when item quality is low remains a challenge.

Language: English
DOI: 10.1177/0146621617707510
Year: 2017
Published in: APPLIED PSYCHOLOGICAL MEASUREMENT 41, 8 (2017), [18 pp.]
ISSN: 0146-6216
JCR impact factor: 0.923 (2017)
JCR category: SOCIAL SCIENCES, MATHEMATICAL METHODS rank: 34 / 49 = 0.694 (2017) - Q3 - T3
JCR category: PSYCHOLOGY, MATHEMATICAL rank: 11 / 13 = 0.846 (2017) - Q4 - T3
SCImago impact factor: 1.17 - Social Sciences (miscellaneous) (Q1) - Psychology (miscellaneous) (Q1)