The purposes of this research were to develop essay-scoring rubrics based on three scoring methods and to examine, through generalizability analysis, the severity/leniency characteristics of raters with different levels of experience using the different rubrics. The research instrument was the set of scoring criteria. The sample consisted of English teachers with less than 5 years and more than 10 years of teaching experience, and 120 high school students. Three types of scoring methods were used. The reliability coefficients ranged from .825 to .833, and the rater characteristics were as follows: 1) with the analytic rubric, 2 raters were severe, 1 was lenient, and 5 were neutral; 2) with the holistic rubric, 2 were severe, 3 were lenient, and 3 were neutral; 3) with the checklist, 2 were severe, 4 were lenient, and 2 were neutral. When the experience groups were compared, raters with different levels of experience did not differ in their severity/leniency characteristics. Three raters (37.5%) showed consistent characteristics regardless of which rubric they used, and no rater shifted between extremes, that is, from severe to lenient or from lenient to severe.
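The abstract reports reliability (generalizability) coefficients between .825 and .833 for the three scoring methods. As an illustration only, since the paper's exact design and software are not stated here, a relative generalizability coefficient for a fully crossed persons x raters design can be estimated from ANOVA variance components. The function below is a minimal sketch under that assumption, expecting a complete score matrix with no missing data; it is not the authors' actual analysis.

```python
import numpy as np

def g_coefficient(scores):
    """Relative generalizability coefficient E(rho^2) for a fully
    crossed persons x raters (p x r) design, from ANOVA mean squares."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Sums of squares for the two main effects and the residual.
    ss_p = n_r * ((person_means - grand) ** 2).sum()
    ss_r = n_p * ((rater_means - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_res = ss_total - ss_p - ss_r

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Variance components (negative estimates truncated to zero).
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)

    # Person variance over person variance plus relative error.
    return var_p / (var_p + var_res / n_r)

# Hypothetical usage: 120 students rated by 8 raters on a 0-10 scale.
rng = np.random.default_rng(42)
true_ability = rng.normal(5.0, 1.5, size=(120, 1))
ratings = np.clip(true_ability + rng.normal(0, 0.8, size=(120, 8)), 0, 10)
print(round(g_coefficient(ratings), 3))
```

When every rater agrees perfectly (zero residual variance), the coefficient is 1.0; it shrinks toward 0 as rater disagreement grows relative to true differences among students.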
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The author of the article affirms that it does not copy from or infringe any other work's copyright. Should any copyright infringement or prosecution arise, responsibility rests solely with the author of the article; the Editorial Board bears no liability in any such case.