
AN ANALYSIS OF ENGLISH GRAMMAR II TEST


I.  INTRODUCTION
1.1.   The Importance of Test Analysis
Item analysis demands a clear understanding, on the part of the course planner, of the information that can be derived from a test once it has been administered. Diagnostic information such as the difficulty of the test for the test-takers, the discriminating power of its items, and the effectiveness of the distractors (in the case of objective tests) are examples of such information. It is very important to analyze a test, since the analysis provides several advantages:
·    To provide quantitative evidence to reveal and/or support the difficulty and discrimination indices of the test items.
·    To judge the worth or quality of a test.
·    To show the test constructor how his/her test behaves, so as to build a test file that is constantly being improved upon.
·    To determine what to do when making subsequent revisions of the test.
·    To provide interesting and useful information on the achievement of individual test-takers, which can serve as valuable data for diagnosing individual difficulties and for prescribing remedial measures or planning future learning activities.
·    To impress on the teacher the need for improvement based on the resulting data (improvements needed in teaching and teaching resources will often be made obvious by the analysis).
·    To provide a basis for discussing test results.
·    To provide a learning experience for students, if students assist in the item analysis or are told its results.

1.2.   What Is to Be Analyzed? (General)
The analysis of a test typically focuses on information such as its validity, reliability, difficulty level, discriminating power, and the effectiveness of its distractors.


1.3.   What Test, and What Is to Be Analyzed? (Specific)
In this paper, I will analyze the Grammar II test. The purpose of this analysis is to examine the content validity of the test, the difficulty level of its items, their discriminating power, and the effectiveness of their distractors.

II.  CONTENT VALIDITY
2.1.   Definition
Validity is concerned with the extent to which test results serve their intended use. Four points should be kept in mind when determining validity:
1.      Validity refers to the interpretation of test results (not to the test itself).
2.      Validity is inferred from available evidence (not measured directly).
3.      Validity is specific to a particular use (selection, placement, evaluation of learning, and so forth).
4.      Validity is expressed by degree (for example, high, moderate, or low).
Content validity is especially important in achievement testing. We can build a test that has high content validity by (1) identifying the subject-matter topics and learning outcomes to be measured, (2) preparing a set of specifications that defines the sample of items to be used, and (3) constructing a test that closely fits the set of specifications.

2.2.   Table of Test Specification
| No | Topic | Sub Topic | Test Type | Item Numbers | Number of Items | Percentage |
|----|-------|-----------|-----------|--------------|-----------------|------------|
| 1 | Tenses | Past continuous | Multiple Choice | 15 | 1 | 22.5% |
|   |        | Simple present  |                 | 10 | 1 |       |
|   |        | Simple past     |                 | 24 | 1 |       |
|   |        | Present perfect |                 | 32, 36, 38 | 3 | |
|   |        | Future perfect  |                 | 34, 35 | 2 | |
|   |        | Past perfect    |                 | 37 | 1 | |
| 2 | Modal Aux. and Similar Expressions | Would/should | Multiple Choice | 1, 2, 6, 8, 12, 23, 27, 29 | 8 | 67.5% |
|   |        | Could           |                 | 11, 13, 19, 21, 28 | 5 | |
|   |        | Have/has/had    |                 | 3, 4, 25 | 3 | |
|   |        | Must            |                 | 5, 7, 14, 16, 17, 18, 20, 22, 26 | 9 | |
|   |        | Supposed to     |                 | 9 | 1 | |
|   |        | Used to         |                 | 30 | 1 | |
| 3 | Passive Voice | Modal passive   | Multiple Choice | 22 | 1 | 10% |
|   |        | Present perfect |                 | 31, 33, 39 | 3 | |
| Total: 3 topics | | 14 sub topics | | | 40 | 100% |
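The arithmetic of the table can be checked quickly. The sketch below is a minimal illustration; the `spec` dictionary and its layout are my own construction, not part of the original test file. It recomputes the item totals and the percentage column from the sub-topic counts:

```python
# Check the specification table's arithmetic: item counts per topic and the
# resulting percentages of the 40-item test.

spec = {
    "Tenses": [1, 1, 1, 3, 2, 1],
    "Modal Aux and Similar Expressions": [8, 5, 3, 9, 1, 1],
    "Passive Voice": [1, 3],
}
total = sum(sum(counts) for counts in spec.values())
for topic, counts in spec.items():
    n = sum(counts)
    print(f"{topic}: {n} items, {100 * n / total}%")
print(f"total: {total} items")
# Tenses: 9 items, 22.5%
# Modal Aux and Similar Expressions: 27 items, 67.5%
# Passive Voice: 4 items, 10.0%
# total: 40 items
```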

III.  DIFFICULTY LEVEL
3.1.   Definition
The difficulty of an item is understood as the proportion of the persons who answer the test item correctly. The higher this proportion, the lower the difficulty of the item; in other words, the more difficult an item is, the lower its index. The formula for item difficulty (p) is shown below:

p = A / N

where:
p = difficulty index of the item
A = number of correct answers to the item
N = number of correct answers plus number of incorrect answers to the item
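As a minimal sketch of this computation (the function name and the 0/1 list format are assumptions made for illustration):

```python
# Difficulty index p = A / N for a single item.
# `responses` is assumed to be a list of 0/1 flags, 1 = correct answer.

def item_difficulty(responses):
    """Return p, the proportion of examinees who answered the item correctly."""
    if not responses:
        raise ValueError("no responses to analyze")
    return sum(responses) / len(responses)

# Example: 18 of 31 testees answer correctly -> p = 0.58 (a moderate item).
print(round(item_difficulty([1] * 18 + [0] * 13), 2))  # 0.58
```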


3.2.   Classification of Difficulty Level
| Difficulty Index (percentage correct) | Item Meaning |
|---|---|
| 100% | Too easy |
| 80%–99% | Very easy |
| 71%–79% | Easy |
| 40%–70% | Moderate |
| 20%–39% | Difficult |
| 1%–19% | Very difficult |
| 0% | Too difficult |
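A hypothetical helper that maps a difficulty index onto these labels might look as follows; the treatment of values falling between bands (e.g., 79.5%) is my assumption, since the table leaves such boundaries undefined:

```python
# Map a difficulty index p (0.0-1.0) onto the labels of the table above.
def difficulty_label(p):
    percent = p * 100
    if percent >= 100: return "Too easy"
    if percent >= 80:  return "Very easy"
    if percent >= 71:  return "Easy"
    if percent >= 40:  return "Moderate"
    if percent >= 20:  return "Difficult"
    if percent >= 1:   return "Very difficult"
    return "Too difficult"

print(difficulty_label(0.58))  # Moderate
```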

IV.  DISCRIMINATING POWER
4.1.   Definition
One of the many purposes of testing is to distinguish knowledgeable examinees from less knowledgeable ones. Each item of the test, therefore, should contribute to accomplishing this aim. That is, each item in the test should have a certain degree of power to discriminate examinees on the basis of their knowledge. Item discrimination refers to this power in each item.
The higher the discrimination index, the better the item, because a high value indicates that the item discriminates in favor of the upper group, which should get more items correct; such an item has a positive ID. When more students in the lower group than in the upper group select the right answer to an item, the item actually has negative validity; such an item has a negative ID.
4.2.   Formula
The discrimination index (D) is computed as:

D = (GA correct answers - GB correct answers) / (½ N)

where:
D = discrimination index of the item
GA correct answers = number of correct answers to the item in the upper group
GB correct answers = number of correct answers to the item in the lower group
½ N = half of the total number of responses
            In computing the discrimination index, D, first score each student's test and rank-order the test scores. Then separate the 27% of students at the top and the 27% at the bottom for the analysis.
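A minimal sketch of this procedure is given below; the function name, the data layout (parallel lists of total scores and 0/1 item results), and the rounding of the 27% group size are assumptions made for illustration:

```python
# Discrimination index D = (GA - GB) / (1/2 N), using upper and lower 27% groups.

def discrimination_index(total_scores, item_correct):
    """total_scores[i] is student i's overall test score; item_correct[i] is
    1 if student i answered the analyzed item correctly, else 0."""
    n = len(total_scores)
    k = max(1, round(0.27 * n))  # size of each extreme group (27% of N)
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    ga = sum(item_correct[i] for i in upper)  # correct answers in upper group
    gb = sum(item_correct[i] for i in lower)  # correct answers in lower group
    return (ga - gb) / k  # k is half the total responses in the two groups

# Example: the top scorers get the item right, the bottom scorers miss it.
scores = [35, 33, 30, 28, 25, 22, 20, 15, 12, 8]
item   = [ 1,  1,  1,  1,  0,  1,  0,  0,  0, 0]
print(discrimination_index(scores, item))  # 1.0 (maximum positive ID)
```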

4.3.   Criteria of Discriminating Power
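One widely cited rule of thumb for interpreting D is Ebel's scale, which treats values of 0.40 and above as excellent, 0.30–0.39 as good, 0.20–0.29 as marginal, lower positive values as poor, and negative values as the worst items, to be discarded. The small sketch below applies that scale; note that these bands are an assumption taken from the general literature, not criteria stated for this particular test.

```python
# Ebel's commonly cited guideline for D (assumed bands, see the note above).
def discrimination_quality(d):
    if d >= 0.40: return "Excellent"
    if d >= 0.30: return "Good"
    if d >= 0.20: return "Marginal (revise)"
    if d >= 0.00: return "Poor (revise or discard)"
    return "Worst (negative ID: discard)"

print(discrimination_quality(1.0))    # Excellent
print(discrimination_quality(-0.25))  # Worst (negative ID: discard)
```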

V.  EFFECTIVENESS OF DISTRACTORS
5.1.   Definition
Acceptable p and D values are two important requirements for a single item. However, these values are based only on the numbers of correct and wrong responses given to an item; they say nothing about the way the distractors have operated. There are cases in which an item shows acceptable p and D values but does not have challenging distractors. Therefore, one function of test analysis is to examine the quality of the distractors.
A discrimination index or discrimination coefficient should be obtained for each option in order to determine each distractor's usefulness (Millman & Greene, 1993). Whereas the discrimination value of the correct answer should be positive, the discrimination values for the distractors should be lower and, preferably, negative. Distractors should be carefully examined when items show large positive D values. When one or more of the distractors looks extremely plausible to the informed reader and when recognition of the correct response depends on some extremely subtle point, it is possible that examinees will be penalized for partial knowledge.
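The sketch below illustrates this kind of option-level analysis in a simplified form, comparing option counts in the upper and lower 27% groups rather than computing a full discrimination coefficient per option; the function names and data layout are assumptions:

```python
# Simplified distractor analysis: tally each option's frequency in the upper
# and lower 27% groups. An effective distractor attracts more lower-group
# examinees; the keyed answer should show the opposite pattern.

def option_counts(choices, group):
    counts = {}
    for i in group:
        counts[choices[i]] = counts.get(choices[i], 0) + 1
    return counts

def distractor_report(total_scores, choices, key):
    n = len(total_scores)
    k = max(1, round(0.27 * n))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = option_counts(choices, ranked[:k])
    lower = option_counts(choices, ranked[-k:])
    for opt in sorted(set(choices)):
        u, l = upper.get(opt, 0), lower.get(opt, 0)
        role = "key" if opt == key else ("effective" if l > u else "weak")
        print(f"option {opt}: upper={u}, lower={l} ({role})")

scores  = [35, 33, 30, 12, 10, 8]
choices = ["B", "B", "B", "A", "C", "D"]
distractor_report(scores, choices, key="B")
# option A: upper=0, lower=0 (weak)
# option B: upper=2, lower=0 (key)
# option C: upper=0, lower=1 (effective)
# option D: upper=0, lower=1 (effective)
```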


VI.  CONCLUSION
The analysis of the Grammar II test presented above yields several results, which I can summarize as follows:
·    The topics of this grammar test consist of tenses (22.5%), modal auxiliaries and similar expressions (67.5%), and passive voice (10%). These topics are covered by 40 items, and the test was administered to 31 testees.
·    From the ranking of the students' scores, the highest score is 35 and the lowest score is 8.
·    After analyzing the sample of 20 students, it was found that this grammar test is at a moderate level overall, since 57.5% of the 40 items show a moderate difficulty level. Items at the very difficult level make up only 5% of the 40 items, represented by items 21 and 31.
·    As for discriminating power, this grammar test generally showed positive discrimination indices. Items at the excellent level account for 67.5% of the 40 items. On the other hand, 5% of the 40 items, namely items 10 and 13, are at the worst level, showing negative discrimination indices.
·    Finally, the distractor analysis showed that 66% of the 120 distractors are effective, in that they successfully distracted more students in the lower group.





REFERENCES

Alderson, J. C., Clapham, C. M., & Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.

Backhoff, E., Larrazolo, N., & Rosas, M. (2000). The level of difficulty and discrimination power of the Basic Knowledge and Skills Examination (EXHCOBA). Revista Electrónica de Investigación Educativa, 2(1). Retrieved March 8, 2012, from http://redie.uabc.mx/vol2no1/contents-backhoff.html

Farhady, H. (1986). Fundamental Concepts in Language Testing (3) Characteristics of Language Tests: Item Characteristics. Roshd Foreign Language Teaching Journal, 2(2 & 3), 15-17.

Matlock-Hetzel, S. (1997). Basic Concepts in Item and Test Analysis. Texas A&M University.

Interpreting Test Results


I.            INTRODUCTION
The discussion in this chapter concerns the elementary methods of interpreting test results that are commonly used with informal achievement tests; it assumes that the test was specifically constructed to maximize the method of interpretation being used.

II.            SUMMARY OF CONTENT
Test results can be interpreted in two basic ways: criterion-referenced interpretation describes the types of performance a student can demonstrate, while norm-referenced interpretation describes how a student's performance compares to that of others. Both types of interpretation are sensible, and each provides unique information about students' achievement.

How the results are organized and presented depends to a large extent on the type of interpretation to be made. Combining the two types of interpretation is most likely to be effective where norm-referenced interpretation is added to the performance description of a criterion-referenced test. Since norm-referenced tests typically cover a broad range of learning outcomes with few items per outcome, their performance descriptions tend to be sketchy and unreliable.

A criterion-referenced interpretation can be limited to a simple description of the tasks that a student can perform, or it can involve a comparison of the student's performance to some performance standard. In either case, it does not require comparing the student's performance to the performance of others.
·    Performance description: it is assumed that a standard or cut-off score is required for interpretation, because criterion-referenced testing is commonly used to measure mastery or minimum competency. In setting a performance standard, a relatively simple and practical procedure is to set a standard arbitrarily and then adjust it up or down as various conditions and experiences are considered.
·    Use of a performance standard: the percentage-correct score is widely used in judging whether objectives have been mastered, and thus in reporting the results of criterion-referenced mastery tests (see the sketch below).
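As a minimal illustration of such a mastery judgment (the 80% cut-off and the function name are assumptions chosen for the example, to be adjusted up or down as described above):

```python
# Criterion-referenced mastery judgment from a percentage-correct score.
def mastery(correct, total, cutoff=80):
    percent = 100 * correct / total    # percentage-correct score
    return percent, percent >= cutoff  # (score, mastered?)

print(mastery(34, 40))  # (85.0, True): the objective is judged mastered
```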

Norm-referenced interpretation involves some means of showing how an individual's test score compares to the scores of others in some known group.
·      Simple ranking of raw scores: a common method of presenting the scores on a norm-referenced test to a classroom group is simply to list the scores on the board. This is done by arranging the scores in rank order from high to low and making a frequency count to show the number (N) of students earning each score.
·      Percentile ranks: these put raw scores on a scale that has the same meaning across different-sized groups and that is readily understood by test users:
PR = ((number of students below score + ½ × number of students at score) / number in group (N)) × 100
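A short sketch of this formula (the function name and list-based input are assumptions):

```python
# Percentile rank: students below the score, plus half of those at the score,
# expressed as a percentage of the whole group.
def percentile_rank(scores, score):
    below = sum(1 for s in scores if s < score)
    at = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * at) / len(scores)

scores = [8, 12, 15, 20, 20, 22, 25, 28, 30, 35]
print(percentile_rank(scores, 20))  # 40.0 (3 below + half of the 2 at 20)
```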

For some purposes it is necessary to describe a set of scores in brief form:
1.      The average score (central tendency)
·    The median: arrange the scores in order of size and count up to the midpoint.
·    The mean: add all the scores and divide by the total number of scores.
·    The mode: inspect the frequency of each score and take the most frequent one.
2.      The spread of scores (variability)
·    The range: simply the interval between the highest and lowest scores.
·    The standard deviation, estimated by the shortcut formula:
SD = (sum of high sixth - sum of low sixth) / (half the number of students)
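The sketch below computes these summary statistics, including the shortcut estimate of the standard deviation; the handling of groups smaller than six students (using at least one score per "sixth") is my assumption:

```python
# Central tendency and variability for a set of test scores, with the
# standard deviation estimated from the high and low sixths of the group.
def summarize(scores):
    s = sorted(scores)
    n = len(s)
    mean = sum(s) / n
    median = (s[(n - 1) // 2] + s[n // 2]) / 2  # midpoint of ordered scores
    mode = max(set(s), key=s.count)             # most frequent score
    score_range = s[-1] - s[0]                  # highest minus lowest
    sixth = max(1, n // 6)                      # size of each extreme sixth
    sd = (sum(s[-sixth:]) - sum(s[:sixth])) / (n / 2)
    return mean, median, mode, score_range, sd

scores = [8, 12, 15, 20, 20, 22, 25, 28, 30, 35]
print(summarize(scores))  # (21.5, 21.0, 20, 27, 5.4)
```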

III.            CONCLUSION
The elementary methods of interpretation described above are norm-referenced interpretation and criterion-referenced interpretation.

REFERENCES
Gronlund, N. E. (1982). Constructing Achievement Tests. Englewood Cliffs, NJ: Prentice-Hall.