What type of reliability is obtained when the same test is given twice to the same group of examinees?

Nature of Reliability

1.     Reliability refers to the results obtained with an evaluation instrument and not to the instrument itself. 

2.     Reliability refers to a type of consistency.

3.     Reliability is a necessary but not a sufficient condition for validity. 

4.     Reliability merely provides the consistency that makes validity possible.

Interpreting Reliability

1.          Group variability affects the size of the reliability coefficient.  Heterogeneous groups yield higher coefficients than homogeneous groups.

2.          Scoring reliability limits test reliability.

3.          All other factors being equal, the more items included in a test, the higher the test's reliability (see the Spearman-Brown sketch after this list).

4.          Reliability tends to decrease as tests become too easy or too difficult.
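
As a brief sketch of why test length matters (a standard result, not reproduced in the original handout): the Spearman-Brown prophecy formula estimates the reliability of a test lengthened by a factor of $n$ from the reliability $r$ of the original test:

$$r_{\text{new}} = \frac{n\,r}{1 + (n - 1)\,r}$$

For example, doubling ($n = 2$) a test with reliability .60 gives an estimated reliability of $\frac{2(.60)}{1 + .60} = .75$, assuming the added items are of the same quality as the originals.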

Factors Influencing Reliability

1.             Length of Test

2.             Spread of Scores

3.             Difficulty of Test

4.             Objectivity


 

Methods of Estimating Reliability

Test-retest Method (Measure of Stability):  Give the same test twice to the same group, with any time interval between tests, from several minutes to several years.

Equivalent-forms Method (Measure of Equivalence):  Give two forms of the test to the same group in close succession.

Split-half Method (Measure of Internal Consistency):  Give the test once.  Score two equivalent halves of the test.  Use the Spearman-Brown formula (sketched above) to step the half-test correlation up to full-test length.

Kuder-Richardson Method (Measure of Internal Consistency):  Give the test once.  Score the test and apply a Kuder-Richardson formula (sketched below).
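
For reference, a standard form of the Kuder-Richardson formula (KR-20), not reproduced in the original handout, is

$$\mathrm{KR\text{-}20} = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma^2}\right)$$

where $k$ is the number of items, $p_i$ is the proportion of examinees answering item $i$ correctly, $q_i = 1 - p_i$, and $\sigma^2$ is the variance of the total test scores.  Like the split-half estimate, it requires only a single administration of the test.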


 

Nature of Validity

1.    Validity refers to the appropriateness of the interpretation of the results of a test or evaluation instrument for a given group of individuals, and not to the instrument itself.

2.    Validity is a matter of degree; it does not exist on an all-or-none basis.

3.    Validity is always specific to some particular use or interpretation.

4.    Validity is a unitary concept based on various kinds of evidence.

Checking For Validity

Content Validity:  Examine the test to see whether the questions correspond to what the user intended to test.

Criterion-Related Validity:  Scores from the test are correlated with an external criterion.

    Concurrent Criterion-Related Validity:  Two tests are given to the same group of examinees, one being the established test (the criterion) and the other the new test.

    Predictive Validity:  The test predicts some future behavior of an examinee.

Content-Related Validity

Classroom Instruction

Determines which intended learning outcomes are to be achieved by pupils.

Achievement Domain

Specifies and delimits a set of instructionally relevant learning tasks to be measured by a test.

Achievement Test

Provides a set of relevant test items designed to measure a representative sample of the tasks in the achievement domain.


Content Validity Example

Instructional Objective:

        The student will write the capitals of all fifty states in the United States of America.

Instructional Activities:

1.        The student makes "flash cards" with the state's name on one side and the state's capital on the other side.  The student will study these at home.

2.        Given a map of the United States, the student will write the name of the capital in the appropriate state.

Test:

        The student will write the state capital's name on a sheet of paper when the teacher reads the state's name.

Concurrent Validity Example

Reading Achievement

        A multiple-choice test on reading achievement (which is designed to measure achievement at the time of testing and not designed to predict any future behavior) might be validated by comparing the scores on the test with teacher ratings of students' reading abilities.  The teachers' ratings are the criterion.
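
A minimal sketch, in Python, of how such a concurrent validity coefficient could be computed, assuming it is simply the Pearson correlation between the test scores and the criterion; the scores and ratings below are hypothetical, not taken from the handout:

import math

def pearson_r(x, y):
    # Pearson product-moment correlation between two equal-length score lists.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical reading-test scores and teacher ratings (the criterion).
test_scores = [34, 28, 41, 25, 38, 30, 45, 22]
teacher_ratings = [4, 3, 5, 2, 4, 3, 5, 2]

print(round(pearson_r(test_scores, teacher_ratings), 2))

Because the made-up scores and ratings are constructed to track each other closely, the coefficient printed here is high (about .97); real concurrent validity coefficients are typically much lower.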

Factors Influencing Validity

1.     The Test Itself.

a.    Unclear directions. 

b.    Reading vocabulary and sentence structure too difficult.

c.    Inappropriate level of difficulty of the test items.

d.    Poorly constructed test items.

e.    Ambiguity.

f.        Test items inappropriate for the outcomes being measured.

g.    Inadequate time limits.

h.     Test too short.

i.          Improper arrangement of items.

j.          Identifiable pattern of answers.

2.     Factors in Test Administration and Scoring.

3.     Factors in Pupils� Responses.

4.     Environment.

Reliability & Validity Multiple-Choice

1.     The correlation between test scores and a criterion is a measure of:

a.     causation

b.     objectivity

c.     reliability

d.     validity

e.     variability

2.     The biggest obstacle to determining a test's predictive validity is:

a.     devising tests with ingenious and well-constructed items

b.     administering the test under uniform conditions

c.      obtaining a sufficiently large sample of items

d.     obtaining a really adequate criterion measure

3.     For which of the following tests would one be most exclusively interested in predictive validity?

a.     a biographical data bank being used in picking airplane pilots

b.     a measure of attitudes towards Communism

c.      a diagnostic test of reading comprehension

d.     an introversion-extroversion questionnaire

4.     The type of validity that is most appropriate for aptitude tests is:

a.     content validity

b.     predictive validity

c.      concurrent validity

d.     face validity

5.     You have devised a new measure called the PITSS and correlate it with an existing procrastination inventory.  This is an example of:

a.     content validity

b.     predictive validity

c.      concurrent validity

d.     construct validity

6.     On which of the following tests is content validity most appropriate?

a.     The Alpha Aptitude Battery

b.     The Beta Achievement Test

c.      The Gamma Personality Inventory

d.     The Delta Intelligence Test

e.      The Epsilon Test of Creative Ability

7.     Which type of validity coefficient is most appropriately used for selection purposes?

a.     predictive

b.     concurrent

c.      construct

d.     content

8.     Decreasing the time interval between predictor and criterion measures:

a.     increases the validity coefficient

b.     decreases the validity coefficient

c.      has no effect on validity

d.     none of the above

9.     Which type of validity coefficient is most important for personality tests?

a.     predictive validity

b.     concurrent validity

c.      construct validity

d.     content validity

10.   Comparing test items with objectives refers to which type of validity?

a.     content

b.     predictive

c.      concurrent

d.     construct

11.   Requires a time interval for its determination:

a.     content validity

b.     predictive validity

c.      concurrent validity

d.     construct validity

12.   The number of items on the predictor is cut in half.  This:

a.     increases the validity coefficient

b.     decreases the validity coefficient

c.      has no effect on the validity coefficient

d.     cannot occur

13.   The content validity of a teacher-constructed achievement test is:

a.     high if the teacher has matched items to objectives

b.     usually unacceptable due to lack of expert input

c.      generally low despite the teacher's knowledge of his class

d.     about equivalent to that of similar standardized tests

14.  Comparing a newly formed anxiety scale with an existing anxiety scale yields this type of validity coefficient:

a.     content validity

b.     predictive validity

c.      concurrent validity

d.     construct validity

15.    The new IQ test you have devised is administered to a gifted class.  Its results are then correlated with end-of-year grades.  Compared with the correlation that would be obtained if it were correlated with grades from regular class students, this correlation would be:

a.     lower

b.     higher

c.      curvilinear

d.     about the same

16.   To build reliability into a test, it is desirable to:

a.     write items of high difficulty level

b.     write items of various difficulty levels

c.      offer the poorer students rewards to heighten their attention

d.     write items in different areas of interest

17.   For speeded tests, the split-halves procedure of determining reliability will usually yield estimates that are:

a.     impossible to interpret

b.     statistically unstable

c.      quite accurate

d.     too high

18.    Instead of giving a test to a single grade level, it is administered to the whole school.  The reliability will:

a.     increase

b.     decrease

c.      be unaffected

d.     be very unpredictable

19.  This reliability coefficient is usually greater over a short time interval than over a long one:

a.     test-retest

b.     alternate forms

c.      split-halves

d.     Kuder-Richardson

20.  Which of the following is not a method of building reliability into a test?

a.     adding items of good quality

b.     administering the test to a heterogeneous group

c.      comparing the test with existing measures

d.     controlling the conditions of test administration

21.  A teacher has just computed the reliability of a test she has made after a single administration.  What kind of reliability did she compute?

a.     test-retest

b.     inter-rater

c.      internal consistency

d.     alternate forms

22.  Administering a test in the morning, rather than in the afternoon, will cause the reliability of the test to:

a.     increase

b.     decrease

c.      be questionable

d.     vary unpredictably

23.   Erroneously adding five points to each score on a test will cause the reliability coefficient to:

a.     increase

b.     decrease

c.      remain the same

d.     vary unpredictably

24.  You administer the Quick and Dirty Personality Test on January 1, 1984, and March 1, 1984, to the same group of subjects and correlate the results.  This gives you an estimate of:

a.     test-retest reliability

b.     alternate forms reliability

c.      predictive validity

d.     concurrent validity

25.  Involves the administration of two different tests at two different times:

a.     test-retest

b.     alternate forms

c.      split-half

d.     Kuder-Richardson

What type of reliability is being measured when one test is given twice?

Test-retest reliability measures the consistency of results when you repeat the same test on the same sample at a different point in time.
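
Stated compactly (my notation, not the handout's): if $X_1$ and $X_2$ are the scores from the first and second administrations, the resulting coefficient of stability is their Pearson correlation, $r_{tt} = r(X_1, X_2)$; this is the same computation sketched in Python under the concurrent validity example above, with the second administration taking the place of the criterion.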

What type of reliability is estimated when the test is administered twice to the same group, with a time interval not exceeding six months?

Test-retest reliability.  The test-retest method involves giving a group of people the same test more than once over a set period of time.

What is illustrated when the same measure is tested twice and shows the same result?

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same.

What is test-retest (repeated measures) reliability?

Test-retest reliability (sometimes called retest reliability) measures test consistency: the reliability of a test measured over time. In other words, the same test is given twice to the same people at different times to see whether the scores are consistent.