In compliance with the Uniform Guidelines on Employee Selection Procedures (1978)

Document Number 29 CFR 1607

Section 1607.15 B Criterion-related Validation Studies

1. User: Robus Inc.

Location: ***************

Dates of Study: Jan. 4-7, 1999

2. Problem and Setting: Small manufacturing company with multi-trades maintenance staff (eight electrical/mechanical tradespeople) seeks test to qualify job applicants for work in maintenance shop. Company reviewed Benchmark Testware’s "Shop Apprentice" test and felt it had potential for use as part of their selection process. To investigate the criterion-related validity of this instrument, members of the plant maintenance staff each took the test and were separately rated on a number of job-related criteria by their supervisor. Statistical analysis was conducted to determine the correlation between test scores and subjective ratings.

3. Job Analysis Procedure: Benchmark Testware has been working with manufacturing companies in a variety of industries since 1995 to develop tests for skilled tradespeople. Benchmark’s Research Director, Martin Green, is a professional engineer with fifteen years of background in plant maintenance. Through personal experience and discussion with numerous maintenance supervisors and workers, Benchmark has identified eight criteria that describe areas in which skilled workers can be subjectively appraised. Note that 29 CFR 1607.14B(3) approves the use of criteria without the need for full job analysis:

"Certain criteria may be used without a full job analysis if the user can show the importance of the criteria to the particular employment context. These criteria include but are not limited to production rate, error rate, tardiness, absenteeism, and length of service. A standardized rating of overall work performance may be used where a study of the job shows that it is an appropriate criterion."

Basis for Selection of Criteria: Review of job information shows the worksite to be a multi-trades maintenance shop with general skills similar to those required in many other medium-sized manufacturing plants. Therefore, the general criteria developed for other worksites were adopted without change (see "Skilled Trades Evaluation Form", Appendix A).

4. Job Titles/Codes: N/A

5. Bases for selection of criterion measure: The test instrument was designed to be part of an overall hiring procedure which would vary from one plant to another. The basis for selection of criterion measures was to provide an objective ranking outside of the test instrument against which the test results could be mathematically correlated. Therefore criterion measures were chosen which could be determined by supervisory appraisals.

Appraisal Forms: Prior to seeing the scores of the sample group, the supervisor was provided with evaluation forms listing eight criteria on which the workers were to be rated. A sample copy of the appraisal forms is shown in Appendix A. These forms were completed by the supervisor before he saw the results of the tests, and submitted to Benchmark Testware.

Measures to Ensure Fairness of Ratings: The fairness of the ratings was dependent on lack of personal biases on the part of the supervisor. Because the sample group was racially homogeneous, it was determined that any personal biases on the part of the supervisor could have little effect on the validation of the test, other than to artificially degrade the correlation. In other words, if the supervisor gave a poor rating to someone who deserved a good rating, and that individual subsequently performed well on the test, the effect would be simply to make the test look bad. It would not lead to a modification of the scoring scale which would bias the test against minorities. Since it was in the best interest of the supervisor to make the test look good, it would therefore be expected that he would attempt to make his ratings as objective as possible without regard to personal biases.

6. Sample Group Description: The sample group consisted of all seven multi-craft maintenance mechanics then on staff at Robus, plus a single member of the office staff who was included for purposes of comparison, but whose score was not included in the statistical analysis. The sample group was chosen so as to give the best available measure of the applicability of the test.

Demographic Make-up: None of the members of the sample group were reported to be members of protected classes.

7. Description of Selection Procedures: This report studies the test procedure known as "SkillsProfiler Shop Apprentice Tests" (copyright 1997 Benchmark Testware, 477 Jarvis Ave., Winnipeg, Canada). This test is designed to be used by employers as one component of an overall hiring procedure.

8. Techniques and Results: The employer’s subjective evaluations are tabulated by Benchmark Testware and an overall rating is determined for each employee by taking the sum of the individual criterion ratings. The members of the sample group are then ranked according to these ratings, and their raw test scores are placed in order beside these rankings.
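The tabulation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the employee names, the 1-to-5 rating scale, and all of the numbers are made up for the example and are not taken from the study's data or from Appendix B.

```python
def overall_ratings(ratings_by_employee):
    """Sum each employee's individual criterion ratings into one overall rating."""
    return {name: sum(ratings) for name, ratings in ratings_by_employee.items()}


def rank_by_rating(overall):
    """Return employee names ordered from highest to lowest overall rating."""
    return sorted(overall, key=lambda name: overall[name], reverse=True)


# Hypothetical data: three employees, eight criteria each, rated 1 to 5.
# The rating scale is an assumption; the report does not state it.
example_ratings = {
    "Employee A": [4, 5, 3, 4, 4, 5, 3, 4],
    "Employee B": [2, 3, 3, 2, 3, 2, 3, 2],
    "Employee C": [5, 4, 5, 5, 4, 5, 4, 5],
}
# Hypothetical raw test scores for the same employees.
example_scores = {"Employee A": 31, "Employee B": 22, "Employee C": 36}

overall = overall_ratings(example_ratings)
ranking = rank_by_rating(overall)

# Place each raw test score beside the corresponding overall rating.
for name in ranking:
    print(name, overall[name], example_scores[name])
```

Laying the two columns side by side in this way is what makes the later correlation step possible: each employee contributes one pair of values (overall rating, raw test score).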

A mathematical correlation value (between -1.0 and 1.0) is determined between the two columns of values, with a value of zero corresponding to random figures, i.e. no correlation. Subsequently, various sub-categories of the test questions are isolated, sub-scores calculated, and these sub-categories are then correlated with the supervisory ratings to determine which types of questions are most significant. It is these significant sub-categories, in addition to the raw scores, which are brought to the attention of the employer when examining the test results of job applicants in greater detail.
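For readers who want to see how such a correlation value can be computed, here is a minimal sketch. It assumes the Pearson product-moment coefficient, which is what a typical spreadsheet correlation function returns; the report does not state which coefficient was actually used, and the function name `pearson_r` is chosen for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Returns a value between -1.0 and 1.0. Assumed here for illustration;
    the report does not name the coefficient it used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Perfectly aligned columns give 1.0; perfectly reversed columns give -1.0.
aligned = pearson_r([1, 2, 3], [2, 4, 6])
reversed_order = pearson_r([1, 2, 3], [3, 2, 1])
print(aligned, reversed_order)
```

The same function could be applied to a sub-score column (the total for one sub-category of questions) in place of the raw scores, which is the comparison the paragraph above describes for identifying the most significant question types.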

Statistical Results and Significance: A print-out of the spreadsheet analysis is provided in Appendix B. It may be noted that the raw score correlation obtained for the group was 0.83, and that the best correlation obtained by weighted scoring of sub-categories was that obtained in Method 5, "Shop Tools and Super Key", where the correlation between scores and evaluations was 0.96 (96%). To determine the statistical significance of these results, a spreadsheet simulation was carried out. This involved repeatedly simulating random test scores and calculating the resulting correlation coefficients with respect to the supervisory evaluations. The analysis showed that out of 914 trials, a correlation of greater than 0.80 occurred in only 14 instances. This indicates a probability of less than 0.02 that a positive correlation at this level would have been obtained purely by chance.
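A simulation of the kind described above might look like the following sketch. Several details are assumptions made for illustration: the random scores are drawn uniformly (the report does not say how they were generated), the Pearson coefficient is used, the seven rating values are made up, and the names `pearson_r` and `chance_exceedance` are hypothetical. It is not a reproduction of the original spreadsheet.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists (assumed coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cross / math.sqrt(ss_x * ss_y)


def chance_exceedance(ratings, threshold, trials, seed=0):
    """Fraction of trials in which random scores correlate above `threshold`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One simulated random score per rated employee (uniform draw assumed).
        random_scores = [rng.random() for _ in ratings]
        if pearson_r(random_scores, ratings) > threshold:
            hits += 1
    return hits / trials


# Hypothetical overall ratings for a seven-person group (not the study's data).
illustrative_ratings = [20, 24, 27, 29, 32, 35, 37]
estimate = chance_exceedance(illustrative_ratings, threshold=0.80, trials=914)
print(f"simulated fraction above 0.80: {estimate:.4f}")

# The figures reported in the study: 14 exceedances in 914 trials.
reported_fraction = 14 / 914  # about 0.0153, which is below 0.02
print(f"reported fraction: {reported_fraction:.4f}")
```

The fraction of trials exceeding the threshold serves as an estimate of the chance probability; with the reported counts, 14 / 914 is roughly 0.015, consistent with the "less than 0.02" figure stated above.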

Observed Differences according to Race and Gender: Because of the size and homogeneity of this sample group, no separate results are available for different minority groups.

Measures Taken to Ensure Fairness: The test is generally designed in accordance with professional test design practices. Multiple-choice questions are offered with clear alternatives. English usage has been reviewed to ensure that the vocabulary is chosen to be as simple as possible in order to convey the necessary meaning of the questions. All questions cover material which an individual with no specific technical training might have been able to learn through working in the home or garage, rather than knowledge which would be available only to those with special education or industrial experience.

9. Alternative Procedures: The employer has been working until now with the simple alternative of hiring on job interviews. Because of the critical nature of the plant maintenance function with respect to the overall profitability of the business, it is essential to use more accurate means to identify the best possible job applicants for these positions. The "do-nothing" alternative was therefore not acceptable to the employer; furthermore, the employer had no evidence that a "do-nothing" alternative would have less adverse impact on protected groups than implementation of a testing procedure.

10. Uses and Applications: The test procedure is for the employer to have all job applicants write the "Shop Apprentice" test on computer. The results are stored in a file, which is sent electronically to Benchmark Testware. The raw scores are tabulated and the detailed results are placed in a spreadsheet for breakdown of scores according to various sub-categories determined by Benchmark Testware. Based on this analysis, recommendations are made by Benchmark Testware to the employer concerning the abilities of the applicants. The final selection of applicants is made by the employer.

Evidence of Validity of Procedure: There are several elements of evidence which attest to the validity of the procedure. The most important is the significant correlation observed between test scores and supervisory ratings as measured on the sample group of existing Robus employees. Because this study was carried out on an existing workforce, one might question whether its validity applies only to people with experience on the job, or whether it is also applicable to new applicants ("concurrent" vs. "predictive" studies). There are three answers to this question:

1. Maintenance workers in the plant are expected to come into the position with a high degree of knowledge already in place.

2. The knowledge attributes measured in this test are specifically chosen to consist of things that a person with no industrial experience might have readily learned at home, e.g. household plumbing, wiring, carpentry, minor car repair, as opposed to topics which would have required specific industrial training or experience, e.g. bearings, hydraulics, three-phase electrical power.

3. The Code of Federal Regulations recognizes the admissibility of concurrent validity studies where predictive studies are not feasible. Note 29 CFR 1607.14B(4):

"Representativeness of the sample. Whether the study is predictive or concurrent, the sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or group of jobs in question, and should insofar as feasible include the races, sexes, and ethnic groups normally available in the relevant job market.

In determining the representativeness of the sample in a concurrent validity study, the user should take into account the extent to which the specific knowledges or skills which are the primary focus of the test are those which employees learn on the job."

11. Source Data: Information on the participants in the sample group is available from Robus.

12. Contact Person(s):

Robus Inc: Jack Michael

Benchmark Testware: Martin Green

13. Accuracy and Completeness: I hereby certify that this validation study was carried out to the best of my professional capabilities, and that no relevant information concerning this study has been suppressed from this report.



Martin Green, M.Sc., P.Eng.

Research Director, Benchmark Testware