In compliance with the Uniform Guidelines on Employee Selection Procedures (1978)

Document Number 29 CFR 1607

Section 1607.15 B Criterion-related Validation Studies

1. User: Anderson Corporation

Location: ***************

Dates of Study: Mar 6-10, 2000

2. Problem and Setting: Small manufacturing company with semi-skilled labor force (approximately 20 shop-floor personnel) seeks test to qualify job applicants for work as machine operators. Company reviewed Benchmark Testware’s "Shop Apprentice" test and felt it was too advanced for the skill level required of their operators; Benchmark Testware then suggested that a personality-based measurement tool, the Benchmark "Personality Profile", might be more suitable for use as part of their selection process. To investigate the criterion-related validity of this instrument, each operator from the existing workforce was asked to take the test, and was separately rated on a number of job-related criteria by his supervisor. Statistical analysis was conducted to determine the correlation between test scores and subjective ratings.

3. Job Analysis Procedure: Benchmark Testware has been working with manufacturing companies in a variety of industries since 1995 to develop tests for skilled tradespeople. Benchmark’s Research Director, Martin Green, is a professional engineer with fifteen years of background in plant maintenance. Through personal experience and discussion with numerous manufacturing supervisors and workers, Benchmark has identified ten criteria that describe areas in which unskilled or semi-skilled workers can be subjectively appraised. Note that 29 CFR 1607.14B(3) approves the use of criteria without the need for full job analysis:

"Certain criteria may be used without a full job analysis if the user can show the importance of the criteria to the particular employment context. These criteria include but are not limited to production rate, error rate, tardiness, absenteeism, and length of service. A standardized rating of overall work performance may be used where a study of the job shows that it is an appropriate criterion."

Basis for Selection of Criteria: Review of job information shows the worksite to be a manufacturing shop with general skills required to be greater than those of repetitive, assembly line work, but less than those of skilled maintenance technicians. Benchmark Testware has developed different sets of evaluation criteria depending on the skill level required. Based on the nature of the skills required in this worksite, we adopted Benchmark’s standard "Machine Operator Evaluation Form" (see Appendix A).

4. Job Titles/Codes

Apprentice Extruder Trainee

Journeyman Extruder Operator


Lead Operator

Veneer Operator

Veneer Helper

5. Bases for selection of criterion measure: The test instrument was designed to be part of an overall hiring procedure which would vary from one plant to another. The basis for selection of criterion measures was to provide an objective ranking outside of the test instrument against which the test results could be mathematically correlated. Therefore criterion measures were chosen which could be determined by supervisory appraisals.

Appraisal Forms: Prior to seeing the scores of the sample group, the supervisor was provided with evaluation forms listing ten criteria on which the workers were to be rated. A sample copy of the appraisal forms is shown in Appendix A. These forms were completed by the supervisor before he saw the results of the tests, and submitted to Benchmark Testware.

Measures to Ensure Fairness of Ratings: The fairness of the ratings was dependent on lack of personal biases on the part of the supervisor. Because the sample group was largely homogeneous (fifteen of sixteen were white males), it was determined that any personal biases on the part of the supervisor could have little effect on the validation of the test, other than to artificially degrade the correlation. In other words, if the supervisor gave a poor rating to someone who deserved a good rating, and that individual subsequently performed well on the test, the effect would be simply to make the test look bad. It would not lead to a modification of the scoring scale which would bias the test against minorities. Since it was in the best interest of the supervisor to make the test look good, it would therefore be expected that he would attempt to make his ratings as objective as possible without regard to personal biases.

6. Sample Group Description: The sample group consisted of thirteen machine operators employed at Anderson, plus three manufacturing technicians drawn from staff at the next level of progression within the company, who were included for purposes of comparison and to broaden the base of the study. The sample group was chosen so as to give the best available measure of the applicability of the test.

Demographic Make-up: One member of the sample group was African-American, and the remaining fifteen were white males.

7. Description of Selection Procedures: This report studies the test procedure known as "SkillsProfiler Personality Profile" (copyright 1999 Benchmark Testware, 477 Jarvis Ave., Winnipeg, Canada). This test is designed to be used by employers as one component of an overall hiring procedure.

8. Techniques and Results: The employer’s subjective evaluations are tabulated by Benchmark Testware and an overall rating is determined for each employee by taking the sum of the individual criterion ratings. The members of the sample group are then ranked according to these ratings, and their raw test scores are placed in order beside these rankings.

A mathematical correlation value (between -1.0 and 1.0) is determined between the two columns of values, with a value of zero corresponding to random figures, i.e. no correlation. Subsequently, various sub-categories of the test questions are isolated, sub-scores calculated, and these sub-categories are then correlated with the supervisory ratings to determine which types of questions are most significant. It is these significant sub-categories, in addition to the raw scores, which are brought to the attention of the employer when examining the test results of job applicants in greater detail.
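
The tabulation and correlation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the spreadsheet analysis of Appendix B: the report does not name the correlation coefficient used, so a Pearson coefficient on the summed ratings is assumed here, and the employee labels, the ratings (shortened to three criteria instead of ten), and the raw scores are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data: (employee label, supervisor ratings per criterion, raw test score).
# Three criteria are shown for brevity; the study's appraisal form lists ten.
records = [
    ("A", [4, 5, 3], 62),
    ("B", [2, 3, 2], 48),
    ("C", [5, 4, 5], 70),
    ("D", [3, 3, 4], 51),
    ("E", [1, 2, 2], 55),
]

# Overall rating = sum of the individual criterion ratings.
overall = [sum(ratings) for _, ratings, _ in records]
scores = [score for _, _, score in records]

# Rank employees by overall rating (highest first), raw score alongside.
ranking = sorted(zip(overall, scores, (name for name, _, _ in records)), reverse=True)
for rating, score, name in ranking:
    print(f"{name}: overall rating {rating}, raw score {score}")

# Correlation between the two columns; always lies between -1.0 and 1.0.
r = pearson(overall, scores)
print(f"correlation: {r:.2f}")
```

The same `pearson` function could be applied to each sub-category score in place of the raw score to see which sub-categories track the supervisory ratings most closely.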

Statistical Results and Significance: A print-out of the spreadsheet analysis is provided in Appendix B. It may be noted that the raw score correlation obtained for the group was 0.43, and that the best correlation obtained by weighted scoring of sub-categories was that obtained in Category 2, "Integrity", where the correlation between scores and evaluations was 0.59. Based on the sample size (16) and the observed raw correlation, a statistical analysis was carried out to determine the significance. It was determined that there is a probability of less than 0.04 that a positive correlation at this level would have been obtained purely by chance.
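
One common way to test the significance of a correlation from a small sample is a one-tailed t-test with n - 2 degrees of freedom. The sketch below shows that approach using only the Python standard library. The report does not say which test produced its figure, so this is an assumed method, and the value it returns should not be expected to match the reported probability exactly. The function names are illustrative only.

```python
from math import gamma, pi, sqrt


def t_statistic(r, n):
    """t statistic for a correlation r observed on n paired observations."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """Upper-tail probability P(T >= t), by Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |t|
    upper = 0.5 - area            # mass above |t|, by symmetry
    return upper if t >= 0 else 1 - upper


# Values taken from the report: raw-score correlation 0.43, sample size 16.
n = 16
t = t_statistic(0.43, n)
p = one_tailed_p(t, n - 2)
print(f"t = {t:.3f}, one-tailed p = {p:.3f}")
```

A one-tailed test is used here because the text speaks of the chance of obtaining a positive correlation; a two-tailed test would double the probability.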

Observed Differences according to Race and Gender: The lone African-American test subject was ranked by his supervisor to be in the top quartile of the sample group based on the criteria ratings. His test score also placed him in the top quartile; therefore, in this study there were no observed differences according to race.


Measures Taken to Ensure Fairness: The test is generally designed in accordance with professional test design practices. Multiple-choice questions are offered with clear alternatives. English usage has been reviewed to ensure that the vocabulary is chosen to be as simple as possible in order to convey the necessary meaning of the questions. All questions cover material which requires no specific technical training or education, rather than knowledge which would only be available to those with special education or industrial experience.

9. Alternative Procedures: The employer has been working until now with the simple alternative of hiring on the basis of job interviews. The results have not led to the most efficient outcome; as one supervisor explains, 80% of his time is spent dealing with 20% of his workers. In other words, there is significant value to the company in hiring workers who will perform the job properly without the need for frequent monitoring. It is essential to use more accurate means to identify the best possible job applicants for these positions. The "do-nothing" alternative was therefore not acceptable to the employer; furthermore, the employer had no evidence that a "do-nothing" alternative would have less adverse impact on protected groups than implementation of a testing procedure.

10. Uses and Applications: The test procedure is for the employer to have all job applicants write the "Personality Profile" test on computer. The results are stored in a file, which is sent electronically to Benchmark Testware. The raw scores are tabulated and the detailed results are placed in a spreadsheet for breakdown of scores according to various sub-categories determined by Benchmark Testware. Based on this analysis, recommendations are made by Benchmark Testware to the employer concerning the abilities of the applicants. The final selection of applicants is made by the employer.

Evidence of Validity of Procedure: There are several elements of evidence which attest to the validity of the procedure. The most important is the significant correlation observed between test scores and supervisory ratings as measured on the sample group of existing Anderson employees. Because this study was carried out on an existing workforce, one might question whether its validity applies only to people with experience on the job, or whether it is also applicable to new applicants ("concurrent" vs. "predictive" studies). There are two answers to this question:

1. The personality attributes measured in this test have little apparent connection to the specific job skills which employees learn after they are hired.

2. The Code of Federal Regulations recognizes the admissibility of concurrent validity studies where predictive studies are not feasible. Note 29 CFR 1607.14B(4):

"Representativeness of the sample. Whether the study is predictive or concurrent, the sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or group of jobs in question, and should insofar as feasible include the races, sexes, and ethnic groups normally available in the relevant job market.

In determining the representativeness of the sample in a concurrent validity study, the user should take into account the extent to which the specific knowledges or skills which are the primary focus of the test are those which employees learn on the job."

11. Source Data: Information on the participants in the sample group is available from Anderson.

12. Contact Person(s):

Anderson Inc: William Stokes

Benchmark Testware: Martin Green

13. Accuracy and Completeness: I hereby certify that this validation study was carried out to the best of my professional capabilities, and that no relevant information concerning this study has been suppressed from this report.



Martin Green, M. Sc, P. Eng.

Research Director, Benchmark Testware