INSBAT Intelligence Structure Battery

L.F. Hornke, M. Arendasy, M. Sommer, J. Häusler, M. Wagner-Menghin, G. Gittler, B. Bognar, M. Wenzl © SCHUHFRIED GmbH

A modular intelligence test battery constructed on theory-led principles and designed to measure work-related abilities both fairly and economically.

Assessment of intelligence level and intelligence structure, for respondents aged 14 and over.

Theoretical background
As a decision-oriented psychological assessment tool, the INSBAT has a modular construction: only those subtests that are maximally informative for the purpose of the investigation need be presented. The INSBAT is based on the hierarchical Cattell-Horn-Carroll model of intelligence (Carroll, 1993; Horn, 1989; Horn & Noll, 1997). The model assumes that the intercorrelations between the subtests measuring the primary factors can be explained by eight secondary factors that are broader in content than the primary ones. The correlations between the secondary factors are in turn explained by a general factor of intelligence, which forms the apex of the hierarchical model. The validity of this factor structure has been replicated in many studies from different countries (e.g. Arendasy, Hergovich & Sommer, 2008; Brickley, Keith & Wolfe, 1995; Carroll, 1989; Gustafsson, 1984; Horn & Stankov, 1982; Undheim & Gustafsson, 1987). For the construction of the INSBAT the following secondary factors were selected as being relevant to practical areas of application such as work psychology, commercial/industrial and organizational psychology and educational psychology:

Fluid intelligence: The ability to recognize relationships between stimuli, understand implications and draw valid logical conclusions (subtests: Numerical Inductive Reasoning, Figural Inductive Reasoning, Verbal Deductive Reasoning).

Crystallized intelligence: The breadth and depth of acquired cultural knowledge such as word fluency and the understanding of words (subtests: Lexical Knowledge, Verbal Fluency, Word Meaning).

Short-term memory: The ability to retain visual and verbal information in the short term and to reproduce it accurately (subtests: Visual Short-term Memory, Verbal Short-term Memory).

Long-term memory: The ability to retain information in the longer term, integrate it into one’s own knowledge base and recall it accurately (subtest: Long-term Memory).

Visual processing: The ability to imagine how objects will look after they have been mentally rotated or transformed (subtest: Visualisation).

Quantitative thinking: The ability to understand and apply mathematical skills and concepts (subtests: Computational Estimation, Arithmetical Competence, Arithmetical Flexibility, Algebraic Reasoning).

In total, 14 subtests are therefore available. The items of these subtests were devised with the aid of various methods of automatic item generation (AIG: Arendasy & Sommer, 2002; Irvine & Kyllonen, 2002), taking account of the findings of current research in the cognitive sciences, differential psychology and applied psychometrics. The items were constructed either by human item writers or completely automatically using item generators. With regard to the psychometric properties of the item material, it was considered important that (1) the items of the individual subtests should be scalable in accordance with the 1PL Rasch model and (2) the theoretical model on which the items are based should explain at least 50 % of the variance in the item difficulty parameters. For the practitioner this has the advantages of (1) scaling fairness and (2) unambiguous interpretation of the individual subtest results.
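The 1PL Rasch model referred to above can be illustrated with a minimal sketch. It shows only the standard model equation, in which the probability of a correct answer depends on the difference between the respondent's ability and the item's difficulty. The function name and the example values are illustrative assumptions and are not taken from the INSBAT software.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL Rasch model.

    theta -- ability parameter of the respondent (logit scale)
    b     -- difficulty parameter of the item (same scale)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical values: ability equal to item difficulty gives a 50 % chance.
p_matched = rasch_probability(0.0, 0.0)

# An item that is two logits easier than the respondent's ability
# is solved with a clearly higher probability.
p_easy_item = rasch_probability(1.0, -1.0)
```

Because every item shares the same form of response function and differs only in its difficulty, the number of items solved contains all the information about ability for a given item set, which is the basis of the scaling fairness mentioned above.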

The INSBAT has been designed as a modular intelligence test battery, so that in principle only those subtests that are relevant to the purpose of the particular assessment need be presented. This can best be done using the variable form (S2), in which the user selects only the relevant subtests. It is also possible to change the order of the subtests and the end conditions of the adaptive tests in line with the requirements of the test situation. The global form (S3), by contrast, has been designed as a fixed test battery that enables a nuanced investigation of the individual’s intelligence level and intelligence structure. In this form each secondary factor is measured by a marker subtest (Figural Inductive Reasoning, Lexical Knowledge, Visual Short-term Memory, Long-term Memory and Spatial Perception) as well as by a second subtest that helps to ensure that the whole breadth of the second-order factor’s content is adequately covered. According to Schmidt and Hunter (1998), this procedure is particularly suitable for predicting the work-related performance of people in occupations involving very diverse and heterogeneous activities. If there is insufficient time for a differentiated assessment of a person’s ability, the user may choose either to administer the Intelligence Structure Battery - Short Form (INSSV) or to re-create the INSSV quickly and easily in Form S2.

Each subtest is provided with standardized instructions and practice examples based on the principles of programmed instruction and “mastery learning”. Depending on the subtest, the respondent’s answers are given either in multiple-choice format or as automated free responses. The tasks in the individual subtests are presented partly in power test form and partly with a time limit on each item. In 12 of the subtests the items are presented as an adaptive test (CAT), with the starting point selected on the basis of sociodemographic data; this maximizes the information gained without using items that are either too easy or too difficult for the respondent.
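The principle of adaptive item selection described above can be sketched as follows. The sketch assumes the common maximum-information rule, under which the next item is the one whose Rasch difficulty lies closest to the current ability estimate. The actual selection algorithm of the INSBAT is not specified here, and the item bank, the starting value and all function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the 1PL Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def select_next_item(theta, difficulties, administered):
    """Return the index of the most informative item not yet presented."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))


# Hypothetical item bank (difficulties in logits) and a hypothetical starting
# value, e.g. one derived from sociodemographic data such as age and education.
item_bank = [-2.0, -0.5, 0.4, 1.5, 3.0]
start_theta = 0.3

first_item = select_next_item(start_theta, item_bank, administered=set())
second_item = select_next_item(start_theta, item_bank, administered={first_item})
```

In a complete adaptive test the ability estimate would be updated after every response before the next item is chosen; the sketch keeps the estimate fixed to isolate the selection step.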

Test forms
Two test forms are available: Form S2 (variable test form) and Form S3 (global form).

For each of the selected subtests the ability parameter in accordance with the Rasch model is reported. While the number of correctly solved items is merely a measure of an individual’s performance, the ability parameter makes it possible to estimate the underlying, latent ability dimension. This represents an important and at the same time diagnostically necessary inferential step. This inferential step is, however, linked to the fit of the Rasch model (see van der Linden & Hambleton, 1997), which has been demonstrated for the subtests of the INSBAT. If more than one subtest relating to a particular secondary factor has been selected, an ability parameter is calculated for that secondary factor. The ability parameter for the general factor General Intelligence (G) is calculated if at least one subtest has been selected for each of the six secondary factors Fluid Intelligence (Gf), Crystallized Intelligence (Gc), Visual Processing (Gv), Quantitative Reasoning (Gq), Short-term Memory (Gstm) and Long-term Memory (Gltm). In addition to the ability parameters and factor scores, a norm comparison (percentile ranks and IQ, with confidence interval) is carried out automatically. At the conclusion of testing the results are displayed both in tabular form and as a profile, and these can be printed out. In addition, a profile analysis based on the method of psychometric single-case assessment indicates the respondent’s particular diagnostically verified strengths and weaknesses. The INSBAT also includes provision for transferring the test results automatically into a report template. Details of the course of each subtest can be viewed in the test protocols.
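The difference between the raw score and the ability parameter can be illustrated with a minimal sketch. It assumes a maximum-likelihood estimate obtained by a clipped Newton-Raphson iteration, which is one standard way of estimating Rasch ability; the estimator actually used by the INSBAT is not stated here. The response patterns, item difficulties and function name are hypothetical.

```python
import math


def estimate_ability(responses, difficulties, max_iter=50, tol=1e-8):
    """Maximum-likelihood ability estimate under the Rasch model.

    responses    -- list of 0/1 scores, one per administered item
    difficulties -- Rasch difficulty parameters of the same items
    Returns (theta, standard_error).
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("no finite ML estimate for all-wrong or all-correct patterns")

    def probabilities(theta):
        return [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]

    theta = 0.0
    for _ in range(max_iter):
        probs = probabilities(theta)
        gradient = raw_score - sum(probs)                # slope of the log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = max(-1.0, min(1.0, gradient / information))  # clipped Newton step
        theta += step
        if abs(step) < tol:
            break

    information = sum(p * (1.0 - p) for p in probabilities(theta))
    return theta, 1.0 / math.sqrt(information)


# Same raw score (2 of 3 correct) on two hypothetical item sets of different difficulty.
theta_easy, se_easy = estimate_ability([1, 1, 0], [-1.0, 0.0, 1.0])
theta_hard, se_hard = estimate_ability([1, 1, 0], [1.0, 2.0, 3.0])

# Approximate 95 % confidence interval on the ability scale.
ci_easy = (theta_easy - 1.96 * se_easy, theta_easy + 1.96 * se_easy)
```

Both respondents solve two of three items, yet the one who worked on the harder set receives the higher ability estimate. This is why adaptive subtests, in which respondents see different items, report ability parameters and not raw scores.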

Due to the adaptive presentation mode used in some subtests and the applicability of a probabilistic test model that this requires, any desired level of reliability can be achieved. For reasons of economy the reliability of the individual task groups lies between r=0.70 and r=0.95; in the variable form S2 it can be set by the assessor within these limits. The stability of the subtests after around 15 months ranges between r=0.63 and r=0.87.
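How a target reliability can serve as an end condition for an adaptive test may be sketched as follows. The sketch assumes the widely used approximation reliability = 1 - SE² / Var(ability), with the ability variance standardized to 1 by default. Whether the INSBAT applies exactly this rule is an assumption, and the function names are hypothetical.

```python
import math


def target_standard_error(reliability, ability_sd=1.0):
    """Standard error of measurement that corresponds to a target reliability.

    Uses the approximation reliability = 1 - SE**2 / ability_sd**2.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return ability_sd * math.sqrt(1.0 - reliability)


def should_stop(test_information, reliability, ability_sd=1.0):
    """True once the accumulated test information reaches the target precision."""
    standard_error = 1.0 / math.sqrt(test_information)
    return standard_error <= target_standard_error(reliability, ability_sd)


# Limits quoted in the text: a higher target reliability demands a smaller
# standard error and therefore more items before the test may end.
se_lower_limit = target_standard_error(0.70)
se_upper_limit = target_standard_error(0.95)
```

Under this approximation, raising the target reliability from 0.70 to 0.95 more than halves the permitted standard error, which illustrates the trade-off between precision and test length that the assessor controls in Form S2.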

The construct representation (Embretson, 1983) of the individual INSBAT subtests has been demonstrated in studies in which the item difficulties were predicted from task characteristics derived from the theoretical models for the solving of these types of tasks. The multiple correlations between the item difficulty parameters of the Rasch model (Rasch, 1980) and the item characteristics thus obtained range for the individual subtests between R=0.70 and R=0.97. This means that between 50 % and 94 % of the variance in the difficulties of the individual items can be explained by the theoretical models on which construction of the items in the individual subtests is based. In addition, a number of studies of the nomothetic span (Embretson, 1983) of the individual subtests are now available. A study by Sommer and Arendasy (2005; Sommer, Arendasy & Häusler, 2005) provided evidence of construct validity for the test battery as a whole and for the global form and short form. By means of a confirmatory factor analysis the authors were able to confirm the theory-led assignment of the individual subtests to the secondary factors of the Cattell-Horn-Carroll model. These results were supplemented by studies carried out by Arendasy and Sommer (2007) and Arendasy, Hergovich and Sommer (2008), in which the results previously reported were replicated on an independent sample using alternative subtests. Evidence of the criterion validity of the individual INSBAT subtests has come from the fields of aviation psychology (selection of trainee pilots) and educational counseling (prediction of student success at universities of applied sciences).
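The step from the multiple correlations to the percentages of explained variance quoted above is simply the squaring of R. A minimal sketch with a hypothetical helper function:

```python
def explained_variance(multiple_r):
    """Proportion of variance explained, i.e. the squared multiple correlation."""
    return multiple_r ** 2


# Limits of the multiple correlations quoted in the text.
lower_share = explained_variance(0.70)
upper_share = explained_variance(0.97)
```

Squaring 0.97 gives roughly 0.94, in line with the upper figure. Squaring 0.70 gives roughly 0.49, so the lower figure of 50 % presumably reflects rounding of the reported correlation.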

Norms are available for a sample of 904 adults aged between 16 and 73 and for a sample of 1595 young people aged between 12 and 15. Both sets of norms are also available broken down by age, gender and education.


Arendasy, M., Sommer, M. & Hergovich, A. (2007). Automatische Zwei-Komponenten-Itemgenerierung am Beispiel eines neuen Aufgabentyps zur Messung der Numerischen Flexibilität. Diagnostica, 53, 119-130.

Arendasy, M. (2004). Automatisierte Itemgenerierung und psychometrische Qualitätssicherung am Beispiel des Matrizentests GEOM. Wien: Habilitationsschrift der Universität Wien.

Arendasy, M. (2005). Automatic generation of Rasch-calibrated items: Figural Matrices Test GEOM and Endless Loops Test EC. International Journal of Testing, 5, 197-224.

Arendasy, M., & Sommer, M. (2007). Automatic generation of quantitative reasoning items: A schema-based isomorphic approach. Learning and Individual Differences, 17, 366-383.

Arendasy, M., Sommer, M., Gittler, G. & Hergovich, A. (2006). Automatic generation of quantitative reasoning items: A pilot study. Journal of Individual Differences, 27, 2-14.

Daurer, U. D. (1997). Erstellung einer Jugendlichenform des Tests "Lexikonwissen". Unpublished Master Thesis, University of Vienna, Vienna.

Gittler, G. (1990). Dreidimensionaler Würfeltest. Ein Rasch-skalierter Test zur Messung des räumlichen Vorstellungsvermögens. Theoretische Grundlagen und Manual. Weinheim: Beltz.

Gittler, G. (1999). Manual Adaptiver Dreidimensionaler Würfeltest. Mödling: Dr. G. Schuhfried GmbH.

Hornke, L. F. (2002). Item-generation models for higher order cognitive functions. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 159-178). London: Lawrence Erlbaum.

Hornke, L. F., Etzel, S. & Küppers, A. (2000). Konstruktion und Evaluation eines adaptiven Matrizentests. Diagnostica, 46, 182-188.

Wagner, M. M. (1999). Lexikon-Wissen-Test (LEWITE) Leistungstest- und/oder Objektiver Test zur Beurteilung der Realitätsangemessenheit der Selbsteinschätzung. Unpublished Dissertation, University of Vienna, Vienna.