Interpreting Action Research Arm Test Assessment Scores to Plan Treatment. Occupational Therapy Journal of Research; 2019.



Rasch keyforms can help interpret clinical assessment scores. The Action Research Arm Test (ARAT) is a commonly used assessment, yet no keyform currently exists.


To provide a keyform for the ARAT and demonstrate how a clinician can use the keyform to design optimally challenging rehabilitation sessions.


Secondary analysis of ARAT data (n=122) using confirmatory factor and Rasch analyses to examine the measurement properties and generate a keyform.


Item standardized factor loadings were >0.40 (range 0.82-0.96) and R-square values were >0.60 (range 0.65-0.96). All items exhibited adequate infit statistics with point measure correlations >0.60 (range 0.72-0.97). Person reliability was 0.98 and person separation was 7.07. Item difficulty measures ranged from −2.78 logits to 2.64 logits.


The ARAT has strong measurement properties, and a keyform was provided. We showed how the keyform can be utilized by clinicians to interpret scores, set goals, and plan treatment.

Keywords: stroke, rehabilitation, upper extremity, Rasch analysis, assessment

Approximately 85% of stroke survivors in the acute stage of stroke exhibit upper extremity (UE) hemiparesis, and 55-75% of stroke survivors in the chronic stage experience UE functional limitations (Mozaffarian et al., 2016; Wolf et al., 2006). Thus, UE motor recovery is a priority rehabilitation goal (Barker & Brauer, 2005; Bohannon, Andrews, & Smith, 1988). Rehabilitation professional practice guidelines (American Occupational Therapy Association, 2014; American Physical Therapy Association, 2014) direct therapists to address impairments and limitations identified by standardized assessments. However, standardized assessments are infrequently used (Abrams et al., 2005; Kay, Myers, & Huijbregts, 2001; Menon-Nair, Korner-Bitensky, Wood-Dauphinee, & Robertson, 2006; Swinkels, van Peppen, Wittink, Custers, & Beurskens, 2011; Torenbeek, Caulfield, Garrett, & Van Harten, 2001), in part because therapists have difficulty interpreting scores (Jette, Halbert, Iverson, Miceli, & Shah, 2009; Swinkels et al., 2011). Velozo and Woodbury (2011) argued that an assessment score, derived by summing ordinal item ratings, fails to inform clinical decisions because it does not detail specific behaviors that a client performs entirely or in part. The growing emphasis on clinical use of standardized assessments necessitates development of more clinically interpretable scoring methods to inform care planning.

The Action Research Arm Test (ARAT) (Lyle, 1981) is a reliable, valid (Lang, Wagner, Dromerick, & Edwards, 2006; Platz et al., 2005) standardized assessment of post-stroke hemiparetic UE functional limitation. The instrument contains 19 items grouped into 4 subtests: grasp, grip, pinch, and gross motor. Items within the subtests are arranged in an item-difficulty order originally proposed by Lyle (1981). The assessment can be administered in two ways: testing only the items relevant to the client’s ability as per the original instructions (Lyle, 1981), or testing all items (van der Lee, Roorda, Beckerman, Lankhorst, & Bouter, 2002; Yozbatiran, Der-Yeghiaian, & Cramer, 2008). Item performance is rated on a 4-point scale (0=unable; 1=partial; 2=abnormal; 3=normal), then item ratings are summed and reported out of 57 points, with a higher score indicating greater UE function.

The ARAT aggregate score has limited clinical interpretability because it does not indicate which items were easy, difficult, or optimally challenging for the client. For example, a score of 50 indicates less limitation than a score of 20, but does not specify if the client could grip cylindrical objects or pinch small objects. A therapist needs detailed behavioral information to tailor therapy to clients’ specific abilities. One way to gain this detailed information is to examine both the aggregate score and the client’s pattern of item responses.

Rasch analysis offers a framework and method to increase an assessment’s clinical interpretability with an output called a “keyform.” A keyform is a pencil-paper scoring template (Kielhofner, Dobria, Forsyth, & Basu, 2005; Linacre, 1997; Velozo, Warren, Hicks, & Berger, 2013) upon which a therapist records item ratings and then examines the pattern of item responses. The keyform is formatted according to Rasch model expectations that a client will successfully accomplish easy items, have less success with difficult items, and have a 50% probability of success on items with a difficulty level similar to his/her ability level. The item response pattern is expected to be consistent with this: the client will have a consistent pattern of good ratings on easy items, a consistent pattern of poor ratings on difficult items, and a fluctuation of ratings in a region of the keyform called the transition zone (Velozo & Woodbury, 2011), where item difficulties match the client’s ability level. The transition zone indicates behaviors (tested by assessment items) that are optimally difficult for an individual (Linacre, 1997). This information is important because a key ingredient of motor recovery therapy is task-specific practice (TSP) at the optimal, just-right level of challenge (Guadagnoli & Lee, 2004). Therefore, the keyform allows a clinician to identify appropriately challenging behaviors, represented by items in the transition zone, to target in TSP sessions (Bode, Heinemann, Kozlowski, & Pretz, 2014; Kielhofner et al., 2005; Velozo & Woodbury, 2011; Woodbury, Velozo, Richards, & Duncan, 2013). Rasch keyforms have been used to improve the scoring, interpretation, and clinical utility of assessments of vision (Velozo et al., 2013), disorders of consciousness (Pape, Mallinson, & Guernon, 2014), functional independence (Bode et al., 2014), pediatric gross motor function (Avery, Russell, Raina, Walter, & Rosenbaum, 2003), and post-stroke UE impairment (Woodbury et al., 2016).
To our knowledge, there is only one UE stroke assessment (Velozo & Woodbury, 2011; Woodbury et al., 2016) with a keyform and there are no keyforms for measuring post-stroke UE function. Given the dearth of rehabilitation keyforms, few clinicians are aware that they exist or understand how they are used.

This paper, using the ARAT as an exemplar, demonstrates how keyforms may inform care planning. The purposes are to generate an ARAT keyform and demonstrate how a clinician can use it to design optimally challenging rehabilitation sessions. Several groups have applied Rasch or other item response analysis methods to the ARAT (Chen, Lin, Wu, & Chen, 2012; Koh et al., 2006; van der Lee et al., 2002) and found that the rating scale was adequate and that the ARAT is a precise and reliable measure. Therefore, we believe the ARAT is suitable for keyform development.


Participants and Study Design

A secondary analysis of existing data pooled from five stroke rehabilitation intervention studies conducted at three academic health/research centers was performed. Study sites were located in mid-sized cities in the United States which provided a diverse sample.

The full 19-item ARAT (i.e., all items tested) was administered at each site by trained therapists using similar standardized procedures. De-identified pre-intervention data were obtained from each site in accordance with local data sharing regulations. It was feasible to pool data because the studies had similar eligibility criteria. Participants were included if they experienced a stroke ≥3 months prior and had UE hemiparesis. Participants were excluded if they had severe hemiparesis (e.g., no palpable triceps contraction); severe spasticity (Modified Ashworth Scale >3 in elbow, wrist, or fingers); were unable to follow one-step commands (due to severe aphasia or cognitive impairment); or reported UE pain. All procedures were approved by local institutional review boards and adhered to the ethical standards of the revised Declaration of Helsinki. All participants or their proxies provided informed consent.

Data Analysis

Confirmatory Factor Analysis

Confirmatory factor analysis (CFA) tested the hypothesis that all items measure a single skill (unidimensionality) (Kline, 2005; Velozo & Woodbury, 2011). Four models were fit: 1-factor; 2-factor (Factor 1: grasp, grip, pinch; Factor 2: gross motor); 3-factor (Factor 1: grasp, grip; Factor 2: pinch; Factor 3: gross motor); and 4-factor (Factor 1: grasp; Factor 2: grip; Factor 3: pinch; Factor 4: gross motor). Models were fit in M-Plus version 6 (Muthén & Muthén, 2010) using weighted least-squares means with variance adjusted (Brown, 2006). Model fit was verified with the comparative fit index (CFI), Tucker Lewis index (TLI), and root mean square error of approximation (RMSEA). The CFI and TLI compare the hypothesized model to a null model; values >0.95 indicate fit. The RMSEA compares actual to expected model parameters; values <0.06 indicate fit. Because the RMSEA is influenced by sample size (Kenny, Kaniskan, & McCoach, 2014), each item’s standardized factor loading and R-square value were also examined to indicate the level of association between the item and the underlying construct; factor loadings >0.40 and item R-square values >0.60 indicate adequate association (Hu & Bentler, 1999; Kline, 2005; Marsh, Balla, & McDonald, 1988). The fit indices for each model were compared to determine the one that best fit the data.

Rasch Analysis

The Rasch rating scale model (Winsteps v.3.75) (Bond & Fox, 2007; Wright & Stone, 1979) was applied to examine ARAT measurement properties (Linacre, 2012).
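For readers unfamiliar with the model, the following is a minimal Python sketch of category probabilities under the Andrich rating scale model. It is an illustration only, not the Winsteps implementation. The function names, the person and item measures, and the threshold values used in the example are hypothetical assumptions chosen for demonstration.

```python
import math
from typing import List, Sequence


def category_probabilities(theta: float, delta: float,
                           thresholds: Sequence[float]) -> List[float]:
    """Probability of each rating category under the rating scale model.

    theta      : person ability in logits
    delta      : item difficulty in logits
    thresholds : Rasch-Andrich thresholds tau_1..tau_m (one per step)
    Returns a list of m+1 probabilities for categories 0..m.
    """
    # Category k has log-numerator sum_{j<=k} (theta - delta - tau_j);
    # category 0 has an empty sum, which is 0.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(theta: float, delta: float,
                    thresholds: Sequence[float]) -> float:
    """Model-expected rating (probability-weighted mean category)."""
    probs = category_probabilities(theta, delta, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Dichotomous special case: ability equal to difficulty gives 50% success.
    print(category_probabilities(0.5, 0.5, [0.0]))
    # Hypothetical 4-category item (ratings 0-3) with assumed thresholds.
    print(category_probabilities(1.0, 0.0, [-1.0, 0.0, 1.0]))
```

With a single threshold at zero the model reduces to the dichotomous Rasch model, which is where the 50% probability of success for a person whose ability equals the item difficulty comes from.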

Rating Scale

The rating scale was examined to assure that lower ratings were consistently given to lower functioning individuals and vice versa. Following Linacre’s guidelines (Linacre, 2002), adequate rating scale diagnostics were defined as: >10 observations per rating category, average measures that advance monotonically with each category, category thresholds that increase with each category, and an outfit mean square residual value <2.0 for each category.

Item Fit

Item infit statistics, reported as mean square residuals (MnSq) with associated standardized z-values, indicated how well the data fit the Rasch model (Wright & Stone, 1979), specifically with regard to items that closely match subjects’ ability (Wright, Linacre, Gustafson, & Martin-Lof, 1994). For clinical observation data, an item with a MnSq value <0.5 or >1.7 together with a standardized z-value >2.0 was considered misfitting (Wright et al., 1994).
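The misfit rule can be expressed as a short check. The Python sketch below is a literal reading of the criterion stated above (MnSq outside 0.5 to 1.7 and a standardized z-value above 2.0); the function name `is_misfit` is an assumption for illustration, and the final example value is hypothetical. The first three example values are taken from Table 4.

```python
def is_misfit(infit_mnsq: float, infit_zstd: float,
              low: float = 0.5, high: float = 1.7,
              z_cut: float = 2.0) -> bool:
    """Flag an item as misfitting under the criterion used in this paper.

    An item is flagged only when BOTH conditions hold:
      1. the infit mean square is below `low` or above `high`, and
      2. the standardized z-value exceeds `z_cut`.
    """
    outside_range = infit_mnsq < low or infit_mnsq > high
    return outside_range and infit_zstd > z_cut


if __name__ == "__main__":
    # (item, infit MnSq, infit Zstd); first three rows are from Table 4.
    examples = [
        ("Ball bearing ring & thumb", 1.67, 3.7),
        ("Hand behind head", 1.53, 3.2),
        ("Block 5 cm", 0.55, -3.8),
        ("Hypothetical item", 1.85, 2.4),  # invented value for contrast
    ]
    for name, mnsq, zstd in examples:
        print(name, is_misfit(mnsq, zstd))
```

Under this reading, the ball bearing ring & thumb item (MnSq=1.67) falls just inside the upper bound, which is consistent with describing it as approaching, but not reaching, misfit.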

Point Measure Correlations

Point measure correlations indicated the degree to which each item represented the underlying construct: <0.30=weak, 0.31-0.59=moderate, >0.60=strong (Andresen, 2000).

Person Reliability and Separation

Person reliability, analogous to Cronbach’s alpha, and person separation indicated how well the items differentiated ability levels. The person separation index was used to calculate the number of statistically distinct strata (i.e., ability levels) into which the assessment divided subjects using the equation: (4*Separation index +1)/3 (Fisher, 1992; Wright & Masters, 1982). We considered reliability values >0.90, separation index values >2, and strata ≥3 as adequate for this clinical application (Wright & Masters, 1982).
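As a worked example of the strata equation, the Python sketch below applies it to the person separation index reported in this study (7.07). The function name `strata` is an assumption for illustration, and truncating the result to a whole number of strata is also an assumption about how the reported count was obtained.

```python
import math


def strata(separation_index: float) -> float:
    """Number of statistically distinct strata: (4 * separation + 1) / 3."""
    return (4.0 * separation_index + 1.0) / 3.0


if __name__ == "__main__":
    value = strata(7.07)       # (4 * 7.07 + 1) / 3 = 29.28 / 3
    print(round(value, 2))     # 9.76
    print(math.floor(value))   # 9 whole strata (truncation is an assumption)
```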

Item Difficulty Hierarchy

Rasch analysis placed item-difficulty and person-ability measures on a single metric (logits, which are equal-interval units) so that items were ordered from easy to difficult and the sample was ordered from less to more ability. We examined the item-person map to ascertain the congruency between the Rasch-derived and originally proposed (Lyle, 1981) item difficulty hierarchies.

Keyform and its Use for Goal Setting and Treatment Planning

An ARAT keyform was generated from the “general keyforms” output option (Winsteps v.3.75) (Linacre, 1997). To illustrate the process of using the keyform to interpret the ARAT score, we randomly selected a participant with high ability (ARAT score 39-57), displayed his/her raw data on the keyform, identified a transition zone, described appropriately challenging therapy goals, and provided examples of how the goals could link to treatment.



The pooled n=122 dataset yielded a diverse sample with a wide range of ability (Table 1). Participants averaged 57 years of age (SD=13.7); most had an ischemic stroke, and UE impairment was moderate on average (ARAT total score, mean=27.3, SD=16.6).

Table 1

Demographic and descriptive data

| | Sample | Study 1 | Study 2 | Study 3 | Study 4 | Study 5 |
|---|---|---|---|---|---|---|
| Age, years, M (SD) | 57.2 (13.7) | 54.4 (10.4) | 66.7 (11.7) | 59.6 (12.0) | 46.7 (14.0) | 59.2 (12.9) |
| Female, n (%) | 63 (51.6) | 13 (54.2) | 10 (50.0) | 10 (50.0) | 15 (65.2) | 15 (42.9) |
| Caucasian, n (%) | 89 (73.0) | 12 (50.0) | 14 (70.0) | 16 (80.0) | 21 (91.3) | 26 (74.3) |
| Stroke Etiology | | | | | | |
| Ischemic, n (%) | 75 (61.5) | 14 (58.3) | 17 (85.0) | 8 (40.0) | 16 (69.5) | 20 (57.1) |
| Stroke Hemisphere | | | | | | |
| Right, n (%) | 50 (41.0) | 7 (29.2) | 11 (55.0) | 8 (40.0) | 14 (69.9) | 10 (28.6) |
| Right, n (%) | 111 (91.0) | 21 (87.5) | 17 (85.0) | 19 (95.0) | 22 (95.7) | 32 (91.4) |
| Dominant side affected, n (%) | 60 (49.2) | 14 (66.7) | 7 (53.8) | 9 (45.0) | 10 (43.5) | 20 (57.1) |
| ARAT at Baseline | | | | | | |
| M (SD) | 27.3 (16.6) | 25.9 (19.7) | 31.2 (15.7) | 29.7 (15.9) | 23.0 (14.8) | 27.6 (16.3) |
| Low*, n (%) | 45 (36.9) | 10 (41.7) | 5 (25.0) | 5 (25.0) | 12 (52.2) | 13 (37.2) |
| Moderate, n (%) | 44 (36.1) | 8 (33.3) | 8 (40.0) | 8 (40.0) | 7 (30.4) | 13 (37.2) |
| High, n (%) | 33 (27.0) | 6 (25.0) | 7 (35.0) | 7 (35.0) | 4 (17.4) | 9 (25.7) |

Note. ARAT=Action Research Arm Test; ARAT score range, 0-57; *Low ability=scores 0-19; Moderate ability=scores 20-38; High ability=scores 39-57. Ability level cut-offs were defined by the authors for descriptive purposes.

Confirmatory Factor Analysis

The results for the 1-factor, 2-factor, 3-factor, and 4-factor analyses are presented in Table 2. All models had similar fit indices; thus, we selected the most parsimonious 1-factor model as best fitting the data (Brown, 2006). The 1-factor CFI and TLI both met criterion (CFI=0.99, TLI=0.99). The RMSEA did not meet criterion (RMSEA=0.15); however, all standardized estimates and R-square values exceeded criterion (estimates all >0.4, range 0.82-0.98; R-square all >0.6, range 0.65-0.96), thus supporting essential unidimensionality (Kline, 2005).

Table 2

Confirmatory Factor Analyses

CFI | TLI | RMSEA | Chi-Square Model fit | Item Standardized Factor Loadings (range) | R-square values (range)

Note. CFI=Comparative fit index; TLI=Tucker Lewis index; RMSEA=Root mean square error of approximation.

Rating Scale

For each rating category, there were >10 observations, average measures increased monotonically, the category thresholds increased with each category, and the outfit MnSq values were <2.0 (Table 3).

Table 3

Rating scale diagnostics based on Linacre’s criteria

| Category | Observed Count (n=1935) | Percentage of Counts | Average Logit Measure | Outfit | Rasch-Andrich Threshold |
|---|---|---|---|---|---|
| 0 (cannot perform) | 647 | 28 | −3.54 | 1.26 | None |
| 1 (partially performed) | 363 | 16 | −1.58 | 0.64 | −1.97 |
| 2 (performed abnormally) | 860 | 37 | 0.87 | 1.02 | −1.32 |
| 3 (performs normally) | 427 | 19 | 4.36 | 1.09 | 3.29 |
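The four rating scale criteria can be checked mechanically. The Python sketch below applies them to the values in Table 3; the function name `check_rating_scale` and the dictionary layout are assumptions for illustration and do not reflect how Winsteps reports these diagnostics.

```python
from typing import Dict, List, Optional

# Values transcribed from Table 3 (category 0 has no threshold).
TABLE_3: List[Dict[str, Optional[float]]] = [
    {"category": 0, "count": 647, "avg_measure": -3.54, "outfit": 1.26, "threshold": None},
    {"category": 1, "count": 363, "avg_measure": -1.58, "outfit": 0.64, "threshold": -1.97},
    {"category": 2, "count": 860, "avg_measure": 0.87, "outfit": 1.02, "threshold": -1.32},
    {"category": 3, "count": 427, "avg_measure": 4.36, "outfit": 1.09, "threshold": 3.29},
]


def strictly_increasing(values: List[float]) -> bool:
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def check_rating_scale(rows: List[Dict[str, Optional[float]]]) -> Dict[str, bool]:
    """Evaluate the four rating scale criteria described in the Methods."""
    thresholds = [r["threshold"] for r in rows if r["threshold"] is not None]
    return {
        "more_than_10_observations": all(r["count"] > 10 for r in rows),
        "average_measures_monotonic": strictly_increasing(
            [r["avg_measure"] for r in rows]),
        "thresholds_ordered": strictly_increasing(thresholds),
        "outfit_below_2": all(r["outfit"] < 2.0 for r in rows),
    }


if __name__ == "__main__":
    for criterion, met in check_rating_scale(TABLE_3).items():
        print(criterion, met)
```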

Item Fit

All items exhibited adequate infit statistics (Table 4).

Table 4

Item fit statistics and item difficulty hierarchy by subtest

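| Item | Standardized Factor Loadings | Infit MnSq | Infit Zstd | Point measure correlation | Originally proposed hierarchy for subtest | Item Difficulty Measure | SE of the difficulty measure |
|---|---|---|---|---|---|---|---|
| Grasp Items | | | | | | | |
| Block 2.5 cm | 0.94 | 0.68 | −2.6 | 0.83 | Easiest | −1.32 | 0.17 |
| Block 5 cm | 0.98 | 0.55 | −3.8 | 0.87 | | −0.89 | 0.17 |
| Cricket ball | 0.95 | 0.58 | −3.4 | 0.88 | | −0.44 | 0.17 |
| Block 7.5 cm | 0.97 | 0.72 | −2.1 | 0.89 | | −0.20 | 0.17 |
| Block 10 cm | 0.95 | 0.77 | −1.6 | 0.88 | Most difficult | 0.51 | 0.17 |
| Grip Items | | | | | | | |
| Tube 2.25 cm | 0.93 | 0.81 | −1.4 | 0.96 | Easiest | −1.39 | 0.17 |
| Tube 1 cm | 0.94 | 0.74 | −1.9 | 0.84 | | −1.16 | 0.17 |
| Pour water | 0.89 | 0.84 | −1.1 | 0.86 | Most difficult | 0.32 | 0.17 |
| Pinch Items | | | | | | | |
| Marble index & thumb | 0.94 | 1.09 | 0.6 | 0.88 | Easiest | 0.38 | 0.17 |
| Marble middle & thumb | 0.95 | 0.92 | −0.5 | 0.89 | | 1.05 | 0.18 |
| Marble ring & thumb | 0.93 | 1.00 | 0.1 | 0.84 | | 1.36 | 0.18 |
| Ball bearing index & thumb | 0.93 | 1.19 | 1.2 | 0.94 | | 1.64 | 0.18 |
| Ball bearing middle & thumb | 0.96 | 1.26 | 1.6 | 0.82 | | 2.10 | 0.18 |
| Ball bearing ring & thumb | 0.92 | 1.67 | 3.7 | 0.87 | Most difficult | 2.64 | 0.19 |
| Gross Movement Items | | | | | | | |
| Hand to mouth | 0.74 | 1.15 | 1.0 | 0.97 | | −2.78 | 0.19 |
| Hand to top of head | 0.88 | 1.35 | 2.3 | 0.72 | | −1.20 | 0.17 |
| Hand behind head | 0.82 | 1.53 | 3.2 | 0.72 | Most difficult | −0.5 | 0.17 |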


Point Measure Correlations

All point measure correlations were >0.60 (Table 4).

Person Reliability and Separation

Person reliability =0.98, person separation =7.07, and the subjects were divided into 9 statistically distinct levels.

Item Difficulty Hierarchy

Item difficulty measures (Table 4) ranged from −2.78 logits (hand to mouth) to 2.64 logits (ball bearing ring & thumb). Within each subtest, item difficulties were congruent with the original item difficulty order. For example, in the grasp subtest, the 2.5 cm block was the easiest and the 10 cm block was the most difficult item. Some items had similar item difficulty measures. For example, the washer and pouring water items from the grip subtest and the marble index and thumb item from the pinch subtest had similar difficulty levels (0.24, 0.32, and 0.38 logits, respectively). The item-person map (Figure 1) illustrates that the items were well matched to the range of the sample’s ability measures (i.e., the mean of item difficulties was 0.0 and the mean of person abilities was 0.13).

Figure 1

ARAT 19 Person-Item Map. Person abilities (X) (left side of figure) and item difficulties (right side of figure) are displayed linearly on a logit (equal unit interval) scale. Persons of high ability and difficult items are located at the top of the figure. Persons of low ability and easy items are located at the bottom of the figure. A wide distribution (−6 to 7 logits) of person abilities and item difficulties is present. Item difficulties are mean item difficulties rather than the spread of the rating scale thresholds. The mean person ability and mean item difficulty are similar, indicating that item difficulties are well matched to person abilities. M, mean; S, 1 SD; T, 2 SDs.


Figure 2 presents an ARAT keyform for an individual with high ability (ARAT score=42). On the figure’s right side, items are arranged in descending difficulty order from hard (top) to easy (bottom). On the figure’s left side, the 4-point rating scale is plotted relative to the measurement metric at the figure’s base. The ratings stair-step upwards from left to right as item difficulties increase. The participant’s actual item ratings are circled. Consistent with Rasch model expectations, this person performed easy items well (consistent ratings=3 on easy items) but had difficulty with harder items (consistent ratings=2 on the majority of hard items). In the middle, the ratings deviate back and forth between adjacent ratings, e.g., 3 to/from 2. This region, the transition zone, indicates behaviors (tested by specific items) that the participant has some, but not full, ability to perform. Conceptually, this zone represents the point at which the patient is transitioning from one level of ability to the next higher level of ability; mathematically, the patient has a 50% probability of receiving either of the adjacent item ratings. Behaviors tested by items in the transition zone are expected to recover sooner than behaviors tested by items above the transition zone. Therefore, transition zone items (e.g., cricket ball, pour water) suggest grasp patterns (e.g., spherical grasp, cylindrical grasp) that can be addressed in shorter-term therapy goals. Items above the transition zone (e.g., ball bearing) suggest prehension patterns (e.g., palmar prehension) that can be addressed in longer-term goals. A second keyform is also provided as an example of an individual with moderate ability (Figure 2).

Figure 2

Keyform for subject with high ability and subject with moderate ability.

Keyform for Treatment Planning

The goals suggested by the keyform transition zone can be linked to treatment activities. For example, the keyform (Figure 2) indicated that the therapist should address the prehension patterns associated with the “cricket ball” (spherical grasp), “pouring water” (cylindrical grasp), and “marble index and thumb” (pincher grasp) items for short-term goals. Table 5 links this information to possible therapy activities. For example, a therapist may set a functional goal (e.g., self-feeding) that requires use of a pincher grasp and then select and grade repetitive task-specific practice activities that promote practice and mastery of this prehension pattern.

Table 5

Linking ARAT scores to treatment

| ARAT Item | Prehension or grasp pattern | Example of Goal | Possible Treatment Activities |
|---|---|---|---|
| Cricket ball | Spherical grasp | In two weeks, client will be able to use paretic UE to pick up and toss ball 90% of trials to participate in leisure activities. | Repetitive task-specific practice using paretic UE to pick up objects of various sizes and weights (e.g., ball, orange); use paretic UE to stabilize container; practice using computer mouse; open lever handle doors |
| Pouring water | Cylindrical grasp | In three weeks, client will be able to prepare simple breakfast (cereal, juice) using paretic UE to pour cereal, milk, and juice without spilling contents. | Repetitive task-specific practice using paretic UE to pick up objects of various sizes, weights, and contents (e.g., can, jar, cup, water, sugar) and pour contents into container; water flowers; use hammer |
| Marble, index and thumb | Pincher grasp | In three weeks, the client will be able to use paretic UE to self-feed small finger foods 80% of trials. | Repetitive task-specific practice using paretic UE to pick up/manipulate objects of various shapes and sizes (e.g., fasten buttons, place coins in pouch, play leisure game with small game pieces) |



This study aimed to generate an ARAT keyform and demonstrate its use for setting goals and planning treatment. First, we explored the ARAT’s item-level measurement properties to assure that it accurately measured UE function. While others have conducted factor analyses, item response and Rasch analyses of the ARAT, the unique contribution of this paper is the ARAT keyform. The ARAT keyform enables clinicians to link patient evaluation to the design of an individualized treatment plan.


We found that the full 19-item ARAT is unidimensional, a result consistent with the literature (Chen et al., 2012; Koh et al., 2006; van der Lee et al., 2002). Lyle (1981) suggested the assessment be scored as 4 sub-tests. Our analysis supports current practice, which is to sum all items and report an aggregate score.

Rating scale

The 4-point rating scale was adequate in this sample, a finding inconsistent with Chen et al. (2012), who suggested combining categories 0 and 1, and Koh et al. (2006), who combined categories 1 and 2. Differences in sample characteristics may explain the discrepancies. For example, Chen’s (2012) sample had higher and less varied ARAT scores (mean=35.78, SD=16.7), suggesting they may not have had enough lower-ability participants to adequately evaluate lower rating categories. Similarly, Koh’s (2006) sample had lower ARAT scores (median=5.0, IQR=0-40) and thus may not have had enough higher-ability participants to evaluate higher ratings.

Item Fit

We showed that all items fit the Rasch model. In contrast, Chen et al. (2012) found 2 items (hand behind head, hand to top of head), Koh et al. (2006) found 3 items (ball bearing ring/thumb, marble ring/thumb, 10 cm block) and van der Lee et al. (2002) found 4 items (ball bearing ring/thumb, ball bearing middle/thumb, ball bearing index/thumb, marble ring/thumb) that failed to fit the model tested. In our study the ball bearing ring to thumb item approached misfit which, when taken together with the literature, may indicate “noise” in this item. One interpretation is that factors other than the construct being measured influence the response to a misfitting item. Perhaps biomechanical or anatomical factors influence a person’s ability to perform this prehension pattern more so than prehension patterns tested by other items.

Person Reliability and Separation

Similar to the body of literature (Chen et al., 2012), we found the ARAT to be reliable and precise. The person separation value (7.07) indicated the ARAT is able to differentiate the sample’s ability into 9 strata. In contrast, Chen et al. (2012) reported a lower value (3.83) and differentiation into 4 strata. The difference in results is likely because our sample had a wider range of ARAT scores, thus more ability strata. However, the clinical relevance of being able to detect 9 versus 4 ability strata remains unknown and could be explored in future studies.

Item Difficulty Hierarchy

The Rasch-derived item difficulty order was consistent with Lyle’s originally hypothesized order, a finding consistent with van der Lee et al. (2002). However, our results differ from Chen et al. (2012) and Koh et al. (2006), who found that pinch (Chen) and grip (Koh) subtest items were not consistent with Lyle’s order. Interestingly, our results indicate that several items had similar difficulty measures. van der Lee et al. (2002) removed 4 items having similar difficulty measures to improve the assessment’s clinical efficiency. However, from a clinical perspective, similarly difficult items may provide important information. For example, two grip items and one pinch item had similar difficulty measures. Item removal should therefore be approached with caution because a therapist may require information about both gross motor and pinch skills in order to comprehensively understand the person’s ability.


Koh et al. (2006) discussed the importance of establishing the clinical usefulness of the ARAT score. We approached the issue of clinical interpretability by linking the ARAT’s qualitative content (the behaviors tested by each item) to its quantitative aspects (item ratings) via a new scoring method, the keyform. Our findings confirm the ARAT has strong measurement properties, which made it suitable for a keyform. Previous studies presented keyforms for other rehabilitation assessments and demonstrated their application to clinical practice (Bode et al., 2014; Pretz et al., 2015; Velozo et al., 2013; Velozo & Woodbury, 2011). The ARAT keyform now joins this growing family of keyforms available to therapists.

We illustrated how a clinician can use the ARAT keyform to design treatment. As shown in the examples, a therapist could use a keyform to plan treatment by first administering the ARAT and circling the patient’s item ratings on the keyform. Next, the therapist would locate the transition zone by following the consistent pattern of ratings at the bottom of the keyform upwards until it deviates to the next lower adjacent rating (e.g. from a rating of 3 to 2 or from a rating of 2 to 1, or from a rating of 1 to 0). This deviation will mark the lower boundary of the transition zone which is defined as the first 6 consecutive items for which at least 4 of these items received the next lowest rating. These 6 items represent the expected next steps in the subject’s transition from a current skill level to a greater skill level and will therefore be the movements to target in the task practice therapy session. For example, the participant (higher ability) represented in Figure 2 received high ratings on easy items, lower ratings on more difficult items, and had fluctuating ability on moderately difficult items evidenced as a region of back-and-forth ratings in the middle of the keyform. This region is the transition zone because it represents the client’s transition from one ability level to the next higher ability level. ARAT items within this zone are neither too easy nor too difficult and indicate different grasp/prehension patterns at the “just right” challenge level. The therapist can then use this information to plan treatment because the transition zone displays which items a client is more likely to improve on in the short term thus forming the basis for therapy goals and intervention approaches. The goals and activities presented are examples of ways a clinician could use this information to personalize goals and treatment activities that are meaningful to each client. 
The ARAT keyform offers a unique and novel tool that clinicians can use to facilitate treatment planning which may enhance patient outcomes.
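The transition zone rule described above can be sketched in code. The Python example below is one possible reading of that rule, not a validated clinical algorithm: it takes the rating of the easiest item as the consistent baseline, looks for the first deviation to the next lower rating, and accepts the first window of 6 consecutive items in which at least 4 received that lower rating. The function name `find_transition_zone` and the example ratings are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_transition_zone(ratings: Sequence[int], window: int = 6,
                         min_lower: int = 4) -> Optional[Tuple[int, int]]:
    """Locate a transition zone in ratings ordered from easiest to hardest.

    Returns (start, end) indices with `end` exclusive, or None when no
    window satisfies the rule. Assumption: the rating of the easiest item
    represents the consistent baseline pattern at the bottom of the keyform.
    """
    if not ratings:
        return None
    baseline = ratings[0]
    lower = baseline - 1          # the next lower adjacent rating
    if lower < 0:
        return None               # baseline is already the lowest rating
    for start, rating in enumerate(ratings):
        if rating != lower:
            continue              # not a deviation to the adjacent rating
        zone = ratings[start:start + window]
        if len(zone) < window:
            break                 # not enough harder items left
        if sum(1 for r in zone if r == lower) >= min_lower:
            return start, start + window
    return None


if __name__ == "__main__":
    # Hypothetical ratings for 15 items, easiest first (not study data).
    example = [3, 3, 3, 3, 2, 3, 2, 2, 3, 2, 2, 2, 2, 1, 2]
    print(find_transition_zone(example))
```

In this hypothetical example the zone begins at the first rating of 2 and spans the next six items, which would be the candidate behaviors for short-term goals; items beyond the zone would map to longer-term goals.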

Study Limitations

A limitation of this study was the relatively small sample size and corresponding lack of statistical power available for the CFA. In addition, keyforms are less useful for individuals of very high or very low ability because there is greater measurement error at these extremes due to floor/ceiling effects (Velozo & Woodbury, 2011). Clinicians may find that using a keyform with these individuals is less effective; however, these individuals are not measured well by the ARAT regardless of whether there is a keyform. The keyform presented highlights potential short-term and long-term goals. However, the demarcation is inexact and provides only a general guideline to illustrate how goals can be established using the keyform.

Future Studies

While we believe that the ARAT keyform is a useful tool for clinicians, future studies should examine the feasibility of using the keyform in clinical practice to determine whether the keyform has an impact on clinicians’ use of the ARAT. Future studies should also examine whether using the ARAT keyform enhances UE outcomes.

Implications for Occupational Therapy Practice

Our findings indicate that the ARAT has strong measurement properties and provides a logical basis to generate a keyform. The keyform may facilitate occupational therapists’ use of the ARAT in clinical practice due to its applicability in goal setting and treatment planning.


Supported by: National Institutes of Health (UL1RR024153, UL1TR000005, R21 AT002110-01, R01 AT004454-05); University of Pittsburgh, School of Health and Rehabilitation Science Research Development Fund; University of Pittsburgh’s Office of Research Central Research Development Fund; Ralph H. Johnson VA Medical Center and the Office of Research Development, Rehabilitation Research and Development, Department of Veterans Affairs; VA Merit Review Award (NO799-R); Allergan, Inc.


Conflict of Interest: The authors declare that there is no conflict of interest (Grattan, Velozo, Skidmore, Page & Woodbury).

Research Ethics and Patient Consent: All research procedures were approved by the Medical University of South Carolina Institutional Review Board (PRO00013941); University of Pittsburgh Institutional Review Board (PRO07070003; PRO07110071; PRO12040650; PRO11110467); The Ohio State University Institutional Review Board (PRO2011H0216).

Contributor Information

Emily S. Grattan, Department of Health Sciences & Research, Medical University of South Carolina. Division of Occupational Therapy, Medical University of South Carolina.

Craig A. Velozo, Division of Occupational Therapy, Medical University of South Carolina.

Elizabeth R. Skidmore, Department of Occupational Therapy, University of Pittsburgh.

Stephen J. Page, Division of Occupational Therapy, The Ohio State University.

Michelle L. Woodbury, Department of Health Sciences & Research, Medical University of South Carolina, Division of Occupational Therapy, Medical University of South Carolina, Ralph H. Johnson VA Medical Center, Charleston, SC.


  • Abrams D, Davidson M, Harrick J, Harcourt P, Zylinski M, Clancy J. Monitoring the change: Current trends in outcome measure usage in physiotherapy. Manual Therapy. 2005;11(1):46–53.
  • American Occupational Therapy Association. Occupational Therapy Practice Framework: Domain and Process (3rd edition). American Journal of Occupational Therapy. 2014;68:S1–S48.
  • Andresen EM. Criteria for assessing the tools of disability outcomes research. Archives of Physical Medicine and Rehabilitation. 2000;81(Supplement 2):S15–S20.
  • Avery LM, Russell DJ, Raina PS, Walter SD, Rosenbaum PL. Rasch analysis of the Gross Motor Function Measure: validating the assumptions of the Rasch model to create an interval-level measure. Archives of Physical Medicine & Rehabilitation. 2003;84(5):697–705.
  • Barker RN, Brauer SG. Upper limb recovery after stroke: The stroke survivors’ perspective. Disability and Rehabilitation. 2005;27(20):1213–1223. doi: 10.1080/09638280500075717.
  • Bode RK, Heinemann AW, Kozlowski AJ, Pretz CR. Self-scoring templates for motor and cognitive subscales of the FIM instrument for persons with spinal cord injury. Arch Phys Med Rehabil. 2014;95(4):676–679.e675. doi: 10.1016/j.apmr.2013.11.009.
  • Bohannon RW, Andrews AW, Smith MB. Rehabilitation goals of patients with hemiplegia. International Journal of Rehabilitation Research. 1988;11(2):181–184.
  • Bond TG, Fox CM. Applying the Rasch model: Fundamental measurement in the human sciences. 2nd ed. Routledge; 2007.
  • Brown TA. Confirmatory factor analysis for applied research. New York: Guilford Press; 2006.
  • Chen HF, Lin KC, Wu CY, Chen CL. Rasch validation and predictive validity of the action research arm test in patients receiving stroke rehabilitation. Arch Phys Med Rehabil. 2012;93(6):1039–1045. doi: 10.1016/j.apmr.2011.11.033.
  • Fisher WP. Reliability statistics. Rasch Measurement Transactions. 1992;6(3):238.
  • Guadagnoli MA, Lee TD. Challenge point: a framework for conceptualizing the effects of various practice conditions in motor learning. J Mot Behav. 2004;36(2):212–224. doi: 10.3200/jmbr.36.2.212-224.
  • Guide to Physical Therapy Practice 3.0. Alexandria, VA: American Physical Therapy Association; 2014.
  • Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 1999;6(1):1–55.
  • Jette DU, Halbert J, Iverson C, Miceli E, Shah P. Use of standardized outcome measures in physical therapist practice: perceptions and applications. Phys Ther. 2009;89(2):125–135.
  • Kay TM, Myers AM, Huijbregts MPJ. How far have we come since 1992? A comparative survey of physiotherapists’ use of outcome measures. Physiotherapy Canada. 2001;53:268–275.
  • Kenny DA, Kaniskan B, McCoach DB. The performance of RMSEA in models with small degrees of freedom. Sociological Methods & Research. 2014 0049124114543236.
  • Kielhofner G, Dobria L, Forsyth K, Basu S. The construction of keyforms for obtaining instantaneous measures from the occupational performance history interview rating scales. OTJR: Occupation, Participation and Health. 2005;25(1):23–32.
  • Kline RB. Principles and Practice of Structural Equation Modeling. New York, NY: Guilford; 2005.
  • Koh CL, Hsueh IP, Wang WC, Sheu CF, Yu TY, Wang CH, Hsieh CL. Validation of the action research arm test using item response theory in patients after stroke. J Rehabil Med. 2006;38(6):375–380. doi: 10.1080/16501970600803252.
  • Lang C, Wagner J, Dromerick A, Edwards D. Measurement of upper-extremity function early after stroke: Properties of the Action Research Arm Test. Archives of Physical Medicine and Rehabilitation. 2006;87:1605–1610.
  • Linacre J. Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation. 1997;11:315–324.
  • Linacre J. A user’s guide to Winsteps and Ministeps Rasch-model computer programs. Chicago, IL: WINSTEPS; 2012.
  • Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas. 2002;3(1):85–106.
  • Lyle RC. A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation Research. 1981;4(4):483–492.
  • Marsh HW, Balla JR, McDonald RP. Goodness-of-Fit Indexes in Confirmatory Factor Analysis: The Effect of Sample Size. Psychological Bulletin. 1988;103(3):391–410.
  • Menon-Nair A, Korner-Bitensky N, Wood-Dauphinee S, Robertson E. Assessment of unilateral spatial neglect post stroke in Canadian acute care hospitals: are we neglecting neglect? Clinical Rehabilitation. 2006;20(7):623–634.
  • Mozaffarian D, Benjamin EJ, Go AS, Arnett DK, Blaha MJ, Cushman M, Turner MB. Executive Summary: Heart Disease and Stroke Statistics—2016 Update: A Report From the American Heart Association. Circulation. 2016;133(4):447–454. doi: 10.1161/cir.0000000000000366.
  • Muthén B, Muthén L. Mplus (Version 6). Los Angeles, CA: Muthen & Muthen; 2010.
  • Pape TL, Mallinson T, Guernon A. Psychometric properties of the disorders of consciousness scale. Archives of Physical Medicine and Rehabilitation. 2014;95(9):1672–1684. doi: 10.1016/j.apmr.2014.04.015.
  • Platz T, Pinkowski C, van Wijck F, Kim I, di Bella P, G J. Reliability and validity of arm function assessment with standardized guidelines for the Fugl-Meyer Test, Action Research Arm Test and Box and Block Test: A multicentre study. Clin Rehabil. 2005;19:404–411.
  • Pretz CR, Kean J, Heinemann A, Kozlowski AJ, Bode R, Gebhardt E. A Multidimensional Rasch Analysis of the Functional Independence Measure based on the NIDILRR Traumatic Brain Injury Model Systems National Database. J Neurotrauma. 2015. doi: 10.1089/neu.2015.4138.
  • Swinkels RA, van Peppen RP, Wittink H, Custers JW, Beurskens AJ. Current use and barriers and facilitators for implementation of standardised measures in physical therapy in the Netherlands. BMC Musculoskelet Disord. 2011;12:106. doi: 10.1186/1471-2474-12-106.
  • Torenbeek M, Caulfield B, Garrett M, Van Harten W. Current use of outcome measures for stroke and low back pain rehabilitation in five European countries: first results of the ACROSS project. International Journal of Rehabilitation Research. 2001;24(2):95–101.
  • van der Lee JH, Roorda LD, Beckerman H, Lankhorst GJ, Bouter LM. Improving the Action Research Arm test: a unidimensional hierarchical scale. Clin Rehabil. 2002;16(6):646–653.
  • Velozo CA, Warren M, Hicks E, Berger KA. Generating clinical outputs for self-reports of visual functioning. Optom Vis Sci. 2013;90(8):765–775. doi: 10.1097/opx.0000000000000007.
  • Velozo CA, Woodbury ML. Translating measurement findings into rehabilitation practice: An example using the Fugl-Meyer Assessment of the Upper Extremity with clients following stroke. Journal of Rehabilitation Research and Development. 2011;48(10):1211–1222.
  • Wolf SL, Winstein CJ, Miller JP, Taub E, Uswatte G, Morris D, Investigators, E. Effect of constraint-induced movement therapy on upper extremity function 3 to 9 months after stroke: the EXCITE randomized clinical trial. JAMA. 2006;296(17):2095–2104. doi: 10.1001/jama.296.17.2095.
  • Woodbury ML, Anderson K, Finetto C, Fortune A, Dellenbach B, Grattan E, Hutchison S. Matching Task Difficulty to Patient Ability During Task Practice Improves Upper Extremity Motor Skill After Stroke: A Proof-of-Concept Study. Arch Phys Med Rehabil. 2016;97(11):1863–1871. doi: 10.1016/j.apmr.2016.03.022.
  • Woodbury ML, Velozo CA, Richards LG, Duncan PW. Rasch Analysis Staging Methodology to Classify Upper Extremity Movement Impairment After Stroke. Archives of Physical Medicine and Rehabilitation. 2013;94(8):1527–1533. doi: 10.1016/j.apmr.2013.03.007.
  • Wright BD, Linacre JM, Gustafson J, Martin-Lof P. Reasonable mean-square fit values. Rasch Measurement Transactions. 1994;8(3):370.
  • Wright BD, Masters GN. Rating Scale Analysis. Rasch Measurement. ERIC; 1982.
  • Wright BD, Stone MH. Best Test Design. Rasch Measurement. 1979.
  • Yozbatiran N, Der-Yeghiaian L, Cramer SC. A standardized approach to performing the action research arm test. Neurorehabil Neural Repair. 2008;22(1):78–90. doi: 10.1177/1545968307305353.
