Obsidian | Level 7

## Scoring survey data efficiently

Hello! I'm writing code to analyze results from a multiple choice survey. My code is below. It works fine, but I'm pretty sure there is a better way of doing this than writing out hcp_score=hcp_score+1 a bunch of times. Does anyone have any suggestions? Some survey questions have multiple answer options with different point values, so I can't just sum the variables. Thank you!

```sas
hcp_score=0;
if hcp_screen=1 then hcp_score=hcp_score+1;
else if hcp_screen=2 then hcp_score=hcp_score+0.5;
if hcp_home=1 then hcp_score=hcp_score+1;
else if hcp_home=2 then hcp_score=hcp_score+0.5;
if hcp_list=1 then hcp_score=hcp_score+1;
if hcp_cohort=1 then hcp_score=hcp_score+1;
if hcp_restricted=1 then hcp_score=hcp_score+1;
if hcp_singlefac=1 then hcp_score=hcp_score+1;
if hcp_edu_covid=1 then hcp_score=hcp_score+0.25;
if hcp_edu_sick=1 then hcp_score=hcp_score+0.25;
if hcp_edu_ip=1 then hcp_score=hcp_score+0.25;
if hcp_edu_change in (1,9) then hcp_score=hcp_score+0.25;
```
Super User

## Re: Scoring survey data efficiently

It looks like those are weights applied to each variable when it equals 1 (or 2 in the case of hcp_screen/hcp_home, or 9 in the case of hcp_edu_change)?

Obsidian | Level 7

## Re: Scoring survey data efficiently

Hi! Sorry, the scoring wasn't really clear. Basically each question is worth one point. There are a few questions like hcp_screen and hcp_home that are always/sometimes/never questions, so "sometimes" is given 0.5 points. The last four questions are sub-questions within hcp_edu, which is worth one point total. The weird coding with hcp_edu_change is because there is an N/A option.
PROC Star

## Re: Scoring survey data efficiently

You could have an array of 10 variables, and a corresponding array of values to add (i.e. an array of 1's and 0.25, etc.), as in:

```sas
data want (drop=v);
  set have;
  array check1 {*} hcp_screen hcp_home hcp_list hcp_cohort hcp_restricted
                   hcp_singlefac hcp_edu_covid hcp_edu_sick hcp_edu_ip
                   hcp_edu_change;
  array value1 {10} _temporary_ (6*1, 4*0.25);

  hcp_score=0;
  do v=1 to dim(check1);
    if check1{v}=1 then hcp_score=hcp_score+value1{v};
  end;

  array check2 {*} hcp_screen hcp_home;
  do v=1 to dim(check2);
    if check2{v}=2 then hcp_score=hcp_score+0.5;
  end;

  if hcp_edu_change=9 then hcp_score=hcp_score+0.25;
run;
```

The above has 3 loops, one for each value to look for: 1, 2, and 9 (yes, the check for 9 is not actually a loop, but think of it as a loop over one variable).

And if you want to reduce it to a single loop, make a two-dimensional array, as in:

```sas
data want2 (drop=v);
  set have;
  array check {*} hcp_screen hcp_home hcp_list hcp_cohort hcp_restricted
                  hcp_singlefac hcp_edu_covid hcp_edu_sick hcp_edu_ip hcp_edu_change;

  array values {9,10} _temporary_
    (6*1, 4*0.25    /* row 1, results for variable=1 */
    ,2*0.5, 8*.     /* row 2, results for variable=2 */
    ,60*.           /* rows 3-8 */
    ,9*., 0.25);    /* row 9, results for variable=9 */

  hcp_score=0;
  do v=1 to dim(check);
    /* Only values 1-9 are valid row subscripts; sum() ignores the
       missing cells, whereas + would set hcp_score to missing. */
    if 1<=check{v}<=9 then hcp_score=sum(hcp_score, values{check{v},v});
  end;
run;
```

The latter technique works only if the values to be searched for are positive integers, because they are used directly as row subscripts of the array.
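A related detail when a lookup table like this contains missing cells: in SAS the `+` operator returns missing if either operand is missing, while the `sum()` function ignores missing arguments. A tiny sketch to see the difference (the variable names here are made up for illustration):

```sas
data _null_;
  a = 1;
  b = .;
  plus_result = a + b;      /* missing, because b is missing */
  sum_result  = sum(a, b);  /* 1, because sum() skips the missing value */
  put plus_result= sum_result=;
run;
```

So if the table is filled with `.` rather than 0, accumulating with `sum()` is probably the safer choice.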

Obsidian | Level 7

## Re: Scoring survey data efficiently

Thank you, this is very helpful!

Opal | Level 21

## Re: Scoring survey data efficiently

You could use the fact that SAS considers TRUE = 1 and FALSE = 0 to write a single scoring expression:

hcp_score =
(hcp_screen=1) * 1
+ (hcp_screen=2) * 0.5
+ ...
+ (hcp_edu_change in (1,9)) * 0.25;
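
Spelled out for the scoring rules in the original post, that might look like the sketch below. The dataset names `have` and `want` are assumptions borrowed from the other replies. A comparison against a missing value is simply false (0), so missing answers add nothing to the score.

```sas
data want;
  set have;
  /* Each comparison evaluates to 1 (true) or 0 (false), so it can be
     multiplied by its point value and the products added up. */
  hcp_score =
      (hcp_screen = 1) * 1 + (hcp_screen = 2) * 0.5
    + (hcp_home = 1) * 1   + (hcp_home = 2) * 0.5
    + (hcp_list = 1)
    + (hcp_cohort = 1)
    + (hcp_restricted = 1)
    + (hcp_singlefac = 1)
    + (hcp_edu_covid = 1) * 0.25
    + (hcp_edu_sick = 1) * 0.25
    + (hcp_edu_ip = 1) * 0.25
    + (hcp_edu_change in (1, 9)) * 0.25;
run;
```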

PG
Super User

## Re: Scoring survey data efficiently

"Efficiency" may depend on how many variables have similar values.

Similar calculations for multiple variables often indicate that an array approach might work. So the block of comparisons to single values could be done a couple of ways.

Here is one:

```sas
data trial;
  set have;
  hcp_score=0;
  if hcp_screen=1 then hcp_score=hcp_score+1;
  else if hcp_screen=2 then hcp_score=hcp_score+0.5;
  if hcp_home=1 then hcp_score=hcp_score+1;
  else if hcp_home=2 then hcp_score=hcp_score+0.5;
  if hcp_edu_change in (1,9) then hcp_score=hcp_score+0.25;
  /* below are the SINGLE value comparisons */
  array vars hcp_list hcp_cohort hcp_restricted hcp_singlefac
             hcp_edu_covid hcp_edu_sick hcp_edu_ip;
  /* this array holds the COMPARISON values for the variables IN ORDER */
  array vals {7} _temporary_ (1,1,1,1,1,1,1);
  /* this has the score additions */
  array sc   {7} _temporary_ (1,1,1,1, 0.25,0.25,0.25);
  do i=1 to dim(vars);
    if vars[i]=vals[i] then hcp_score=hcp_score + sc[i];
  end;
  drop i;
run;
```

The only change shown is for the single-value comparisons. The first array, vars, has the variables you need to compare; the second array, vals, contains the values that the variables are tested against for equality; and the third, sc, has the amount to add to the score. The variables, values and score additions must be listed in the same order.

You might see right away that if I have to add 10 more variables, I add them to the vars list, then add the values to compare and the scores to the other two arrays. The do loop, which runs over the number of elements in the vars array, takes care of all of the conditional additions to the score total.

Note that this is really simple for single values. You could use it for multiple values by placing the variable on the list twice, with the corresponding values for comparison and the corresponding score additions. That is just a tad harder to see right away.

If the test code above works as expected, then you could try:

```sas
data trial;
  set have;
  hcp_score=0;
  array vars hcp_list hcp_cohort hcp_restricted hcp_singlefac
             hcp_edu_covid hcp_edu_sick hcp_edu_ip
             hcp_screen hcp_screen
             hcp_home hcp_home
             hcp_edu_change hcp_edu_change;
  /* this array holds the COMPARISON values for the variables IN ORDER */
  array vals {13} _temporary_ (1,1,1,1,1,1,1,1,2,1,2,1,9);
  /* this has the score additions */
  array sc   {13} _temporary_ (1,1,1,1, 0.25,0.25,0.25,1,0.5,1,0.5, 0.25,0.25);
  do i=1 to dim(vars);
    if vars[i]=vals[i] then hcp_score=hcp_score + sc[i];
  end;
  drop i;
run;
```

Note that I just added the multi-value comparison variables to the end of the list, with the corresponding comparison and score values, and adjusted the size of the temporary arrays to match.

One of the drawbacks of this approach is that a mismatch between the number of variables and the number of values will likely cause the error "Array subscript out of range", along with one or more warnings about "partial array initialization" (not enough values) or "Too many values for initialization of the array".
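
One way to catch a size mismatch early is to compare the array dimensions before scoring. The following is only a sketch under my own assumptions: it uses a shortened variable list, and the message text is made up. It checks the declared sizes only, so a value list with too few entries inside the parentheses would still just produce the warning.

```sas
data trial;
  set have;
  array vars hcp_list hcp_cohort hcp_restricted hcp_singlefac;
  array vals {4} _temporary_ (1, 1, 1, 1);
  array sc   {4} _temporary_ (1, 1, 1, 1);

  /* On the first observation, stop the step if the arrays differ in size. */
  if _n_ = 1 and (dim(vars) ne dim(vals) or dim(vars) ne dim(sc)) then do;
    put 'ERROR: vars, vals and sc must have the same number of elements.';
    stop;
  end;

  hcp_score = 0;
  do i = 1 to dim(vars);
    if vars[i] = vals[i] then hcp_score = hcp_score + sc[i];
  end;
  drop i;
run;
```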
