BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TX_STAR
Obsidian | Level 7

Hello Members,

 

I get stuck with a simple SAS problem.

 

I have a 60 variables. The first 30 numerical variables (resp_1-resp_30, with 2 categories: 0, 1) indicating students' score on 30 items, with 0 being incorrect and 1 being correct. The next 30 are categorical variables (cat_1-cat_30, with 3 categories: 1,2,3) indicating the content domains each of the 30 item measuring. Each item may measure 1, or 2, or 3 content domains. 

 

I want to calculate domain total score for domain 1, 2, and 3. How should I code?

 

Thanks so much!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Aha!  So each resp variable is in a single domain throughout the data set.

 

So imagine you could have

data want;
   set have;
   cat1_total=sum(of resp_1);
   cat2_total=sum(of resp_2 resp_3);
   cat3_total=sum(of resp_30);
run;

 

 

So all you need is a way to use the data (from just one observation) to generate lists of variable names inside the SUM functions.  This does that in the first data step, which makes 3 macrovars holding those 3 lists. 

/* Make 3 macrovars: listing RESP vars for cat1, cat2, cat3 */
data _null_ ;
  set have (keep=cat:);
  array list {3} $240;

  array ct {*} cat: ;
  do i=1 to dim(ct);
    list{ct{i}} = catx(' ',list{ct{i}},cats('resp_',i));    
  end;
  call symput('resp_cat1',trim(list{1}));
  call symput('resp_cat2',trim(list{2}));
  call symput('resp_cat3',trim(list{3}));
  stop;
run;

/* Show the macrovars */
%put &=resp_cat1;
%put &=resp_cat2;
%put &=resp_cat3;


/* Use the macrovars */
data want;
  set have ;

  cat1_total=sum(of &resp_cat1);
  cat2_total=sum(of &resp_cat2);
  cat3_total=sum(of &resp_cat3);
run;

 

For large datasets, I suspect this could save some time ordinarily used by looping through the 30 variables.  And it would probably save more time for even longer lists of variables.  Also it's easy to generalize to more categories.  And would be easy to generalize to overlapping categories.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

13 REPLIES 13
Reeza
Super User

Transpose your data so that you have a long structure of variables instead:

Student Question Response Category

 

PROC TRANSPOSE will work but it's likely easier to use a data step.

 

data want;
set have;

array grs(*) grad1-grad30;
array cat(*) category1-category30;

do question = 1 to 30;
   Question_Category = cat(question);
   Question_Correct = grad(question);
   output;
end;

keep id question_:;
run;
ballardw
Super User

Provide a worked example with starting data and the final desired result (is the result a data set or report for people to read).

You likely only need to provide 4 or 5 each of the RESP and CAT variables to demonstrate the basic concept. Show a few different scenarios with some of the ones that you are having difficulty with.

 

I suspect that you may want to add 3 domain variables indicating which domain is concerned which could perhaps be another 1/0 variable for each record but we likely need to see whether your cat variables are character or numeric and exactly what they contain.

TX_STAR
Obsidian | Level 7

 

Sorry, my mistake: each item should measure only one of the three domains. The data set looks like below. Resp_1 to Resp_30 are the students' responses (0 or 1) on 30 items. cat_1 to cat30 are domain indicators indicating which domain each item is measuring (1 or 2, or 3).

 

I want create three datasets for each of the three domains with response variables and a total score for that domain. 

 

Thank you!

 

obs resp_1 resp_2 resp_2 …… resp_30 cat_1 cat_2 cat_3 …… cat_30
1 1 1 0   1 1 2 2   3
2 0 1 1   0 1 2 2   3
3 0 0 0   0 1 2 2   3
4 1 1 1   1 1 2 2   3
……………………………………………………….              
                     
ballardw
Super User

If I understand the problem perhaps:

data have;
   input obs resp_1-resp_3 resp_30 cat_1-cat_3 cat_30;
datalines;
1 1 1 0 1 1 2 2 3
2 0 1 1 0 1 2 2 3
3 0 0 0 0 1 2 2 3
4 1 1 1 1 1 2 2 3
;
run;

data want;
   set have;
   array r resp_: ;
   array c cat_:  ;
   array domain {3}; 
   do i= 1 to dim(c);
      domain[c[i]]= sum(domain[c[i]],r[i]);
   end;
run;
   
proc means data=want sum;
   var domain: ;
run;

can get you started. You haven't shown what a final result would look like for three datasets for each of the three domains with response variables and a total score for that domain. 

 

 

If the purpose is to make report that people will read then you likely do not separate data set but kind of need to know what the appearance would actually be.

TX_STAR
Obsidian | Level 7

The desired look for the three datasets is that each data site has student item responses resp_1, resp_3,....,resp_30 (only items measuring that domain) for that domain and the computed domain total. If not too cumbersome, keep the domain indicator of that domain (the values should be the same) as well, cat_1, cat_3,..., cat30. 

 

Thanks!

Reeza
Super User
Please show an example of your expected output.
TX_STAR
Obsidian | Level 7

each dataset looks like below. Cat1--cat28 is not necessary but just for checking purpose. Thanks. 

 

obs resp_1 resp_5 resp_10 …… resp_28 cat_1 cat_5 cat_10 …… cat_28 cat1_tot
1 1 1 1   0 1 1 1 1 1 9
2 1 1 1   0 1 1 1 1 1 9
3 1 1 0   0 1 1 1 1 1 8
4 1 1 1   1 1 1 1 1 1 10
Reeza
Super User
Rather than have several back and forth conversations regarding logic, can you please take the sample data from BallardW post and show the EXACT output expected, including the calculations. I only see cat1_total here for example, but I assume you would want cat3_total and are those across ID or just for a single row at a time?
TX_STAR
Obsidian | Level 7

I am sorry for the confusion. 

Hope this time is clear:

 

There are 34 items in total in this test. the 34 items measures 3 content domains.

5 items measures domain 1: item1, item5, item10, item12, item28;

15 items measures domain 2: item2, item3, item4,..., item26;

14 items measure domain 3: item11, item13,..., item34;

 

I want to create one dataset for each domain. Each dataset should have binary item responses, item domain indicators (values should be the same for all indicator for the same domain) for checking purpose, and total score (sum of item response).

 

For example, for domain 1, the dataset should look like below. resp_1 to resp_28 are the item responses. cat1 to cat28 are domain indicator. domain1_tot is the sum of resp_1, resp_2, resp_10, resp_12, resp28.

 

create the same datasets for domain2 and domain3.

 

obs resp_1 resp_5 resp_10 resp_12 resp_28 cat_1 cat_5 cat_10 cat_12 cat_28 domain1_tot
1 1 1 1 1 0 1 1 1 1 1 4
2 1 1 1 1 0 1 1 1 1 1 4
3 1 1 0 1 0 1 1 1 1 1 3
4 1 1 1 0 1 1 1 1 1 1 4

 

Thanks!

Reeza
Super User

That clarifies a lot. 

 

data domain1;
set have (keep =  obs resp_1 resp_5 resp_10 pres_12 resp_28
                cat_1 cat_5 cat_10 cat_12 cat_28);

domain_total = sum(of resp_:);

run;

Change the variable names for each of the data sets you want and repeat this three times. This could be a little bit more dynamic but this works fine. 

 


@TX_STAR wrote:

I am sorry for the confusion. 

Hope this time is clear:

 

There are 34 items in total in this test. the 34 items measures 3 content domains.

5 items measures domain 1: item1, item5, item10, item12, item28;

15 items measures domain 2: item2, item3, item4,..., item26;

14 items measure domain 3: item11, item13,..., item34;

 

I want to create one dataset for each domain. Each dataset should have binary item responses, item domain indicators (values should be the same for all indicator for the same domain) for checking purpose, and total score (sum of item response).

 

For example, for domain 1, the dataset should look like below. resp_1 to resp_28 are the item responses. cat1 to cat28 are domain indicator. domain1_tot is the sum of resp_1, resp_2, resp_10, resp_12, resp28.

 

create the same datasets for domain2 and domain3.

 

obs resp_1 resp_5 resp_10 resp_12 resp_28 cat_1 cat_5 cat_10 cat_12 cat_28 domain1_tot
1 1 1 1 1 0 1 1 1 1 1 4
2 1 1 1 1 0 1 1 1 1 1 4
3 1 1 0 1 0 1 1 1 1 1 3
4 1 1 1 0 1 1 1 1 1 1 4

 

Thanks!


 

 

 

 

mkeintz
PROC Star

Aha!  So each resp variable is in a single domain throughout the data set.

 

So imagine you could have

data want;
   set have;
   cat1_total=sum(of resp_1);
   cat2_total=sum(of resp_2 resp_3);
   cat3_total=sum(of resp_30);
run;

 

 

So all you need is a way to use the data (from just one observation) to generate lists of variable names inside the SUM functions.  This does that in the first data step, which makes 3 macrovars holding those 3 lists. 

/* Make 3 macrovars: listing RESP vars for cat1, cat2, cat3 */
data _null_ ;
  set have (keep=cat:);
  array list {3} $240;

  array ct {*} cat: ;
  do i=1 to dim(ct);
    list{ct{i}} = catx(' ',list{ct{i}},cats('resp_',i));    
  end;
  call symput('resp_cat1',trim(list{1}));
  call symput('resp_cat2',trim(list{2}));
  call symput('resp_cat3',trim(list{3}));
  stop;
run;

/* Show the macrovars */
%put &=resp_cat1;
%put &=resp_cat2;
%put &=resp_cat3;


/* Use the macrovars */
data want;
  set have ;

  cat1_total=sum(of &resp_cat1);
  cat2_total=sum(of &resp_cat2);
  cat3_total=sum(of &resp_cat3);
run;

 

For large datasets, I suspect this could save some time ordinarily used by looping through the 30 variables.  And it would probably save more time for even longer lists of variables.  Also it's easy to generalize to more categories.  And would be easy to generalize to overlapping categories.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TX_STAR
Obsidian | Level 7

With some modification, I was able to get the datasets I that need for different subjects with different number of subdomains (with different items) with the sample SAS code.

 

Thank you all for the help.

Reeza
Super User

@TX_STAR wrote:

Hello Members,

 

I get stuck with a simple SAS problem.

 

I have a 60 variables. The first 30 numerical variables (resp_1-resp_30, with 2 categories: 0, 1) indicating students' score on 30 items, with 0 being incorrect and 1 being correct. The next 30 are categorical variables (cat_1-cat_30, with 3 categories: 1,2,3) indicating the content domains each of the 30 item measuring. Each item may measure 1, or 2, or 3 content domains. 

 

I want to calculate domain total score for domain 1, 2, and 3. How should I code?

 

Thanks so much!

 

 


data have;
   input obs resp_1-resp_3  cat_1-cat_3;
datalines;
1 1 1 0  1 2 3 
2 0 1 1  1 2 2 
3 0 0 0  1 3 2 
4 1 1 1  1 2 3 
;
run;

data want;
set have;

array grs(*) resp_1-resp_3;
array ct(*) cat_1-cat_3;

do question = 1 to dim(grs);
   Question_Category = ct(question);
   Question_Correct = grs(question);
   output;
end;

keep id question_:;
run;

proc means data=want N Mean SUM maxdec=2;
class question_category;
var question_correct;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 4487 views
  • 0 likes
  • 4 in conversation