BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mandan414
Fluorite | Level 6

Hello,

I have a dataset with columns as below:

AA_005  AA_010  AA_015  AA_020  AA_025  AA_030  AA_035

     1           1             2            3             5              2          1

     2           2             5           1               3            3           2

     5           4            1             1              2             3          1 

 

 1= excellent      2= very good      3= good      4=same   5= bad

 

I want to create a variable of sum of observations responded 1 or 2 or 3 or 4 but not 5 for any variables (AA_005 to AA_035).

I know how to do it with the if ... then statement. But I'm looking for an easier/shorter way.

 

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Suppose each observation represent one student.

without RETAIN you'll get non 5 per student (in the sample that is 4, 6, 4 ) while

with RETAIN the result will be: 4, 10 (=6+4), 14 (=4+10).

 

without RETAIN the SUM variables is assigned to missing each time a new observation is read,

RETAIN means to keep the value and not been reset on reading new observation but it can be

changed (including reset) programmatically in the code. 

View solution in original post

20 REPLIES 20
Reeza
Super User

Arrays? But that includes IF/THEN as well....it helps if you show what you've done, so not sure if that's an improvement over your current approach or not. 

 

Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

Shmuel
Garnet | Level 18

Is this what you are looking for:

data have;
  input AA_005  AA_010  AA_015  AA_020  AA_025  AA_030  AA_035;
datalines;
  1  1  2  3  5  2  1
  2  2  5  1  3  3  2
  5  4  1  1  2  3  1
; run;

data want;
 set have;
     retain sum1-sum5 0;
     array si {7} AA_:;
     array sm {5} sum1-sum5;
do i=1 to dim(sm); sm(i)=0; end; do i=1 to dim(si); sm(si(i)) = sm(si(i)) +1; end; keep sum1-sum5; label sum1=excellent sum2=very good sum3=good sum4=same sum5=bad ; run; proc print data=want label; run;

result list:

image.png

andreas_lds
Jade | Level 19

Because hardly anybody here is working for the NSA you have to post the code that you want to be optimized. And it is also not clear, what you expect as result, so you may want to explain what you are looking for.

And, as always, i would start by fixing the data-format you are using: long beats wide in almost all cases i have seen so far.

data have;

   input AA_005 AA_010 AA_015 AA_020 AA_025 AA_030 AA_035;
   datalines;
1 1 2 3 5 2 1
2 2 5 1 3 3 2
5 4 1 1 2 3 1 
;


data haveWithId;
   set have;
   
   id = _n_;
run;

proc transpose data=haveWithId out=transposed name=AA;
   by id;
   var AA:;
run;

proc format;
   value Excellent2Bad (notsorted)
      1 = 'excellent'
      2 = 'very good'
      3 = 'good'
      4 = 'same'
      5 = 'bad'
   ;
run;

proc report data=transposed;
   columns AA Col1;
   
   where Col1 < 5;
   
   define AA / group 'AA';
   define Col1 / across 'Rating' format= Excellent2Bad. order=data;
run;
mandan414
Fluorite | Level 6

Thank you for taking the time to respond.

As for if/then statement,  I mean to write an if/then statement for each condition, for example:

sum=0

if AA_005 ^= 5 or

   AA_010 ^= 5 or

   AA_015 ^= 5 or

   .

   .

   .

    AA_035 ^= 5    then

     sum=sum +1

...

mandan414
Fluorite | Level 6

I also made this code with Array, and I hope it's correct.

data new;

    set mydata;

    array myarray AA_:;

    sum = 0;

    do i=1 to dim(myarray);

    if myarray[i] not in 5 then do;

    sum = sum + 1

    end;

    end;

 

I'd like to know if there is an easier way to define AA_ % without going through all array statements. Thanks!

 

 

 

Shmuel
Garnet | Level 18

Your code seems to be correct, depending on expected result.

Do you sum per observation or for the accumulated for the whole dataset?

If you sum per dataset then you need the RETAIN statement otherwise SUM will be reset on every observation.

Pay attention that if you need just one statement in a condition you don't need the DO;...END; block.

Next code is equivalent to your code:

data new;
retain sum. set mydata; array myarray AA_:; sum = 0; do i=1 to dim(myarray); if myarray[i] not in 5 then sum = sum + 1; end; run;
Shmuel
Garnet | Level 18
Please post your expected result:
- is it a dataset or a report?
- if a report - in what format?
mandan414
Fluorite | Level 6

It is a dataset.

mandan414
Fluorite | Level 6

Thank you for your time. Would you please explain a little bit more about the retain? I had "retain" in my code but didn't make any difference. what I want is to accumulate the # of total observations from columns  AA_ %, if the response is not 5.

Shmuel
Garnet | Level 18

Suppose each observation represent one student.

without RETAIN you'll get non 5 per student (in the sample that is 4, 6, 4 ) while

with RETAIN the result will be: 4, 10 (=6+4), 14 (=4+10).

 

without RETAIN the SUM variables is assigned to missing each time a new observation is read,

RETAIN means to keep the value and not been reset on reading new observation but it can be

changed (including reset) programmatically in the code. 

mandan414
Fluorite | Level 6

Hi Samuel,

 

I don't want to sum up the value for AA_% variables. I want to sum up the number of observations. For example, I have  6000 observations responded to AA_% variables as 1=excellent, .......... 5=bad. I would like to calculate the total observations answered 1 or 2 or 3 or 4 to each of AA_% variable. Thanks.

andreas_lds
Jade | Level 19

@mandan414 wrote:

Hi Samuel,

 

I don't want to sum up the value for AA_% variables. I want to sum up the number of observations. For example, I have  6000 observations responded to AA_% variables as 1=excellent, .......... 5=bad. I would like to calculate the total observations answered 1 or 2 or 3 or 4 to each of AA_% variable. Thanks.


Please show the expected output using the data you have posted.

The transposed version of the data i have posted could be used as input for proc freq, like:

proc freq data=transposed;
   where Col1 < 5;
   table col1 / nocum nopercent;
   format col1 Excellent2Bad.;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 3353 views
  • 0 likes
  • 4 in conversation