Hello,
I have a dataset with columns as below:
AA_005 AA_010 AA_015 AA_020 AA_025 AA_030 AA_035
1 1 2 3 5 2 1
2 2 5 1 3 3 2
5 4 1 1 2 3 1
1= excellent 2= very good 3= good 4=same 5= bad
I want to create a variable of sum of observations responded 1 or 2 or 3 or 4 but not 5 for any variables (AA_005 to AA_035).
I know how to do it with the if ... then statement. But I'm looking for an easier/shorter way.
Thank you.
Suppose each observation represent one student.
without RETAIN you'll get non 5 per student (in the sample that is 4, 6, 4 ) while
with RETAIN the result will be: 4, 10 (=6+4), 14 (=4+10).
without RETAIN the SUM variables is assigned to missing each time a new observation is read,
RETAIN means to keep the value and not been reset on reading new observation but it can be
changed (including reset) programmatically in the code.
Arrays? But that includes IF/THEN as well....it helps if you show what you've done, so not sure if that's an improvement over your current approach or not.
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
Is this what you are looking for:
data have;
input AA_005 AA_010 AA_015 AA_020 AA_025 AA_030 AA_035;
datalines;
1 1 2 3 5 2 1
2 2 5 1 3 3 2
5 4 1 1 2 3 1
; run;
data want;
set have;
retain sum1-sum5 0;
array si {7} AA_:;
array sm {5} sum1-sum5;
do i=1 to dim(sm); sm(i)=0; end;
do i=1 to dim(si);
sm(si(i)) = sm(si(i)) +1;
end;
keep sum1-sum5;
label
sum1=excellent
sum2=very good
sum3=good
sum4=same
sum5=bad
;
run;
proc print data=want label; run;
result list:
Because hardly anybody here is working for the NSA you have to post the code that you want to be optimized. And it is also not clear, what you expect as result, so you may want to explain what you are looking for.
And, as always, i would start by fixing the data-format you are using: long beats wide in almost all cases i have seen so far.
data have;
input AA_005 AA_010 AA_015 AA_020 AA_025 AA_030 AA_035;
datalines;
1 1 2 3 5 2 1
2 2 5 1 3 3 2
5 4 1 1 2 3 1
;
data haveWithId;
set have;
id = _n_;
run;
proc transpose data=haveWithId out=transposed name=AA;
by id;
var AA:;
run;
proc format;
value Excellent2Bad (notsorted)
1 = 'excellent'
2 = 'very good'
3 = 'good'
4 = 'same'
5 = 'bad'
;
run;
proc report data=transposed;
columns AA Col1;
where Col1 < 5;
define AA / group 'AA';
define Col1 / across 'Rating' format= Excellent2Bad. order=data;
run;
Thank you for taking the time to respond.
As for if/then statement, I mean to write an if/then statement for each condition, for example:
sum=0
if AA_005 ^= 5 or
AA_010 ^= 5 or
AA_015 ^= 5 or
.
.
.
AA_035 ^= 5 then
sum=sum +1
...
I also made this code with Array, and I hope it's correct.
data new;
set mydata;
array myarray AA_:;
sum = 0;
do i=1 to dim(myarray);
if myarray[i] not in 5 then do;
sum = sum + 1
end;
end;
I'd like to know if there is an easier way to define AA_ % without going through all array statements. Thanks!
Your code seems to be correct, depending on expected result.
Do you sum per observation or for the accumulated for the whole dataset?
If you sum per dataset then you need the RETAIN statement otherwise SUM will be reset on every observation.
Pay attention that if you need just one statement in a condition you don't need the DO;...END; block.
Next code is equivalent to your code:
data new;
retain sum.
set mydata;
array myarray AA_:;
sum = 0;
do i=1 to dim(myarray);
if myarray[i] not in 5 then sum = sum + 1;
end;
run;
It is a dataset.
Thank you for your time. Would you please explain a little bit more about the retain? I had "retain" in my code but didn't make any difference. what I want is to accumulate the # of total observations from columns AA_ %, if the response is not 5.
Suppose each observation represent one student.
without RETAIN you'll get non 5 per student (in the sample that is 4, 6, 4 ) while
with RETAIN the result will be: 4, 10 (=6+4), 14 (=4+10).
without RETAIN the SUM variables is assigned to missing each time a new observation is read,
RETAIN means to keep the value and not been reset on reading new observation but it can be
changed (including reset) programmatically in the code.
Thank you!
Hi Samuel,
I don't want to sum up the value for AA_% variables. I want to sum up the number of observations. For example, I have 6000 observations responded to AA_% variables as 1=excellent, .......... 5=bad. I would like to calculate the total observations answered 1 or 2 or 3 or 4 to each of AA_% variable. Thanks.
@mandan414 wrote:
Hi Samuel,
I don't want to sum up the value for AA_% variables. I want to sum up the number of observations. For example, I have 6000 observations responded to AA_% variables as 1=excellent, .......... 5=bad. I would like to calculate the total observations answered 1 or 2 or 3 or 4 to each of AA_% variable. Thanks.
Please show the expected output using the data you have posted.
The transposed version of the data i have posted could be used as input for proc freq, like:
proc freq data=transposed;
where Col1 < 5;
table col1 / nocum nopercent;
format col1 Excellent2Bad.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.