Hi Community,
I have a long dataset while there are multiple records per id. I want to create a dataset and it is one record per id under the conditions as:
if condition a=1, then under the condition b=0, get a total sum of d. When the last id, use that total sum to be divided by e to get f and output.
I have test the codes using a single id is fine (where statement),however, if I use it into more ids, the answer is wrong. There are somethings I miss in the codes. Or please suggest a better way to do it. Your suggestion is appreciated.
data want;
set have;*sorted;
by id;
retain total;
/*where id =aa;*/
if a=1 then
do;
if b = 0 then
do;
if first.id then
total=d;
else total=sum(total,d);
end;
if last.id then
do;
f=total/e;
output;
end;
end;
run;
UNTESTED CODE since you did not provide data
/* UNTESTED CODE */
data want;
set have;
by id;
if first.id then total=0;
if a=1 and b=0 then total+d;
if last.id then do;
f=total/e;
output;
end;
run;
UNTESTED CODE since you did not provide data
/* UNTESTED CODE */
data want;
set have;
by id;
if first.id then total=0;
if a=1 and b=0 then total+d;
if last.id then do;
f=total/e;
output;
end;
run;
Here is the sample dataset:
have
ID | a | b | d | e |
aa | 1 | 0 | 10 | 3 |
aa | 2 | 0 | 9 | 3 |
aa | 1 | 1 | 7 | 3 |
aa | 1 | 0 | 4 | 3 |
aa | 2 | 1 | 8 | 3 |
aa | 1 | 0 | 3 | 3 |
aa | 2 | 1 | 8 | 3 |
bb | 1 | 0 | 1 | 2 |
bb | 1 | 0 | 4 | 2 |
bb | 2 | 1 | 3 | 2 |
bb | 1 | 0 | 6 | 2 |
bb | 1 | 0 | 6 | 2 |
bb | 1 | 1 | 9 | 2 |
bb | 2 | 0 | 4 | 2 |
Want
ID | a | b | d | e | f |
aa | 1 | 0 | 10 | 3 | 5.666667 |
bb | 1 | 0 | 12 | 2 | 15 |
Since you have the data and you have the code, I request that you test it.
I did. Thank you and it works.
Hi
I have a different condition but the first.id if it is in the condition (a=1 and b=1) cannot be set to 0, and it shoule be the value of c. How to deal with this situtation? Thank you for your helps.
Here is the sample data.
data have;
infile datalines dlm="09"x dsd truncover;
input id $ a b c d;
datalines;
aa 1 1 10 3
aa 2 0 9 3
aa 1 1 7 3
aa 1 0 4 3
aa 2 1 8 3
aa 1 0 3 3
aa 2 1 8 3
bb 1 0 6 2
bb 1 1 3 2
bb 2 1 7 2
bb 1 0 6 2
bb 1 0 6 2
bb 1 1 9 2
bb 2 0 4 2
bb 1 1 8 2
bb 1 0 4 2
bb 1 1 3 2
cc 1 1 4 2
cc 1 0 6 2
cc 1 1 9 2
cc 2 0 4 2
;
run;
The data I want is as
id | total | e |
aa | 17 | 5.666667 |
bb | 23 | 11.5 |
cc | 13 | 6.5 |
The condition of the sum is : within ID, sum(c) to get total for all records in which a=1 and b=1 and then divide total by d to get e.
In the cases, id="aa" or "cc", their first ids fit the condition of a=1 and b=1, while the first id of case "bb" did not fit the condition of a=1 and b=1.
I am able to get the result using codes like below.
data want (keep=id total e); set have; by id; where a=1; if b^=1 then sum=0; else if b=1 then sum=c; if first.id then total=0; total + sum; if last.id then do; e=total/d; output; end; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.