Dear all,
I am in deep need of some help with something that I think is quite basic and simple, but which I cannot figure out myself. Suppose I have a dataset with a large number of variables and four of them are of concern for this data step:
HAVE =
Var1 | Var2 | Var3 | Var4 |
A | 1 | C | 1 |
B | 1 | C | 2 |
TOT | 1 | C | . |
A | 2 | C | 4 |
B | 2 | C | 5 |
TOT | 2 | C | . |
A | 3 | C | 7 |
B | 3 | C | 8 |
TOT | 3 | C | . |
A | 1 | D | 90 |
B | 1 | D | 80 |
TOT | 1 | D | . |
A | 2 | D | 60 |
B | 2 | D | 50 |
TOT | 2 | D | . |
A | 3 | D | 30 |
B | 3 | D | 20 |
TOT | 3 | D | . |
Now I wish to fill out the missing values for the "TOT"-observation, by summing the observations of "A" and "B" in which Var2 and Var3 are equal. I.e., I need the red numbers in:
WANT =
Var1 | Var2 | Var3 | Var4 |
A | 1 | C | 1 |
B | 1 | C | 2 |
TOT | 1 | C | 3 |
A | 2 | C | 4 |
B | 2 | C | 5 |
TOT | 2 | C | 9 |
A | 3 | C | 7 |
B | 3 | C | 8 |
TOT | 3 | C | 15 |
A | 1 | D | 90 |
B | 1 | D | 80 |
TOT | 1 | D | 170 |
A | 2 | D | 60 |
B | 2 | D | 50 |
TOT | 2 | D | 110 |
A | 3 | D | 30 |
B | 3 | D | 20 |
TOT | 3 | D | 50 |
Is there anyone who knows how to solve this in a simple data step?
Thank you in advance!
Hi @MiniRadde Do you really have TOT obs in your HAVE dataset following every set of of A and B obs. Not a big deal as you rightly mentioned in datastep, but i find the requirement more alike a proc print/report. Can you clarify? Oh well
Hi @novinosrin! yes, I do have it already inte the HAVE dataset. a smentioned there are other variables in there where TOT have values. They are however not sorted as illustrated.
Providing meaningful example data in a readily usable fashion (data step with datalines) enhances the quality of answers and reduces the need to go back-and-forth several times.
Thank you @MiniRadde for clarifying. So assuming your data is a good representative, the sort is the most important part.
data have;
input Var1 $ Var2 Var3 $ Var4;
cards;
A 1 C 1
B 1 C 2
TOT 1 C .
A 2 C 4
B 2 C 5
TOT 2 C .
A 3 C 7
B 3 C 8
TOT 3 C .
A 1 D 90
B 1 D 80
TOT 1 D .
A 2 D 60
B 2 D 50
TOT 2 D .
A 3 D 30
B 3 D 20
TOT 3 D .
;
proc sort data=have out=_have;
by var3 var2 var1;
run;
data want;
set _have;
by var3 var2 var1;
if first.var2 then s=0;
s+var4;
if var1='TOT' and missing(var4) then var4=s;
drop s;
run;
Assuming your data is sorted properly by var3 and var2:
data want;
set have;
by var3 var2;
if first.var2 then sum=0;
sum+var4;
if var1='TOT' then var4=sum;
run;
Do something like this
data have;
input Var1 $ Var2 Var3 $ Var4;
datalines;
A 1 C 1
B 1 C 2
TOT 1 C .
A 2 C 4
B 2 C 5
TOT 2 C .
A 3 C 7
B 3 C 8
TOT 3 C .
A 1 D 90
B 1 D 80
TOT 1 D .
A 2 D 60
B 2 D 50
TOT 2 D .
A 3 D 30
B 3 D 20
TOT 3 D .
;
data want(drop=c);
set have;
c+Var4;
if missing(Var4) then do;
Var4=c;
c=0;
end;
run;
by-group processing, and retaining a helper variable:
data want;
set have;
by var2 var3;
retain _var4;
if first.var3
then _var4 = var4;
else if not last.var3
then _var4 + var4;
else var4 = _var4;
drop _var4;
run;
Thank you all for all the help you provided. The solutions seemed to be in a similar setting from all of you. I could not however make it work, as best it just copied the value from A or B but did not generate a sum. I solved the task by adding some data steps instead and could from there merge them back and make the sumation over variables instead. Ugly code - YES - but it solved my problem. Again, thanks for all the effort!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.