Dear all,
I am in deep need of some help with something that I think is quite basic and simple, but which I cannot figure out myself. Suppose I have a dataset with a large number of variables and four of them are of concern for this data step:
HAVE =
Var1 | Var2 | Var3 | Var4 |
A | 1 | C | 1 |
B | 1 | C | 2 |
TOT | 1 | C | . |
A | 2 | C | 4 |
B | 2 | C | 5 |
TOT | 2 | C | . |
A | 3 | C | 7 |
B | 3 | C | 8 |
TOT | 3 | C | . |
A | 1 | D | 90 |
B | 1 | D | 80 |
TOT | 1 | D | . |
A | 2 | D | 60 |
B | 2 | D | 50 |
TOT | 2 | D | . |
A | 3 | D | 30 |
B | 3 | D | 20 |
TOT | 3 | D | . |
Now I wish to fill out the missing values for the "TOT"-observation, by summing the observations of "A" and "B" in which Var2 and Var3 are equal. I.e., I need the red numbers in:
WANT =
Var1 | Var2 | Var3 | Var4 |
A | 1 | C | 1 |
B | 1 | C | 2 |
TOT | 1 | C | 3 |
A | 2 | C | 4 |
B | 2 | C | 5 |
TOT | 2 | C | 9 |
A | 3 | C | 7 |
B | 3 | C | 8 |
TOT | 3 | C | 15 |
A | 1 | D | 90 |
B | 1 | D | 80 |
TOT | 1 | D | 170 |
A | 2 | D | 60 |
B | 2 | D | 50 |
TOT | 2 | D | 110 |
A | 3 | D | 30 |
B | 3 | D | 20 |
TOT | 3 | D | 50 |
Is there anyone who knows how to solve this in a simple data step?
Thank you in advance!
Hi @MiniRadde Do you really have TOT obs in your HAVE dataset following every set of of A and B obs. Not a big deal as you rightly mentioned in datastep, but i find the requirement more alike a proc print/report. Can you clarify? Oh well
Hi @novinosrin! yes, I do have it already inte the HAVE dataset. a smentioned there are other variables in there where TOT have values. They are however not sorted as illustrated.
Providing meaningful example data in a readily usable fashion (data step with datalines) enhances the quality of answers and reduces the need to go back-and-forth several times.
Thank you @MiniRadde for clarifying. So assuming your data is a good representative, the sort is the most important part.
data have;
input Var1 $ Var2 Var3 $ Var4;
cards;
A 1 C 1
B 1 C 2
TOT 1 C .
A 2 C 4
B 2 C 5
TOT 2 C .
A 3 C 7
B 3 C 8
TOT 3 C .
A 1 D 90
B 1 D 80
TOT 1 D .
A 2 D 60
B 2 D 50
TOT 2 D .
A 3 D 30
B 3 D 20
TOT 3 D .
;
proc sort data=have out=_have;
by var3 var2 var1;
run;
data want;
set _have;
by var3 var2 var1;
if first.var2 then s=0;
s+var4;
if var1='TOT' and missing(var4) then var4=s;
drop s;
run;
Assuming your data is sorted properly by var3 and var2:
data want;
set have;
by var3 var2;
if first.var2 then sum=0;
sum+var4;
if var1='TOT' then var4=sum;
run;
Do something like this
data have;
input Var1 $ Var2 Var3 $ Var4;
datalines;
A 1 C 1
B 1 C 2
TOT 1 C .
A 2 C 4
B 2 C 5
TOT 2 C .
A 3 C 7
B 3 C 8
TOT 3 C .
A 1 D 90
B 1 D 80
TOT 1 D .
A 2 D 60
B 2 D 50
TOT 2 D .
A 3 D 30
B 3 D 20
TOT 3 D .
;
data want(drop=c);
set have;
c+Var4;
if missing(Var4) then do;
Var4=c;
c=0;
end;
run;
by-group processing, and retaining a helper variable:
data want;
set have;
by var2 var3;
retain _var4;
if first.var3
then _var4 = var4;
else if not last.var3
then _var4 + var4;
else var4 = _var4;
drop _var4;
run;
Thank you all for all the help you provided. The solutions seemed to be in a similar setting from all of you. I could not however make it work, as best it just copied the value from A or B but did not generate a sum. I solved the task by adding some data steps instead and could from there merge them back and make the sumation over variables instead. Ugly code - YES - but it solved my problem. Again, thanks for all the effort!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.