Hello,
I have a dataset with observations that need to be merged so that obs1 and obs2 are together, obs3 and obs4 together etc etc.
Each observation has a two numbers, the "family" id and the sub id.
For example:
have data:
family sub val1 val2
123 1 15 0
123 2 0 12
wanted data:
family sub val1 val2
123 1 15 12
Basically observations need to be brought back to the first. They always come in pairs, so my idea was checking sub % 2 if you get what I mean, but I don"t know how to proceed. Note that each family can have more than two lines, it can go up to 4 and they don't always have a 2nd observation alongside it. The only indicator that says that they need to be combined is that val2 is different than 0.
Here's the SAS code for the have and want data:
data have;
input num_fam num_sub val1 val2;
cards;
123 1 75 0
123 2 0 12
124 1 14 0
124 2 0 5
125 1 10 0
125 2 0 10
125 3 16 0
125 4 0 10
126 1 14 0
;
run;
data want;
input num_fam num_sub val1 val2;
cards;
123 1 75 12
124 1 14 5
125 1 10 10
125 3 16 10
126 1 14 0
;
run;
One way
data have;
input num_fam num_sub val1 val2;
cards;
123 1 75 0
123 2 0 12
124 1 14 0
124 2 0 5
125 1 10 0
125 2 0 10
125 3 16 0
125 4 0 10
126 1 14 0
;
run;
data want(drop=_:);
do until (last.num_fam);
set have;
by num_fam;
_val1=max(_val1, val1);
_val2=max(_val2, val2);
end;
val1=_val1;
val2=_val2;
run;
Result:
num_fam num_sub val1 val2; 123 2 75 12 124 2 14 5 125 4 16 10 126 1 14 0
Missed the last requirement, when reading want and have, so the following step might not produce the required result:
data want;
set have(rename=(
num_sub = _sub
val1 = _val1
val2 = _val2
));
by num_fam;
if 0 then set have;
retain num_sub val1 val2;
if mod(_sub, 2) = 1 then do;
num_sub = _sub;
val1 = _val1;
end;
if last.num_fam or mod(_sub, 2) = 0 then do;
val2 = _val2;
output;
end;
drop _:;
run;
Create a temporary sub-group, and use that to sum up the values:
data have;
input num_fam num_sub val1 val2;
cards;
123 1 75 0
123 2 0 12
124 1 14 0
124 2 0 5
125 1 10 0
125 2 0 10
125 3 16 0
125 4 0 10
126 1 14 0
;
run;
data int;
set have;
by num_fam;
retain count grp;
if first.num_fam then do;
count = 1;
grp = 0;
end;
else do;
count + 1;
if mod(count,2) = 1 then grp + 1;
end;
drop count;
run;
proc sql;
create table want (drop=grp) as
select num_fam, grp, min(num_sub) as num_sub, sum(val1) as val1, sum(val2) as val2
from int
group by num_fam, grp;
quit;
proc print data=want noobs;
run;
Result:
num_fam num_sub val1 val2 123 1 75 12 124 1 14 5 125 1 10 10 125 3 16 10 126 1 14 0
data have;
input num_fam num_sub val1 val2;
cards;
123 1 75 0
123 2 0 12
124 1 14 0
124 2 0 5
125 1 10 0
125 2 0 10
125 3 16 0
125 4 0 10
126 1 14 0
;
run;
data want;
merge have have(keep=val1 val2
rename=(val1=_val1 val2=_val2) firstobs=2);
val1=sum(val1,_val1);
val2=sum(val2,_val2);
if mod(_n_,2)=1;
drop _:;
run;
Your have data violates your rule, as num_fam=126 has only one observation:
data have;
input num_fam num_sub val1 val2;
cards;
123 1 75 0
123 2 0 12
124 1 14 0
124 2 0 5
125 1 10 0
125 2 0 10
125 3 16 0
125 4 0 10
126 1 14 0
;
If that happens in the middle of the dataset, it might invalidate solutions that expect an even number of observations for every num_fam.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.