Hi, I would like to merge multiple datasets by ID. Each dataset has different number of rows. Some datasets have more than one variable of interest, in this case is age, that is, the same dataset might have age1, age2, age3. The end goal is to be able to report the frequencies of variable age per ID and be able to know where each data point comes from (depicted by the variable ds). These are the sample datasets: data ds1;
input ID age1;
datalines;
1 12
3 11
4 10
4 15
;
run;
data ds2;
input ID age2 age3 age4;
datalines;
1 13 13 .
1 23 14 .
2 25 . 21
;
run;
data ds3;
input ID age5;
datalines;
3 22
3 32
3 33
4 12
;
run; The output that we are aiming for is this before using proc freq by ID: ID age ds 1 12 1 1 13 2 1 13 2 1 23 2 1 14 2 2 25 2 2 21 2 3 11 1 3 22 3 3 32 3 3 33 3 4 10 1 4 15 1 4 12 3 I tried this code but it omits the observation from ds2, ID1, age3=13: data ages_all;
/* First dataset */
set ds1(rename=(age1=age) in=in1 keep=ID age1);
/* Second dataset */
set ds2(rename=(age2=age age3=age_renamed1 age4=age_renamed2) in=in2 keep=ID age2 age3 age4);
/* Rename age_renamed1 and age_renamed2 to age */
if in2 then age = coalesce(age_renamed1, age_renamed2);
/* Third dataset */
set ds3(rename=(age5=age) in=in3 keep=ID age5);
by ID;
ds = in1 + 2*in2 + 3*in3;
drop age_renamed1 age_renamed2; /* Drop the temporary variables */
run;
... View more