I have three data sets (A,B,C) all of them have missing values but if i combine them together it should output a complete data set. All of the data set contains the variables(g1,g2,g3) and the data set kind of looks like this:
g1 g2 g3 g1 g2 g3 g1 g2 g3
1 1.1 2.2 1 . . . 1 . . .
2 2 3 4 2 . . . 2. . .
3 5 6 7 3 . . . 3 . . .
4 1 2 3 4 . . . 4 . . .
5 . . . 5 1 2 3 5 . . .
6 1 2 3 6 . . . 6. . .
7 . . . 7 5 6 7
I need the output data set to look like this:
g1 g2 g3
1 1.1 2 2
2 2 3 4
3 5 6 7
4 1 2 3
5 1 2 3
6 1 2 3
7 5 6 7
I have tried the code below:
data m;
merge a b c;
by g1 g2 g3;
retain tempg1 tempg2 tempg3;
if first.g1 then tempg1=.;
else if first.g2 then tempg2=.;
else if first.g3 then tempg3=.;
if g1=. then g1=tempg1;
else if g2=. then g2=tempg2;
else if g3=. then g3=tempg3;
if tempg1=g1;
if tempg2=g2;
if tempg3=g3;
run;
SAS says that the data sets have to be sorted but then the one data's missing values do not correspond non missing values in the other data set. Any help will be appreciated,
What's the name of the first variable, the one that numbers the observations 1 through 7?
Hi, it doesn't have a name I guess its just the observations.
WIthout such a variable combining the data sets is unreliable. You have different numbers of observations in each data set. How do you know which ones are the right matches?
Assuming you do have a variable named ID that holds the contents of the first column ...
data combined;
set c b a;
run;
proc sort data=combined;
by id;
run;
data want;
update combined (obs=0) combined;
by id;
Anywhere you use a BY statement the data set(s) must be sorted by all of the variables on the BY statement. From your example I don't believe that you want to use a by statement.
You need to provide some rules as to WHICH specific value you want when the variable has values in more than one data set.
If you need to compare in the existing order and want to pick the value from a if it exist, from b if it exists but not in a or from c if not in a or b then perhaps:
data want; set a (rename=(g1= ag1 g2=ag2 g3=ag3) ) b (rename=(g1= bg1 g2=bg2 g3=bg3) ) c (rename=(g1= cg1 g2=cg2 g3=cg3) ) ; g1= coalesce(ag1, bg1, cg1); g2= coalesce(ag2, bg2, cg2); g3= coalesce(ag3, bg3, cg3); keep g1 g2 g3; run;
if the variables are actually character values then the function would be coalescec
I believe you need a MERGE not a SET for your approach to work as intended.
first create an id variable in each data set and merge on that
data data1_;
retain id 0;
set data1;
id=id+1;
run;
and then merge all 3 datasets "by id". Youll need to rename variables as @Astounding says, and then the code is easy to replace the missings
data want;
merge
a (rename=(g1= ag1 g2=ag2 g3=ag3) )
b (rename=(g1= bg1 g2=bg2 g3=bg3) )
c (rename=(g1= cg1 g2=cg2 g3=cg3) )
;
g1= coalesce(ag1, bg1, cg1);
g2= coalesce(ag2, bg2, cg2);
g3= coalesce(ag3, bg3, cg3);
keep g1 g2 g3;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.