Re: merging dataset with missing values

azt5173 · Posted 06-22-2018 12:19 PM

I have three data sets (A,B,C) all of them have missing values but if i combine them together it should output a complete data set. All of the data set contains the variables(g1,g2,g3) and the data set kind of looks like this:

g1 g2 g3 g1 g2 g3 g1 g2 g3

1 1.1 2.2 1 . . . 1 . . .

2 2 3 4 2 . . . 2. . .

3 5 6 7 3 . . . 3 . . .

4 1 2 3 4 . . . 4 . . .

5 . . . 5 1 2 3 5 . . .

6 1 2 3 6 . . . 6. . .

7 . . . 7 5 6 7

I need the output data set to look like this:

g1 g2 g3

1 1.1 2 2

2 2 3 4

3 5 6 7

4 1 2 3

5 1 2 3

6 1 2 3

7 5 6 7

I have tried the code below:

data m;
merge a b c;
by g1 g2 g3;
retain tempg1 tempg2 tempg3;
if first.g1 then tempg1=.;
else if first.g2 then tempg2=.;
else if first.g3 then tempg3=.;
if g1=. then g1=tempg1;
else if g2=. then g2=tempg2;
else if g3=. then g3=tempg3;
if tempg1=g1;
if tempg2=g2;
if tempg3=g3;
run;

SAS says that the data sets have to be sorted but then the one data's missing values do not correspond non missing values in the other data set. Any help will be appreciated,

Astounding · Posted 06-22-2018 12:27 PM

What's the name of the first variable, the one that numbers the observations 1 through 7?

azt5173 · Posted 06-22-2018 12:31 PM

Hi, it doesn't have a name I guess its just the observations.

Astounding · Posted 06-22-2018 03:40 PM

WIthout such a variable combining the data sets is unreliable. You have different numbers of observations in each data set. How do you know which ones are the right matches?

Assuming you do have a variable named ID that holds the contents of the first column ...

data combined;

set c b a;

run;

proc sort data=combined;

by id;

run;

data want;

update combined (obs=0) combined;

by id;

ballardw · Posted 06-22-2018 12:35 PM

Anywhere you use a BY statement the data set(s) must be sorted by all of the variables on the BY statement. From your example I don't believe that you want to use a by statement.

You need to provide some rules as to WHICH specific value you want when the variable has values in more than one data set.

If you need to compare in the existing order and want to pick the value from a if it exist, from b if it exists but not in a or from c if not in a or b then perhaps:

data want; 
   set
      a (rename=(g1= ag1 g2=ag2 g3=ag3) )
      b (rename=(g1= bg1 g2=bg2 g3=bg3) )
      c (rename=(g1= cg1 g2=cg2 g3=cg3) )
   ;
   g1= coalesce(ag1, bg1, cg1);
   g2= coalesce(ag2, bg2, cg2);
   g3= coalesce(ag3, bg3, cg3);
   keep g1 g2 g3;
run;

if the variables are actually character values then the function would be coalescec

Patrick · Posted 06-23-2018 11:09 PM

@ballardw

I believe you need a MERGE not a SET for your approach to work as intended.

pau13rown · Posted 06-22-2018 01:40 PM

first create an id variable in each data set and merge on that

data data1_;

retain id 0;

set data1;

id=id+1;

run;

and then merge all 3 datasets "by id". Youll need to rename variables as @Astounding says, and then the code is easy to replace the missings

Ksharp · Posted 06-23-2018 07:02 AM

data want; 
   merge
      a (rename=(g1= ag1 g2=ag2 g3=ag3) )
      b (rename=(g1= bg1 g2=bg2 g3=bg3) )
      c (rename=(g1= cg1 g2=cg2 g3=cg3) )
   ;
   g1= coalesce(ag1, bg1, cg1);
   g2= coalesce(ag2, bg2, cg2);
   g3= coalesce(ag3, bg3, cg3);
   keep g1 g2 g3;
run;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away