DATA Step, Macro, Functions and more

merging dataset with missing values

Reply
Occasional Contributor
Posts: 15

merging dataset with missing values

I have three data sets (A,B,C) all of them have missing values but if i combine them together it should output a complete data set. All of the data set contains the variables(g1,g2,g3) and the data set kind of looks like this:

 

      g1 g2 g3               g1 g2 g3             g1 g2 g3

1      1.1 2.2              1 .    .   .              1 .    .   .

2       2 3  4               2  .   .   .               2.   .   .

3       5 6 7               3  .    .   .              3 .    .   .

4      1  2 3               4  .  .   .                 4 .   .  .

5      .   .  .                5 1 2 3                 5  .  .  .

6     1 2  3                 6 .   . .                  6.   .  .

7     .   .   .                                              7  5 6 7

 

I need the output data set to look like this:

 

       g1 g2 g3

1     1.1 2  2

2     2  3   4

3     5 6 7

4     1  2 3

5    1  2  3

6    1  2 3

7    5   6 7

I have tried the code below:

 

data m;
merge a b  c;
by g1 g2 g3;
retain tempg1 tempg2 tempg3;
if first.g1 then tempg1=.;
else if first.g2 then tempg2=.;
else if first.g3 then tempg3=.;
if g1=. then g1=tempg1;
else if g2=. then g2=tempg2;
else if g3=. then g3=tempg3;
if tempg1=g1;
if tempg2=g2;
if tempg3=g3;
run;

 

SAS says that the data sets have to be sorted but then the one data's missing values do not correspond non missing values in the other data set. Any help will be appreciated,

Super User
Posts: 6,762

Re: merging dataset with missing values

What's the name of the first variable, the one that numbers the observations 1 through 7?

Occasional Contributor
Posts: 15

Re: merging dataset with missing values

Posted in reply to Astounding

Hi, it doesn't have a name I guess its just the observations. 

Super User
Posts: 6,762

Re: merging dataset with missing values

[ Edited ]

WIthout such a variable combining the data sets is unreliable.  You have different numbers of observations in each data set.  How do you know which ones are the right matches?

 

Assuming you do have a variable named ID that holds the contents of the first column ...

 

data combined;

set c b a;

run;

 

proc sort data=combined;

by id;

run;

 

data want;

update combined (obs=0) combined;

by id;

Super User
Posts: 13,523

Re: merging dataset with missing values

Anywhere you use a BY statement the data set(s) must be sorted by all of the variables on the BY statement. From your example I don't believe that you want to use a by statement.

 

You need to provide some rules as to WHICH specific value you want when the variable has values in more than one data set.

 

If you need to compare in the existing order and want to pick the value from a if it exist, from b if it exists but not in a or from c if not in a or b then perhaps:

 

data want; 
   set
      a (rename=(g1= ag1 g2=ag2 g3=ag3) )
      b (rename=(g1= bg1 g2=bg2 g3=bg3) )
      c (rename=(g1= cg1 g2=cg2 g3=cg3) )
   ;
   g1= coalesce(ag1, bg1, cg1);
   g2= coalesce(ag2, bg2, cg2);
   g3= coalesce(ag3, bg3, cg3);
   keep g1 g2 g3;
run;

if the variables are actually character values then the function would be coalescec

 

Respected Advisor
Posts: 4,736

Re: merging dataset with missing values

@ballardw

I believe you need a MERGE not a SET for your approach to work as intended.

Regular Contributor
Posts: 158

Re: merging dataset with missing values

first create an id variable in each data set and merge on that

 

data data1_;

retain id 0;

set data1;

id=id+1;

run;

 

and then merge all 3 datasets "by id". Youll need to rename variables as @Astounding says, and then the code is easy to replace the missings

--------------
blog: papersandprograms.com
Super User
Posts: 10,770

Re: merging dataset with missing values

data want; 
   merge
      a (rename=(g1= ag1 g2=ag2 g3=ag3) )
      b (rename=(g1= bg1 g2=bg2 g3=bg3) )
      c (rename=(g1= cg1 g2=cg2 g3=cg3) )
   ;
   g1= coalesce(ag1, bg1, cg1);
   g2= coalesce(ag2, bg2, cg2);
   g3= coalesce(ag3, bg3, cg3);
   keep g1 g2 g3;
run;
Ask a Question
Discussion stats
  • 7 replies
  • 123 views
  • 0 likes
  • 6 in conversation