DATA Step, Macro, Functions and more

how to get output expected when by variables have blank values

Reply
Super Contributor
Posts: 272

how to get output expected when by variables have blank values

Dear

I need to merge two data sets by 5 variables. For some OBs there are blank values in the data for the by variables. I am not getting the output when the by variables have blank values.

Data1

id             date                     RD               awk                  datapage

1            2016-07-20                                                        111000
2                                                                                      123456


data2

id              date                    RD                awk                    datapage                     vsyn

1             2016-07-20                                                         111000                          no
2                                                                                         123456                         no


code;

proc sort data=data1;
by id date rd awk datapage;
run;

proc sort data=data2;
by id date rd awk datapage;
run;

data dat3;
merge dat1(in=a) data2(in=b);
by id date rd awk datapage;
if a;
run;

Output needed;

ID              date                        datapage                                         vsyn
1            2016-07-20            111000                                                    no
2                                           123456                                                   no

 

output getting;

 

1              2016-07-20           111000                                                  no
2                                          123456

Super User
Posts: 19,855

Re: how to get output expected when by variables have blank values

Posted in reply to knveraraju91

Are you certain they're an exact match across all fields? 

Super User
Posts: 10,044

Re: how to get output expected when by variables have blank values

Posted in reply to knveraraju91

I get what you want. there are something wrong in your data.

 

data Data1;
input (id             date                     RD               awk                  datapage) (:$20.);
cards;
1            2016-07-20                    . .                                    111000
2                . . .                                                                      123456
;
run;

data data2;
input (id              date                    RD                awk                    datapage                     vsyn) (:$20.);
cards;
1             2016-07-20                      . .                                   111000                          no
2                   . . .                                                                      123456                         no
;
run;
data want;
 merge data1 data2;
 by id date rd awk datapage;
run;
Super Contributor
Posts: 272

Re: how to get output expected when by variables have blank values

Thank you. I found why. I used Length(date) to find why. When the date is missing,  the length are 1 and 10 in these datasets. I fixed the lengths and it worked.

Super User
Posts: 5,516

Re: how to get output expected when by variables have blank values

Posted in reply to knveraraju91

MERGE will do this ... but not for the case you illustrated.  A missing value in the second data set overwrites a valid value in the first data set.  

 

There are many ways to fix this, but the easiest is if you have a one-to-one match (or a one-to-zero match).  In that case, you can switch from MERGE to UPDATE.  

 

Another easy fix:  If the missing values are limited to just one data set, mention that data set first in the MERGE statement.

 

If you have a many-to-one match, and the missing values could be in either data set, you have to jump through some small hoops.  For example

merge dataset1 (in=a) dataset2 (in=b rename=(vsyn=vsyn2));

 

Now if there is a conflict in the values for vsyn, you have to choose which one you want.  But just to get a nonmissing value, you could code:

 

if vsyn=' ' then vsyn=vsyn2;

drop vsyn2;

Ask a Question
Discussion stats
  • 4 replies
  • 205 views
  • 3 likes
  • 4 in conversation