BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mayasak
Quartz | Level 8

I have a project to import multiple excel files daily into SAS and merge them into one SAS file. I want to do this using macro. The result should be match-merged and have all variables, observations, and values from both datasets. In the following example there are some missing values and what I need is to fill in the missing values from the dataset that have this value.

Here are two datasets as an example:

 

data Inc;
infile datalines truncover;
input Person_id Id MRN $ Last_Name $ First_Name $ specimen_source $ Gender $ Race $;
cards;
122345 56788 M776 Doe John blood M W
34567 12344 L87765 . Mickey nasal .
90876 36476 009875 Mouse Minnie blood F I
777654 227364 26376 Duck Donald . M B
56761 23676 M765 Pippin Mary . M W
667785 3624554 L97960 Smith Oliver nasal M W
345564 34567 . Johns Karen saliva F W
79908 123540 M78765 Doe Kevin . M A
677654 378754 L8544 Wayne Bruce . M B
run;

data Lab;
infile datalines truncover;
input ID Last_Name $ First_Name $ specimen_source $ Gender $;
datalines;
56788 Doe John blood M
127655 Johns Trevor nasal M
12344 Mouse Mickey nasal .
36476 Mouse Minnie . F
297654 Gatzby Jay saliva M
23676 Pippin Mary . F
3624554 Smith Oliver . M
34857 Miller Edward blood M
176255 Roberts Linda nasal .
378754 Wayne Bruce blood M
854324 Hill Jane . F
run;

 

I tried different codes but the one below was the best in terms of the result needed.

 

%MACRO MSORT1(DATASET,VAR);
PROC SORT DATA=&DATASET;
BY &VAR;
RUN;
%MEND MSORT1;

%MACRO MMERGE(DEST,SOURCE1,SOURCE2,VAR);

%msort1(&source1, &var);
%msort1(&source2, &var);

DATA &DEST;
MERGE &SOURCE1 &SOURCE2;
BY &VAR;
RUN;
%MEND MMERGE;
%MMERGE (WORK.labinc, WORK.lab, WORK.inc, id);

 

Here's the result I got:

 

mayasak_0-1659133202779.png

 

This code only got the values from "Inc" dataset and not from "lab". Please check the highlighted values. My question is: Can we manipulate the code so that it can get the values filled from both datasets?  For example: in the result above for observation 14 the specimen_source was filled in from "Inc" but I need to get the last name and specimen_source in observations 1 and 12 from "lab" dataset as well.

Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Why do you highlight the blank lastname cell for ID 12344?   The lastname is blank for this id in work.ind, and the id is completely absent from work.lab.

 

Now the blank for specimen for id 378754 is a different issue, because the code

 

DATA WORK.labinc;
   MERGE WORK.lab WORK.inc;
   BY id;
RUN;

will always have the value in the left dataset overwritten by the value in the right dataset, even if the value on the right is missing and the value on the left is not missing.

 

Instead use

DATA WORK.labinc;
   UPDATE WORK.lab WORK.inc;
   BY id;
RUN;

which will propagate the right-side value only when it is not missing.  I.e. a non-missing value will never be replaced by a missing value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
Reeza
Super User

When you merge the data set on the right* will overwrite the ones on the left* if they have the same name. 
If you want a different handling you need to:

  • Rename the variables so they are different and explicitly deal with them
  • Use PROC SQL and COALESCE() to explicitly deal with this case. 

Or is there a data set you can assume will be the 'master' correct data set. If you can, then merge with that data set as the left most one. 

 

*right/left refer to order listed in the MERGE statement. 

 

 

mkeintz
PROC Star

Why do you highlight the blank lastname cell for ID 12344?   The lastname is blank for this id in work.ind, and the id is completely absent from work.lab.

 

Now the blank for specimen for id 378754 is a different issue, because the code

 

DATA WORK.labinc;
   MERGE WORK.lab WORK.inc;
   BY id;
RUN;

will always have the value in the left dataset overwritten by the value in the right dataset, even if the value on the right is missing and the value on the left is not missing.

 

Instead use

DATA WORK.labinc;
   UPDATE WORK.lab WORK.inc;
   BY id;
RUN;

which will propagate the right-side value only when it is not missing.  I.e. a non-missing value will never be replaced by a missing value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mayasak
Quartz | Level 8
Thank you Mkeintz,
This is exactly what I needed.
Quick question. Does this work on more than one key variable such as first_name and last_name?
Thank you again
mkeintz
PROC Star

@mayasak wrote:
Thank you Mkeintz,
This is exactly what I needed.
Quick question. Does this work on more than one key variable such as first_name and last_name?
Thank you again

What do you mean by "key variable"?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mayasak
Quartz | Level 8
The variable that links both dataset. For example, I used ID.
Tom
Super User Tom
Super User

@mayasak wrote:
Thank you Mkeintz,
This is exactly what I needed.
Quick question. Does this work on more than one key variable such as first_name and last_name?
Thank you again

It does not matter how many variables it takes to form a unique key.  But they must form a unique key in the ORIGINAL dataset, that is there is only one observation per by group (per unique combination of the values of all variables listed in the BY statement).  Note that you can have multiple observations per by group in the TRANSACTIONS dataset, they will be applied in the order they appear.  The resulting dataset will have only one observation per by group.

data want;
  update original transactions;
  by id1 id2 id3;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1462 views
  • 3 likes
  • 5 in conversation