I have a project to import multiple excel files daily into SAS and merge them into one SAS file. I want to do this using macro. The result should be match-merged and have all variables, observations, and values from both datasets. In the following example there are some missing values and what I need is to fill in the missing values from the dataset that have this value.
Here are two datasets as an example:
data Inc;
infile datalines truncover;
input Person_id Id MRN $ Last_Name $ First_Name $ specimen_source $ Gender $ Race $;
cards;
122345 56788 M776 Doe John blood M W
34567 12344 L87765 . Mickey nasal .
90876 36476 009875 Mouse Minnie blood F I
777654 227364 26376 Duck Donald . M B
56761 23676 M765 Pippin Mary . M W
667785 3624554 L97960 Smith Oliver nasal M W
345564 34567 . Johns Karen saliva F W
79908 123540 M78765 Doe Kevin . M A
677654 378754 L8544 Wayne Bruce . M B
run;
data Lab;
infile datalines truncover;
input ID Last_Name $ First_Name $ specimen_source $ Gender $;
datalines;
56788 Doe John blood M
127655 Johns Trevor nasal M
12344 Mouse Mickey nasal .
36476 Mouse Minnie . F
297654 Gatzby Jay saliva M
23676 Pippin Mary . F
3624554 Smith Oliver . M
34857 Miller Edward blood M
176255 Roberts Linda nasal .
378754 Wayne Bruce blood M
854324 Hill Jane . F
run;
I tried different codes but the one below was the best in terms of the result needed.
%MACRO MSORT1(DATASET,VAR);
PROC SORT DATA=&DATASET;
BY &VAR;
RUN;
%MEND MSORT1;
%MACRO MMERGE(DEST,SOURCE1,SOURCE2,VAR);
%msort1(&source1, &var);
%msort1(&source2, &var);
DATA &DEST;
MERGE &SOURCE1 &SOURCE2;
BY &VAR;
RUN;
%MEND MMERGE;
%MMERGE (WORK.labinc, WORK.lab, WORK.inc, id);
Here's the result I got:
This code only got the values from "Inc" dataset and not from "lab". Please check the highlighted values. My question is: Can we manipulate the code so that it can get the values filled from both datasets? For example: in the result above for observation 14 the specimen_source was filled in from "Inc" but I need to get the last name and specimen_source in observations 1 and 12 from "lab" dataset as well.
Thank you
Why do you highlight the blank lastname cell for ID 12344? The lastname is blank for this id in work.ind, and the id is completely absent from work.lab.
Now the blank for specimen for id 378754 is a different issue, because the code
DATA WORK.labinc;
MERGE WORK.lab WORK.inc;
BY id;
RUN;
will always have the value in the left dataset overwritten by the value in the right dataset, even if the value on the right is missing and the value on the left is not missing.
Instead use
DATA WORK.labinc;
UPDATE WORK.lab WORK.inc;
BY id;
RUN;
which will propagate the right-side value only when it is not missing. I.e. a non-missing value will never be replaced by a missing value.
When you merge the data set on the right* will overwrite the ones on the left* if they have the same name.
If you want a different handling you need to:
Or is there a data set you can assume will be the 'master' correct data set. If you can, then merge with that data set as the left most one.
*right/left refer to order listed in the MERGE statement.
Why do you highlight the blank lastname cell for ID 12344? The lastname is blank for this id in work.ind, and the id is completely absent from work.lab.
Now the blank for specimen for id 378754 is a different issue, because the code
DATA WORK.labinc;
MERGE WORK.lab WORK.inc;
BY id;
RUN;
will always have the value in the left dataset overwritten by the value in the right dataset, even if the value on the right is missing and the value on the left is not missing.
Instead use
DATA WORK.labinc;
UPDATE WORK.lab WORK.inc;
BY id;
RUN;
which will propagate the right-side value only when it is not missing. I.e. a non-missing value will never be replaced by a missing value.
@mayasak wrote:
Thank you Mkeintz,
This is exactly what I needed.
Quick question. Does this work on more than one key variable such as first_name and last_name?
Thank you again
What do you mean by "key variable"?
The BY statement accepts a list of variables. Make sure you use the same BY when you sort the datasets.
@mayasak wrote:
Thank you Mkeintz,
This is exactly what I needed.
Quick question. Does this work on more than one key variable such as first_name and last_name?
Thank you again
It does not matter how many variables it takes to form a unique key. But they must form a unique key in the ORIGINAL dataset, that is there is only one observation per by group (per unique combination of the values of all variables listed in the BY statement). Note that you can have multiple observations per by group in the TRANSACTIONS dataset, they will be applied in the order they appear. The resulting dataset will have only one observation per by group.
data want;
update original transactions;
by id1 id2 id3;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.