Hi,
I would like to do Match Merge 2 datasets of different variables. Currently when I use the BY statement, any missing values are retained from last value for the match. That creates lot of duplicates.
DATA_A :
BOAVTS01 E082 E08
BOAVTS01 E242 E24
BOAVTS01 E282 E28
BOAVTS01 4102 F20
BOAVTS01 E522 F52
BOAVTS01 E102 G10
BOAVTS01 E302 G30
TXSVTS30 3302 1G
TXSVTS30 3002 3S
TXSVTS40 3302 1G
TXSVTS40 3002 3S
DATA_B;
BOAVTS01 UKDUKC01
BOAVTS01 UKDUKC02
BOAVTS01 UKDUKC03
BOAVTS01 UKDUKC04
TXSVTS30 KCDTXS36
TXSVTS30 KCDTXS38
TXSVTS30 TXDVTS35
TXSVTS30 TXDVTS37
TXSVTS40 KCDTXS44
TXSVTS40 KCDTXS46
TXSVTS40 TXDVTS43
TXSVTS40 TXDVTS45
DATA DATA_C ;
MERGE
DATA_A
DATA_B
;
BY COMPLIB ;
My current output:
BOAVTS01 E082 E08 UKDUKC01
BOAVTS01 E242 E24 UKDUKC02
BOAVTS01 E282 E28 UKDUKC03
BOAVTS01 4102 F20 UKDUKC04
BOAVTS01 E522 F52 UKDUKC04
BOAVTS01 E102 G10 UKDUKC04
BOAVTS01 E302 G30 UKDUKC04
TXSVTS30 3302 1G KCDTXS36
TXSVTS30 3002 3S KCDTXS38
TXSVTS30 3002 3S TXDVTS35
TXSVTS30 3002 3S TXDVTS37
TXSVTS40 3302 1G KCDTXS44
TXSVTS40 3002 3S KCDTXS46
TXSVTS40 3002 3S TXDVTS43
TXSVTS40 3002 3S TXDVTS45
When i do the above code to merge the output is like this which is ugly like below: The missing values are assigned as the last value on the BY variable.
The last observation of the by variable is repeating for the missing values. How to avoid that ? i want the missing values to be considered as missing rather than taking last observation. Any inputs are much appreciated:
My desired output:
BOAVTS01 E082 E08 UKDUKC01
BOAVTS01 E242 E24 UKDUKC02
BOAVTS01 E282 E28 UKDUKC03
BOAVTS01 4102 F20 UKDUKC04
BOAVTS01 E522 F52 .
BOAVTS01 E102 G10 .
BOAVTS01 E302 G30 .
TXSVTS30 3302 1G KCDTXS36
TXSVTS30 3002 3S KCDTXS38
TXSVTS30 . . TXDVTS35
TXSVTS30 . . TXDVTS37
TXSVTS40 3302 1G KCDTXS44
TXSVTS40 3002 3S KCDTXS46
TXSVTS40 . . TXDVTS43
TXSVTS40 . . TXDVTS45
Thank you,
Sathish
Sorry, I don't see why you would want to do this. It seems very random to just merge data to other data based on position. The reason you have the data merging to all the other data is that you by group, or matching variables are not enough to show a distinct merge. If for instance I sort the input slightly differently I would get different results, hence not a good idea. Personally I would investigate what actually needs to be merged together and use other variables, or assign variables to make that merge explicit. If you really only want position in the dataset (which as I mentioned could be different each time), then assign an order variable and merge on that as well, e.g.
DATA_A :
BOAVTS01 E082 E08 1
BOAVTS01 E242 E24 2
BOAVTS01 E282 E28 3
BOAVTS01 4102 F20 4
BOAVTS01 E522 F52 5
BOAVTS01 E102 G10 6
BOAVTS01 E302 G30 7
TXSVTS30 3302 1G 1
TXSVTS30 3002 3S 1
DATA_B;
BOAVTS01 UKDUKC01 1
BOAVTS01 UKDUKC02 2
BOAVTS01 UKDUKC03 3
BOAVTS01 UKDUKC04 4
TXSVTS30 KCDTXS36 1
TXSVTS30 KCDTXS38 2
I'm not sure why in your desired result the first column can be blank. I assume this first column is "COMPLIB" and as this is the key it should always have a value (as at least one of the 2 tables will provide such a key).
Make sure you fully understand how the result set differs in cases of many-to-many relationships between using a data step "merge" and a SQL join.
I believe below code does what you're asking for.
DATA DATA_C;
if 0 then set a b;
call missing(of _all_);
MERGE
DATA_A
DATA_B
;
BY COMPLIB;
run;
Thanks Patrick for yore response. I will try your method also for my learning.
Sorry, I don't see why you would want to do this. It seems very random to just merge data to other data based on position. The reason you have the data merging to all the other data is that you by group, or matching variables are not enough to show a distinct merge. If for instance I sort the input slightly differently I would get different results, hence not a good idea. Personally I would investigate what actually needs to be merged together and use other variables, or assign variables to make that merge explicit. If you really only want position in the dataset (which as I mentioned could be different each time), then assign an order variable and merge on that as well, e.g.
DATA_A :
BOAVTS01 E082 E08 1
BOAVTS01 E242 E24 2
BOAVTS01 E282 E28 3
BOAVTS01 4102 F20 4
BOAVTS01 E522 F52 5
BOAVTS01 E102 G10 6
BOAVTS01 E302 G30 7
TXSVTS30 3302 1G 1
TXSVTS30 3002 3S 1
DATA_B;
BOAVTS01 UKDUKC01 1
BOAVTS01 UKDUKC02 2
BOAVTS01 UKDUKC03 3
BOAVTS01 UKDUKC04 4
TXSVTS30 KCDTXS36 1
TXSVTS30 KCDTXS38 2
RW9, i tried using your suggestion of creating a new variable called SEQ and merged by COMPLIB SEQ to make it more distinct. It worked fine now. I got what i wanted. Thank you so much!
I'm so glad you found a solution, Sathish. Can you mark correct or helpful answers so other community members with a similar challenge can easily find it?
Thanks!
Anna
I did. Thanks
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.