Solved: How to create Match merge with n x n observations and Missing ?

sathishthangamani · Posted 07-04-2014 12:23 AM

Hi,

I would like to do Match Merge 2 datasets of different variables. Currently when I use the BY statement, any missing values are retained from last value for the match. That creates lot of duplicates.

DATA_A :

BOAVTS01 E082 E08

BOAVTS01 E242 E24

BOAVTS01 E282 E28

BOAVTS01 4102 F20

BOAVTS01 E522 F52

BOAVTS01 E102 G10

BOAVTS01 E302 G30

TXSVTS30 3302 1G

TXSVTS30 3002 3S

TXSVTS40 3302 1G

TXSVTS40 3002 3S

DATA_B;

BOAVTS01 UKDUKC01

BOAVTS01 UKDUKC02

BOAVTS01 UKDUKC03

BOAVTS01 UKDUKC04

TXSVTS30 KCDTXS36

TXSVTS30 KCDTXS38

TXSVTS30 TXDVTS35

TXSVTS30 TXDVTS37

TXSVTS40 KCDTXS44

TXSVTS40 KCDTXS46

TXSVTS40 TXDVTS43

TXSVTS40 TXDVTS45

DATA DATA_C ;

MERGE

DATA_A

DATA_B

;

BY COMPLIB ;

My current output:

BOAVTS01 E082 E08 UKDUKC01

BOAVTS01 E242 E24 UKDUKC02

BOAVTS01 E282 E28 UKDUKC03

BOAVTS01 4102 F20 UKDUKC04

BOAVTS01 E522 F52 UKDUKC04

BOAVTS01 E102 G10 UKDUKC04

BOAVTS01 E302 G30 UKDUKC04

TXSVTS30 3302 1G KCDTXS36

TXSVTS30 3002 3S KCDTXS38

TXSVTS30 3002 3S TXDVTS35

TXSVTS30 3002 3S TXDVTS37

TXSVTS40 3302 1G KCDTXS44

TXSVTS40 3002 3S KCDTXS46

TXSVTS40 3002 3S TXDVTS43

TXSVTS40 3002 3S TXDVTS45

When i do the above code to merge the output is like this which is ugly like below: The missing values are assigned as the last value on the BY variable.

The last observation of the by variable is repeating for the missing values. How to avoid that ? i want the missing values to be considered as missing rather than taking last observation. Any inputs are much appreciated:

My desired output:

BOAVTS01 E082 E08 UKDUKC01

BOAVTS01 E242 E24 UKDUKC02

BOAVTS01 E282 E28 UKDUKC03

BOAVTS01 4102 F20 UKDUKC04

BOAVTS01 E522 F52 .

BOAVTS01 E102 G10 .

BOAVTS01 E302 G30 .

TXSVTS30 3302 1G KCDTXS36

TXSVTS30 3002 3S KCDTXS38

TXSVTS30 . . TXDVTS35

TXSVTS30 . . TXDVTS37

TXSVTS40 3302 1G KCDTXS44

TXSVTS40 3002 3S KCDTXS46

TXSVTS40 . . TXDVTS43

TXSVTS40 . . TXDVTS45

Thank you,

Sathish

RW9 · Posted 07-04-2014 04:29 AM

Sorry, I don't see why you would want to do this. It seems very random to just merge data to other data based on position. The reason you have the data merging to all the other data is that you by group, or matching variables are not enough to show a distinct merge. If for instance I sort the input slightly differently I would get different results, hence not a good idea. Personally I would investigate what actually needs to be merged together and use other variables, or assign variables to make that merge explicit. If you really only want position in the dataset (which as I mentioned could be different each time), then assign an order variable and merge on that as well, e.g.

DATA_A :

BOAVTS01 E082 E08 1

BOAVTS01 E242 E24 2

BOAVTS01 E282 E28 3

BOAVTS01 4102 F20 4

BOAVTS01 E522 F52 5

BOAVTS01 E102 G10 6

BOAVTS01 E302 G30 7

TXSVTS30 3302 1G 1

TXSVTS30 3002 3S 1

DATA_B;

BOAVTS01 UKDUKC01 1

BOAVTS01 UKDUKC02 2

BOAVTS01 UKDUKC03 3

BOAVTS01 UKDUKC04 4

TXSVTS30 KCDTXS36 1

TXSVTS30 KCDTXS38 2

View solution in original post

Patrick · Posted 07-04-2014 04:01 AM

I'm not sure why in your desired result the first column can be blank. I assume this first column is "COMPLIB" and as this is the key it should always have a value (as at least one of the 2 tables will provide such a key).

Make sure you fully understand how the result set differs in cases of many-to-many relationships between using a data step "merge" and a SQL join.

I believe below code does what you're asking for.

DATA DATA_C;

if 0 then set a b;

call missing(of _all_);

MERGE

DATA_A

DATA_B

;

BY COMPLIB;

run;

sathishthangamani · Posted 07-04-2014 06:18 AM

Thanks Patrick for yore response. I will try your method also for my learning.

RW9 · Posted 07-04-2014 04:29 AM

Sorry, I don't see why you would want to do this. It seems very random to just merge data to other data based on position. The reason you have the data merging to all the other data is that you by group, or matching variables are not enough to show a distinct merge. If for instance I sort the input slightly differently I would get different results, hence not a good idea. Personally I would investigate what actually needs to be merged together and use other variables, or assign variables to make that merge explicit. If you really only want position in the dataset (which as I mentioned could be different each time), then assign an order variable and merge on that as well, e.g.

DATA_A :

BOAVTS01 E082 E08 1

BOAVTS01 E242 E24 2

BOAVTS01 E282 E28 3

BOAVTS01 4102 F20 4

BOAVTS01 E522 F52 5

BOAVTS01 E102 G10 6

BOAVTS01 E302 G30 7

TXSVTS30 3302 1G 1

TXSVTS30 3002 3S 1

DATA_B;

BOAVTS01 UKDUKC01 1

BOAVTS01 UKDUKC02 2

BOAVTS01 UKDUKC03 3

BOAVTS01 UKDUKC04 4

TXSVTS30 KCDTXS36 1

TXSVTS30 KCDTXS38 2

sathishthangamani · Posted 07-04-2014 06:17 AM

RW9, i tried using your suggestion of creating a new variable called SEQ and merged by COMPLIB SEQ to make it more distinct. It worked fine now. I got what i wanted. Thank you so much!

AnnaBrown · Posted 07-07-2014 09:44 AM

I'm so glad you found a solution, Sathish. Can you mark correct or helpful answers so other community members with a similar challenge can easily find it?

Thanks!

Anna

Access SAS Innovate on-demand content now!

sathishthangamani · Posted 07-07-2014 09:47 AM

I did. Thanks

How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Re: How to create Match merge with n x n observations and Missing ?

Registration is open