Solved: How to merge matching observations but have different rows for non-mat...

cdunlea · Posted 08-12-2020 10:33 AM

Hi all, I have the following two datasets I would like to merge.

ID_1	CODE
10059	11
10059	36
10077	7
10077	43
10233	29
10233	31
10437	32

ID_2	CODES
10059	38
10059	41
10077	21
10077	23
10077	25
10233	29
10437	32

When merging currently using the following code:

DATA Merged;
merge wave1 wave2 ; by id;
run;

It comes out looking like this:

	id	code	codes
1	10059	11	38

2

10059

36

41

3

10077

7

21

4

10077

43

25

5

10077

43

23

6

10233

29

Whereas this is what I would like it to look like. if the two "code" columns do not match I would like them to be in separate rows. Similar to stacking the data. However if they are MATCHING, I would like them to be merged in the same row (highlighted in red).

	id	code	codes
1	10059	11	.
2	10059	.	38

3	10059	36	.
4	10059	.	41

5	10077	7	.
6	10077	.	21

7	10077	43	.
8	10077	.	25

9	10077	43	.
10	10077	.	23

11

10233

29

12	10233	31	.
13	10233	.	29

14

10437

32

Any ideas would be appreciated!

Shmuel · Posted 08-12-2020 01:57 PM

Add next steps to previous code:

data  id2del;
   set want;  /*output of previous step */
        if code=codes;
        keep ID;
run;
proc sql; 
      create table want_new
      as select * from want
      where code ne codes and
        id in (select ID from id2del);
quit;

View solution in original post

Shmuel · Posted 08-12-2020 10:44 AM

Try next code:

data both / view=both;
  set table1 table2;
run;
proc sort data=both out=want nodupkey;
  by id code codes;
run;

cdunlea · Posted 08-12-2020 10:51 AM

Thanks for the reply, but this code results in the matching observations still being in separate rows.

The output looks good for those that do not match but it does not merge observations with the same value in each column (seen in red in original post).

Thanks again!

Shmuel · Posted 08-12-2020 11:32 AM

Then you need add some steps to your code:

DATA Merged;
merge wave1 wave2 ; 
  by id;
run;
data want;
 set merged;
     retain _code; drop _code;
     if code=codes then output; 
     else do;
           _code = codes;
           _codes = .;
           output;
           codes = _code;
           code=.;
           output;
     end;
run;

cdunlea · Posted 08-12-2020 11:53 AM

Thanks! That almost works!

The only thing is that I still want this "38" in the first row to be coded as missing and only found in row 2. (Or the "41 in row 3 to be coded as "." and only found in row 4.) How can I specify this?

	id	code	codes
1	10059	1	38

2

10059

.

38

3

10059

10

41

4

10059

.

41

Colleen

Shmuel · Posted 08-12-2020 12:09 PM

There was a typo in my code. Code tested should be:

DATA Merged;
merge wave1 wave2 ;   by id;
run;
data want;
 set merged;
     retain _code; drop _code;
     if code=codes then output; 
     else do;
           _code = codes;
           codes = .;
           output;
           codes = _code;
           code=.;
           output;
     end;
run;

cdunlea · Posted 08-12-2020 12:29 PM

This one almost works. However, it seems to be only working about half the time. It kept row 12, which should have been deleted. But in row 14, it did it perfectly. Perhaps it is the length of the observations in the "code" columns?

	id	code	codes
12	10233	.	242930

13

10233

242930

14

10437

3234

Shmuel · Posted 08-12-2020 12:51 PM

What is the rule to delete a record?

cdunlea · Posted 08-12-2020 01:32 PM

If the two "code" columns do not match I would like them to be in separate rows. This works.

If they are MATCHING codes, I would like them to be merged in the same row and the other row to be deleted (highlighted in red) so that it appears like row 14 here, just once.

	id	code	codes
12	10233	.	242930

13

10233

242930

14

10437

3234

Shmuel · Posted 08-12-2020 01:57 PM

Add next steps to previous code:

data  id2del;
   set want;  /*output of previous step */
        if code=codes;
        keep ID;
run;
proc sql; 
      create table want_new
      as select * from want
      where code ne codes and
        id in (select ID from id2del);
quit;

Ksharp · Posted 08-13-2020 09:12 AM

data a;
infile cards expandtabs;
input ID	CODE;
cards;
10059	11
10059	36
10077	7
10077	43
10233	29
10233	31
10437	32
;
 
data b;
infile cards expandtabs;
input ID	CODES;
cards;
10059	38
10059	41
10077	21
10077	23
10077	25
10233	29
10437	32
;

data temp;
 merge a b;
 by id ;
run;
data want;
 set temp;
 if code ne codes then do;
  temp=codes;
  call missing(codes);
  output;
  codes=temp;
  call missing(code);
  output;
 end;
 else output;
 drop temp;
run;

How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Re: How to merge matching observations but have different rows for non-matching observations?

Classroom Training Available!