Hi all, I have the following two datasets I would like to merge.
ID_1 | CODE |
10059 | 11 |
10059 | 36 |
10077 | 7 |
10077 | 43 |
10233 | 29 |
10233 | 31 |
10437 | 32 |
ID_2 | CODES |
10059 | 38 |
10059 | 41 |
10077 | 21 |
10077 | 23 |
10077 | 25 |
10233 | 29 |
10437 | 32 |
When merging currently using the following code:
DATA Merged;
merge wave1 wave2 ; by id;
run;
It comes out looking like this:
id | code | codes | |
1 | 10059 | 11 | 38 |
2 | 10059 | 36 | 41 |
3 | 10077 | 7 | 21 |
4 | 10077 | 43 | 25 |
5 | 10077 | 43 | 23 |
6 | 10233 | 29 | 29 |
Whereas this is what I would like it to look like. if the two "code" columns do not match I would like them to be in separate rows. Similar to stacking the data. However if they are MATCHING, I would like them to be merged in the same row (highlighted in red).
id | code | codes | |
1 | 10059 | 11 | . |
2 | 10059 | . | 38 |
3 | 10059 | 36 | . |
4 | 10059 | . | 41 |
5 | 10077 | 7 | . |
6 | 10077 | . | 21 |
7 | 10077 | 43 | . |
8 | 10077 | . | 25 |
9 | 10077 | 43 | . |
10 | 10077 | . | 23 |
11 | 10233 | 29 | 29 |
12 | 10233 | 31 | . |
13 | 10233 | . | 29 |
14 | 10437 | 32 | 32 |
Any ideas would be appreciated!
Add next steps to previous code:
data id2del;
set want; /*output of previous step */
if code=codes;
keep ID;
run;
proc sql;
create table want_new
as select * from want
where code ne codes and
id in (select ID from id2del);
quit;
Try next code:
data both / view=both;
set table1 table2;
run;
proc sort data=both out=want nodupkey;
by id code codes;
run;
Thanks for the reply, but this code results in the matching observations still being in separate rows.
The output looks good for those that do not match but it does not merge observations with the same value in each column (seen in red in original post).
Thanks again!
Then you need add some steps to your code:
DATA Merged;
merge wave1 wave2 ;
by id;
run;
data want;
set merged;
retain _code; drop _code;
if code=codes then output;
else do;
_code = codes;
_codes = .;
output;
codes = _code;
code=.;
output;
end;
run;
Thanks! That almost works!
The only thing is that I still want this "38" in the first row to be coded as missing and only found in row 2. (Or the "41 in row 3 to be coded as "." and only found in row 4.) How can I specify this?
id | code | codes | |
1 | 10059 | 1 | 38 |
2 | 10059 | . | 38 |
3 | 10059 | 10 | 41 |
4 | 10059 | . | 41 |
Colleen
There was a typo in my code. Code tested should be:
DATA Merged;
merge wave1 wave2 ; by id;
run;
data want;
set merged;
retain _code; drop _code;
if code=codes then output;
else do;
_code = codes;
codes = .;
output;
codes = _code;
code=.;
output;
end;
run;
This one almost works. However, it seems to be only working about half the time. It kept row 12, which should have been deleted. But in row 14, it did it perfectly. Perhaps it is the length of the observations in the "code" columns?
id | code | codes | |
12 | 10233 | . | 242930 |
13 | 10233 | 242930 | 242930 |
14 | 10437 | 3234 | 3234 |
What is the rule to delete a record?
If the two "code" columns do not match I would like them to be in separate rows. This works.
If they are MATCHING codes, I would like them to be merged in the same row and the other row to be deleted (highlighted in red) so that it appears like row 14 here, just once.
id | code | codes | |
12 | 10233 | . | 242930 |
13 | 10233 | 242930 | 242930 |
14 | 10437 | 3234 | 3234 |
Add next steps to previous code:
data id2del;
set want; /*output of previous step */
if code=codes;
keep ID;
run;
proc sql;
create table want_new
as select * from want
where code ne codes and
id in (select ID from id2del);
quit;
data a; infile cards expandtabs; input ID CODE; cards; 10059 11 10059 36 10077 7 10077 43 10233 29 10233 31 10437 32 ; data b; infile cards expandtabs; input ID CODES; cards; 10059 38 10059 41 10077 21 10077 23 10077 25 10233 29 10437 32 ; data temp; merge a b; by id ; run; data want; set temp; if code ne codes then do; temp=codes; call missing(codes); output; codes=temp; call missing(code); output; end; else output; drop temp; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.