I'm merging DATA 1 and DATA 2 on the "CODE" variable. I want DATA 2 information update the same name columns in DATA 1. It works fine in demo below. Resulting data WANT has all the columns updated from both datasets merged. However, it doesn't work like this using my actual datasets. Please see screen shot where flag_icd9pc flag takes value 1 indicating MERGE happened but DESC_CODE AND CLASS_CODE columns are not updated from the second data which has information for these two columns. I made sure all same column characteristics in both datasets have same length, format and informat while all being "characters". I formatted them and fixed the variables to the same length before data merge, as below. They all looked fine in Column Attributes in data view by each variables. They're all consistently identical to their lengths, format and informat before merge.
What's going wrong here? My sincere appreciation in advance for your time. Thanks a lot!
length variable $5;
format variable $5.;
informat variable $5.;
DATA DATA1;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
. . 1
. . 2
. . 3
. . 4
. . 5
. . 6
HIPPS DISTANT 7
HIPPS LOCAL 8
BILLING DISTANT 9
;
DATA DATA2;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
ICD10 LOCAL 1
ICD9 REGIONAL 2
CPT DISTANT 3
ICD10 LOCAL 4
ICD9 DISTANT 5
CPT REGIONAL 6
;
DATA WANT;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
ICD10 LOCAL 1
ICD9 REGIONAL 2
CPT DISTANT 3
ICD10 LOCAL 4
ICD9 DISTANT 5
CPT REGIONAL 6
HIPPS DISTANT 7
HIPPS LOCAL 8
BILLING DISTANT 9
;
data WANT1;
merge DATA1 (in=A)
DATA2 (in=B);
by CODE;
if A and B then FLAG_LINKED=1; else FLAG_LINKED=1=0;
if A;
run;
You didn't provide an example of when it doesn't work, so we can only guess. One possibility is that you have some values in data1, but want to replace them with data two. If so, you may want to use modify rather than merge. e.g.:
DATA DATA1;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
X Y 1
X Y 2
X Y 3
. . 4
. . 5
. . 6
HIPPS DISTANT 7
HIPPS LOCAL 8
BILLING DISTANT 9
;
DATA DATA2;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
ICD10 LOCAL 1
ICD9 REGIONAL 2
CPT DISTANT 3
ICD10 LOCAL 4
ICD9 DISTANT 5
CPT REGIONAL 6
;
data want;
set data1;
run;
data want;
modify want data2 updatemode=nomissingcheck;
by code;
if _iorc_=0 then replace;
else do;
_error_=0;
output;
end;
run;
Art, CEO, AnalystFinder.com
@Cruise wrote:
I'm merging DATA 1 and DATA 2 on the "CODE" variable. I want DATA 2 information update the same name columns in DATA 1. It works fine in demo below. Resulting data WANT has all the columns updated from both datasets merged. However, it doesn't work like this using my actual datasets. Please see screen shot where flag_icd9pc flag takes value 1 indicating MERGE happened but DESC_CODE AND CLASS_CODE columns are not updated from the second data which has information for these two columns. I made sure all same column characteristics in both datasets have same length, format and informat while all being "characters". I formatted them and fixed the variables to the same length before data merge, as below. They all looked fine in Column Attributes in data view by each variables. They're all consistently identical to their lengths, format and informat before merge.
What's going wrong here? My sincere appreciation in advance for your time. Thanks a lot!
length variable $5;
format variable $5.;
informat variable $5.;
With a statement like "However, it doesn't work like this using my actual datasets." the first question is "Are the example values in your data1 and data2 the ones from your actual datasets that do not merge correctly? If not then provide them.
Note that if you have multiple records with the same value of CODE in both data sets you can get unexpected results. If CODE is duplicated in only one set then different results can happen depending on which set has the multiples.
Is Code duplicated in DATA1 in your actual data? If so, are all of the Code values supposed to get the same values from data2?
You didn't provide an example of when it doesn't work, so we can only guess. One possibility is that you have some values in data1, but want to replace them with data two. If so, you may want to use modify rather than merge. e.g.:
DATA DATA1;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
X Y 1
X Y 2
X Y 3
. . 4
. . 5
. . 6
HIPPS DISTANT 7
HIPPS LOCAL 8
BILLING DISTANT 9
;
DATA DATA2;
INPUT CODE_SYSTEM $ CODE_DESC $ CODE;
CARDS;
ICD10 LOCAL 1
ICD9 REGIONAL 2
CPT DISTANT 3
ICD10 LOCAL 4
ICD9 DISTANT 5
CPT REGIONAL 6
;
data want;
set data1;
run;
data want;
modify want data2 updatemode=nomissingcheck;
by code;
if _iorc_=0 then replace;
else do;
_error_=0;
output;
end;
run;
Art, CEO, AnalystFinder.com
You don't need that step UNLESS you're unsure that the process will do what you want. I included it so that the original file would be left intact.
Art, CEO, AnalystFinder.com
Hi,
Does class_code have any blank values on data2?
Please post the log of the merge that is not working, showing the code and any messages.
Regards,
Amir.
Awesome. Now I can see my parent data is modified with all the information needed. How to flag the rows modified though? All the rows modified are all linked on the linkage variable? My input information is not unique to a data2. Modified dataset had included all rows from both datasets. How to keep data aligned to one of datasets like it's done in data merge?
Trying to flag the matched rows from data modify. below didn't create a flag for me. Any idea how to flag matched rows in "modify mode"? thanks in advance.
data n.want;
modify n.want(in=w)
n.Icd9_pc(in=icd) updatemode=nomissingcheck;
by omm_proc_cd1;
if _iorc_=0 then replace;
else do;
_error_=0;
output;
end;
if w and icd then flag=1;
run;
Read up on the modify statement (e.g., http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000173361.htm ).
Your answer will depend upon the combination of what your new file looks like and the updatemode you select.
Without seeing your actual datasets it's difficult to say what will do exactly what you want. You could use merge, update or modify. And, for each, you use options to control what will happen.
Art, CEO, AnalystFinder.com
I don't understand the question. Maybe because you provided a bad example?
If I just merge the two datasets you provided by CODE then I get exactly what you said you wanted.
data want;
merge data1 data2 ;
by code;
run;
Note that your example case both datasets have only one observation per value of CODE.
The merge will also work if DATA1 has multiple observations per value of CODE and DATA2 is unique.
But if DATA2 has multiple observation per value of CODE then you could get replication of the observations from DATA1.
Or if both have multiple observations per value of CODE then only partial replacement of the values from DATA1 will occur. SAS could run out of records to read from DATA2 for that specific value of CODE. and so not replace the values of the common variables read from DATA1.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.