Here is some test data:
data have;
infile datalines;
input RX_Chain $ Exclusion $ CLIENT $ ZIP;
datalines;
ALD854 N IL 22309
JX139D Y NE 21234
IL603X N MT 99087
AB2009 Y NM 20208
;
run;
data have2;
infile datalines;
input Network $ Chain $;
datalines;
ALD854 00455
JX139D 00139
JX307E 00540
IL603X 00651
;
run;
And here is some PROC SQL that will accomplish what I want:
proc sql;
create table want1 as
select t1.RX_Chain,
t1.CLIENT,
t1.ZIP,
(CASE
WHEN t1.RX_Chain = t2.NETWORK THEN t2.CHAIN
ELSE t1.RX_Chain
END) format=$6. AS RX_Chain
from have t1 left join have2 t2 on(t1.RX_Chain=t2.Network)
where t1.Exclusion = 'N';
quit;
But, I want to accomplish this with a merge. The trouble is with the naming: RX_Chain.
I attempted a start at the merge and it looks like this (after sorting initial datasets):
data want;
merge have(in=h) have2(in=h2 rename=(Network=RX_Chain));
by RX_Chain;
if h and h2 then RX_Chain=Chain;
else RX_Chain=RX_Chain;
if h and Exclusion='N';
keep RX_Claim Client ZIP;
run;
However, I am not getting the same results as the PROC SQL. I know it has to do with the naming of RX_Chain and trying to assign a value to it but basically I'm trying to change the value of RX_Chain in have dataset to Chain in have2 dataset when RX_Chain = Network, or leave it as same value it was if not.
Your SQL query contains a duplicate column RX_chain. Beside that, you should get what you want with:
data want;
merge
have(in=h1)
have2(in=h2 rename=Network=RX_Chain);
by RX_chain;
if h1 and Exclusion='N';
if h2 then RX_Chain=Chain;
keep RX_Chain Client ZIP;
run;
First, why do a merge when you got a working SQL?
Second, data step handles variable differently than SQL. By renaming Network its values will replace the values provided by Have.
DOn't rename, use Network in your if logic and assignments.
@LinusH But when I don't rename it, on the by statement, it throws an error "Variable RX_Chain is not on dataset have2". So it seems like I have to rename it..
Apparently I didn't thought it through.
So we are lucky that @PGStats came to the rescue
Your SQL query contains a duplicate column RX_chain. Beside that, you should get what you want with:
data want;
merge
have(in=h1)
have2(in=h2 rename=Network=RX_Chain);
by RX_chain;
if h1 and Exclusion='N';
if h2 then RX_Chain=Chain;
keep RX_Chain Client ZIP;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.