Solved: Re: PROC SQL vs Merge tricky naming

JediApprentice · Posted 11-07-2016 02:24 PM

Here is some test data:

data have;
  infile datalines;
  input RX_Chain $ Exclusion $ CLIENT $ ZIP;
  datalines;
ALD854 N IL 22309
JX139D Y NE 21234
IL603X N MT 99087
AB2009 Y NM 20208
;
run;

data have2;
  infile datalines;
  input Network $ Chain $;
  datalines;
ALD854 00455
JX139D 00139
JX307E 00540
IL603X 00651
;
run;

And here is some PROC SQL that will accomplish what I want:

proc sql;
  create table want1 as
  select t1.RX_Chain,
         t1.CLIENT,
		 t1.ZIP,
		 (CASE 
		    WHEN t1.RX_Chain = t2.NETWORK THEN t2.CHAIN
			ELSE t1.RX_Chain
		  END) format=$6. AS RX_Chain
    from have t1 left join have2 t2 on(t1.RX_Chain=t2.Network)
    where t1.Exclusion = 'N';
quit;

But, I want to accomplish this with a merge. The trouble is with the naming: RX_Chain.

I attempted a start at the merge and it looks like this (after sorting initial datasets):

data want;
  merge have(in=h) have2(in=h2 rename=(Network=RX_Chain));
  by RX_Chain;
  if h and h2 then RX_Chain=Chain;
  else RX_Chain=RX_Chain;
  if h and Exclusion='N';
  keep RX_Claim Client ZIP;
run;

However, I am not getting the same results as the PROC SQL. I know it has to do with the naming of RX_Chain and trying to assign a value to it but basically I'm trying to change the value of RX_Chain in have dataset to Chain in have2 dataset when RX_Chain = Network, or leave it as same value it was if not.

PGStats · Posted 11-07-2016 03:48 PM

Your SQL query contains a duplicate column RX_chain. Beside that, you should get what you want with:

data want;
merge 
	have(in=h1) 
	have2(in=h2 rename=Network=RX_Chain);
by RX_chain; 
if h1 and Exclusion='N';
if h2 then RX_Chain=Chain;
keep RX_Chain Client ZIP;
run;

PG

View solution in original post

LinusH · Posted 11-07-2016 03:39 PM

First, why do a merge when you got a working SQL?

Second, data step handles variable differently than SQL. By renaming Network its values will replace the values provided by Have.

DOn't rename, use Network in your if logic and assignments.

Data never sleeps

JediApprentice · Posted 11-07-2016 04:10 PM

@LinusH But when I don't rename it, on the by statement, it throws an error "Variable RX_Chain is not on dataset have2". So it seems like I have to rename it..

LinusH · Posted 11-08-2016 04:03 AM

Apparently I didn't thought it through.

So we are lucky that @PGStats came to the rescue

Data never sleeps

PGStats · Posted 11-07-2016 03:48 PM

Your SQL query contains a duplicate column RX_chain. Beside that, you should get what you want with:

data want;
merge 
	have(in=h1) 
	have2(in=h2 rename=Network=RX_Chain);
by RX_chain; 
if h1 and Exclusion='N';
if h2 then RX_Chain=Chain;
keep RX_Chain Client ZIP;
run;

PG

PROC SQL vs Merge tricky naming

Re: PROC SQL vs Merge tricky naming

Re: PROC SQL vs Merge tricky naming

Re: PROC SQL vs Merge tricky naming

Re: PROC SQL vs Merge tricky naming

Re: PROC SQL vs Merge tricky naming

Click image to register for webinar

Classroom Training Available!