Re: Convert Merge to SQL

zdassu · Posted 04-05-2019 06:30 AM

Hi How do I convert the following merge code to SQL?

data SAP_Bill.erdo_main(sortedby=opbel);
        merge   SAP_Bill.erdo_max
  SAP_Bill.erdk(in=a);
        by              opbel;
        label   erdk_frei_am='Invoice_Release_Date';
        if a and first.opbel then output;
run;

I have included the names bwlow of the columns from the input tables and the output table, i would like the SQL code to have the same Output

Erdo_Max Table (Input)
opbel	erdk_frei_am

ERDK Table (Input)
intopbel	vkont	opbel	total_amnt	budat	simulated	stokz	tobreleasd

ERDO_Main (Output)
opbel	erdk_frei_am	intopbel	vkont	total_amnt	budat	simulated	stokz	tobreleasd

Thanks In Advance

andreas_lds · Posted 04-05-2019 07:00 AM

If you have a working and tested data-step, don't waste your time to convert it to sql. I don't know any case in which sql performed better.

zdassu · Posted 04-05-2019 07:04 AM

Hi I agree, However I have to convert this to SQL because i have to re-write the Scripts in Hadoop

VDD · Posted 04-05-2019 07:30 AM

Why are you trying to elemental SAS?

zdassu · Posted 04-05-2019 08:13 AM

Our Servers are moving from SAS server to a Hadoop Server

Shivam · Posted 04-05-2019 08:29 AM

Hi,

What i understand from your code is, you want to left join both the tables and want only those records for the first instance of the id.

below is the code which can do it for you, you need to use your column names.

proc sql;
create table final as select *,monotonic() as rowid from test as a left join test1 as b 
on a.opbel=b.opbel
group by a.opbel
having rowid=min(calculated rowid);
quit;

zdassu · Posted 04-11-2019 05:04 AM

Hi I am a little confused how the code works, where would I put my column names

Shivam · Posted 04-11-2019 07:04 AM

columns are being selected by *(all columns) if you want to specify only few columns, then specify the names in place of *

Astounding · Posted 04-05-2019 09:06 AM

There is no way for SQL to replicate this logic, with a 100% guarantee. SQL makes no promises as to the order of the incoming observations, thus any tool that looks for the first observation might produce a different result in SQL vs. SAS (or even on different SQL runs of the same program). That includes the monotonic function.

If you will keep SAS on at least one server, what you can do is this:

data SAP_Bill.erdo_main (sortedby=opbel) / view=SAP_Bill.ordo_main;
   merge SAP_Bill.ordo_max SAP_Bil.erdk (in=a);
   by opbel;
   label erdk_frei_am='Invoice_Release_Date';
   if a abnd first.opbel then output;
run

That's the same program, but using a view instead of a data set. Then later:

proc sql;
   create table somewhere_on_hadoop as 
   select * from SAP_Bill.erdo_main;
quit;

I'm not sure if that's helpful or not, since it still requires SAS. But there is no way that I know of to have SQL guarantee which observation will be "first" within a group.

zdassu · Posted 04-11-2019 05:05 AM

Unfortunately, I am unable to access the SAS Server from Hadoop

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!