Datastep functionality first.<var> replacing with PROC SQL

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Datastep functionality first.<var> replacing with PROC SQL

        DATA rawdat.rpt_product (KEEP = case_id seq_num);
          SET rawdat.rpt_product;
          BY case_id sort_id;
          IF FIRST.case_id THEN OUTPUT;
        RUN;

Hello Community.

 

I am trying to optimize the performance of some macros. I couldn't find a solution to replace the first.case_id THEN OUTPUT with SQL.

 

Do you have any idea?

 

Greetings from Munich, Germany

Grüße aus München, Deutschland


Accepted Solutions
Solution
‎07-07-2016 07:24 AM
Super User
Posts: 5,353

Re: Datastep functionality first.<var> replacing with PROC SQL

My SQL isn't strong enough to give you the full syntax, but here's the idea.

 

group by case_id

where sort_id = min(sort_id)

 

To get identical results to the DATA step, you must have unique values for SORT_ID within each CASE_ID.  Per CASE_ID, you would get multiple records instead of 1 if there were tied records having the minimum SORT_ID.

View solution in original post


All Replies
Super User
Posts: 19,038

Re: Datastep functionality first.<var> replacing with PROC SQL

You typically can't. SQL doesn't have a concept of first. 

 

You can replicate it by creating a sub query to find the first case and then joining on it. 

Occasional Contributor
Posts: 10

Re: Datastep functionality first.<var> replacing with PROC SQL

Thank you Reeza for your quick response.

 

unfortunately i can't imagine how that should look like in my example. Could you show me in an example what you mean?

Respected Advisor
Posts: 4,130

Re: Datastep functionality first.<var> replacing with PROC SQL

@Reeza

Standard SQL doesn't have the concept but there are quite a few databases which implemented analytic functions as extensions; eg. Oracle https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions057.htm 

 

Super User
Posts: 19,038

Re: Datastep functionality first.<var> replacing with PROC SQL

@Patrick True. If you're connecting to a server using SQL pass through and native database SQL that could be an option. 

Solution
‎07-07-2016 07:24 AM
Super User
Posts: 5,353

Re: Datastep functionality first.<var> replacing with PROC SQL

My SQL isn't strong enough to give you the full syntax, but here's the idea.

 

group by case_id

where sort_id = min(sort_id)

 

To get identical results to the DATA step, you must have unique values for SORT_ID within each CASE_ID.  Per CASE_ID, you would get multiple records instead of 1 if there were tied records having the minimum SORT_ID.

Respected Advisor
Posts: 4,130

Re: Datastep functionality first.<var> replacing with PROC SQL

If these are all SAS tables then I don't expect a SQL to perform better than a Proc Sort / If First.<variable> combination. Actually there is a good chance that a SQL will perform worse.

 

IF it's all about performance AND your "first" rows fit into memory then using a hash approach would eventually be quicker (but code maintenance decreases).

 

Here how a hash approach could look like:

DATA have;
  do case_id=5,3,7,1;
    do seq_num=1 to 5;
      sort_id=ceil(ranuni(1)*10);
      output;
    end;
  end;
RUN;

data _null_;

  set have(rename=(sort_id=sort_id_in seq_num=seq_num_in)) end=last;

  if _n_=1 then
    do;
      if 0 then set have(keep=case_id sort_id seq_num);
      dcl hash h1(ordered:'y');
      rc=h1.defineKey('case_id');
      rc=h1.defineData('case_id','sort_id','seq_num');
      rc=h1.defineDone();
    end;

  if h1.find()=0 then
    do;
      if sort_id_in<sort_id then h1.replace(key:case_id, data:case_id, data:sort_id_in, data:seq_num_in);
    end;
  else 
    h1.add(key:case_id, data:case_id, data:sort_id_in, data:seq_num_in);
  
  if last then h1.output(dataset:'work.want');
run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 366 views
  • 5 likes
  • 4 in conversation