Solved: Combine steps

David_Billa · Posted 06-06-2022 09:45 PM

Any ideas on how to combine these two steps into one?

proc sql;
  create table PT_QUAN as
  select distinct PLAN_TO , QUANNEL
  from SHIPMENT_PULL;
quit;

proc sql;
  create table MISSING_QUANS as
    select t1.*
    from PT_QUAN t1
      right join (
        select distinct PLAN_TO
        from PT_QUAN
        where MISSING(QUANNEL)
        ) t2 on (t1.PLAN_TO = t2.PLAN_TO)
    where not missing(t1.PLAN_TO)
    order by t1.PLAN_TO , t1.QUANNEL desc;
quit;

Kurt_Bremser · Posted 06-07-2022 04:05 AM

You do a RIGHT JOIN, but then you exclude all rows where the "left" table does not supply an observation, which makes it in effect an inner join. It would be an inner join anyway, since you join pt_quan with itself.

What you do is a lookup, finding all those groups where there is at least one missing QUANNEL in the dataset.

Try this:

proc sort
  data=shipment_pull (keep=plan_to quannel)
  out=pt_quan
  nodupkey
;
by plan_to quannel;
run;

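data missing_quans;
  /* first pass over the BY group: flag it if any QUANNEL is missing */
  do until (last.plan_to);
    set pt_quan;
    by plan_to;
    if missing(quannel) then flag = 1;
  end;
  /* second pass over the same group: output all rows of flagged groups */
  do until (last.plan_to);
    set pt_quan;
    by plan_to;
    if flag then output;
  end;
  drop flag;
run;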

(untested, for lack of usable example data)
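
For anyone who wants to try the steps above, a small made-up dataset could look like the sketch below. The values are invented, and it assumes that PLAN_TO and QUANNEL are character variables, which the thread does not confirm.

```sas
/* Hypothetical sample data: values and variable types are assumptions */
data shipment_pull;
  infile datalines dsd truncover;   /* DSD: comma-delimited, empty field = missing */
  input plan_to :$8. quannel :$8.;
  datalines;
A,X
A,
A,X
B,Y
B,Z
C,
;
run;
```

With these rows, groups A and C each contain a missing QUANNEL and group B does not, so MISSING_QUANS should end up with the deduplicated A and C rows and nothing from B.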

This will outperform the SQL with sub-selects by orders of magnitude for larger datasets.
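
For comparison, if a single SQL step is still preferred, the lookup described above can also be written as one query with an IN subquery. This is only a sketch, not tested against the original data. It keeps the original filter on missing PLAN_TO because PROC SQL treats missing values as equal to each other in comparisons, so rows with a missing PLAN_TO would otherwise match.

```sas
proc sql;
  create table MISSING_QUANS as
    select distinct PLAN_TO, QUANNEL
    from SHIPMENT_PULL
    where not missing(PLAN_TO)
      /* keep only groups that have at least one missing QUANNEL */
      and PLAN_TO in (
        select PLAN_TO
        from SHIPMENT_PULL
        where missing(QUANNEL)
        )
    order by PLAN_TO, QUANNEL desc;
quit;
```

This avoids the intermediate PT_QUAN table, but it still contains a sub-select, so the performance remark above applies to it as well.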

SASKiwi · Posted 06-07-2022 12:12 AM

proc sql;
  create table MISSING_QUANS as
    select t1.PLAN_TO
          ,t1.QUANNEL
          ,count(*) as PLAN_Count
    from SHIPMENT_PULL t1
      right join (
        select PLAN_TO
              ,count(*) as Plan_Count
        from SHIPMENT_PULL
        where MISSING(QUANNEL)
        group by PLAN_TO
        ) t2 on (t1.PLAN_TO = t2.PLAN_TO)
    where not missing(t1.PLAN_TO)
    group by t1.PLAN_TO
            ,t1.QUANNEL
    order by t1.PLAN_TO , t1.QUANNEL desc;
quit;

Kurt_Bremser · Posted 06-07-2022 04:07 AM

Rule of thumb: in SAS, multiple DATA/SORT steps usually outperform complex SQL queries, and are easier to maintain. Personally, I only use SQL when I need to do a many-to-many join with a Cartesian product as the result.
