Any idea to combine these two steps into one?
proc sql; create table PT_QUAN as select distinct PLAN_TO , QUANNEL from SHIPMENT_PULL; quit; proc sql; create table MISSING_QUANS as select t1.* from PT_QUAN t1 right join ( select distinct PLAN_TO from PT_QUAN where MISSING(QUANNEL) ) t2 on (t1.PLAN_TO = t2.PLAN_TO) where not missing(t1.PLAN_TO) order by t1.PLAN_TO , t1.QUANNEL desc; quit;
You do a RIGHT JOIN, but then you exclude all those where the "left" table does not supply any observations, making this in effect an inner join. But it's an inner join anyway, as you join pt_quan with itself
What you do is a lookup, finding all those groups where there is at least one missing QUANNEL in the dataset.
Try this:
proc sort
data=shipment_pull (keep=plan_to quannel)
out=pt_quan
nodupkey
;
by plan_to quannel;
run;
data missing_quans;
do until (last.plan_to);
set pt_quan;
by plan_to;
if missing (quannel) then flag = 1;
end;
do until (last.plan_to);
set pt_quan;
by plan_to;
if flag then output;
end;
drop flag;
run;
(untested, for lack of usable example data)
This will outperform the SQL with sub-selects by orders of magnitude for larger datasets.
proc sql;
create table MISSING_QUANS as
select t1.PLAN_TO
,t1.QUANNEL
,count(*) as PLAN_Count
from SHIPMENT_PULL t1
right join (
select PLAN_TO
,count(*) as Plan_Count
from SHIPMENT_PULL
where MISSING(QUANNEL)
group by PLAN_TO
) t2 on (t1.PLAN_TO = t2.PLAN_TO)
where not missing(t1.PLAN_TO)
group by t1.PLAN_TO
,t1.QUANNEL
order by t1.PLAN_TO , t1.QUANNEL desc;
quit;
You do a RIGHT JOIN, but then you exclude all those where the "left" table does not supply any observations, making this in effect an inner join. But it's an inner join anyway, as you join pt_quan with itself
What you do is a lookup, finding all those groups where there is at least one missing QUANNEL in the dataset.
Try this:
proc sort
data=shipment_pull (keep=plan_to quannel)
out=pt_quan
nodupkey
;
by plan_to quannel;
run;
data missing_quans;
do until (last.plan_to);
set pt_quan;
by plan_to;
if missing (quannel) then flag = 1;
end;
do until (last.plan_to);
set pt_quan;
by plan_to;
if flag then output;
end;
drop flag;
run;
(untested, for lack of usable example data)
This will outperform the SQL with sub-selects by orders of magnitude for larger datasets.
Rule of thumb: in SAS, multiple DATA /SORT steps usually outperform complex SQL queries, and are easier to maintain. Personally, I only use SQL when I need to do a many-to-many join with a cartesian product as result.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.