Hello,
I’m trying to create a flag named prod in the below code which basically should compare column Status in each row by EMPID. The prod flag should be ‘Y’ if the EMPID has different values in status columns or else it should be ‘N’, meaning they have the same status in all the rows.
Source Data Set:
Data aa;
infile cards;
input EMPID $3. @5 Status $3.;
cards;
123 Opn
123 Opn
345 Cld
345 Opn
567 Cld
567 Cld
567 Cld
999 Opn
984 Cld
236 Cld
236 Cld
236 Opn
;
Run;
Desired output:
EMPID Status Prod
123 Opn N
123 Opn N
345 Cld Y
345 Opn Y
567 Cld N
567 Cld N
567 Cld N
999 Opn N
984 Cld N
236 Cld Y
236 Cld Y
236 Opn Y
My Code:
Data aa1;
retain PROD;
set aa;
by EMPID Status;
if first.Status = last.Status then PROD = 'N';
else PROD = 'N';
Run;
So in the above case, EMPID 236 Prod is ‘Y’ because they have 2 diff status values(Cld,Opn) whereas EMPID 123 is ‘N’ because the status is same for both the rows. I tried using first and last method but the output wasn’t the way I expected. I am beginner SAS programmer so wondering if I could get some help to code this one. Thanks in advance for your help!
proc sql;
create table want
as select a.*, case
when b.prodcount=1 then 'N'
when b.prodcount=2 then 'Y'
end as prod
from AA as a left join (select empid,count(*) as prodcount from (select distinct empid,status from aa)
group by empid) as b
on AA.Empid=b.empid;
quit;
Does the match but will result in data sorted on EmpID if that is a problem.
proc sql;
create table want
as select a.*, case
when b.prodcount=1 then 'N'
when b.prodcount=2 then 'Y'
end as prod
from AA as a left join (select empid,count(*) as prodcount from (select distinct empid,status from aa)
group by empid) as b
on AA.Empid=b.empid;
quit;
Does the match but will result in data sorted on EmpID if that is a problem.
I am getting output as expected. Thank you!. Wondering if this can be acheived using data step, asking just for my own learning.
data want;
do until (last.empid);
set aa;
by empid notsorted;
proc='N';
if not first.empid and status^=lag(status) then do;
proc='Y';
leave;
end;
end;
do until(last.empid);
set aa;
by empid notsorted;
output;
end;
run;
Alternative:
proc sql;
create table want as
select *,
case
when (select count(distinct status) from aa where a.empid=empid) >1 then 'Y'
else 'N'
end
as flag from aa a;
quit;
Haikuo
Haikuo - Thank you for your response. I ran your solution, it keeps running and never ends. My dataset size is close to 800K records.
You are right. Mine only makes academic sense, it is very very inefficient, it pulls the whole table to match each obs.
Haikuo
No worries. Thanks anyways for your prompt response.
Well, you just made me feel I owe you something , here is a data step solution, not necessarily slower than Proc SQL:
Data aa;
infile cards;
input EMPID $3. @5 Status $3.;
cards;
123 Opn
123 Opn
345 Cld
345 Opn
567 Cld
567 Cld
567 Cld
999 Opn
984 Cld
236 Cld
236 Cld
236 Opn
;
Run;
data want;
do until (last.empid);
set aa;
by empid notsorted;
if first.empid then DO; _s=status; flag='N';end;
if status ne _s then flag='Y';
end;
do until (last.empid);
set aa;
by empid notsorted;
output;
end;
drop _s;
run;
If you would, please bench mark both solutions ( 's and this one), let us know.
Thanks,
Haikuo
With some modification , make the statement of b.prodcount=2 into else 'Y'
Not sure how fast this code would be .
Data aa; infile cards; input EMPID $3. @5 Status $3.; cards; 123 Opn 123 Opn 345 Cld 345 Opn 567 Cld 567 Cld 567 Cld 999 Opn 984 Cld 236 Cld 236 Cld 236 Opn ; Run; proc sql; create table want as select *,count(distinct Status) as flag from aa group by empid; quit;
Xia Keshan
Hello,
here is a data step solution:
data want(drop=i j statusnew);
set aa;
by empid notsorted ;
retain flag statusnew j;
i+1;
if statusnew ne status and flag ne 'Y' then flag='Y';
if first.empid then do;
flag='N';
j=i;
end;
statusnew=status;
if last.empid then
do k=j to i;
set aa point=k;
output;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.