I'm having difficulty in extracting unique tasks performed by workers in events arranged by time. The unique combination is defined by ID and Mode. Following dataset mimics the scenario :
ID Time Mode Event
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
Hope for the following result:
ID Time Mode Event
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
87889 20121009 C Closed
87900 20120411 A Closed
I will first sort by time in descending order:
proc sort data=df; by ID descending time; run;
Then I can use sort again to get unique combo by ID and Mode:
proc sort data=df dupout=nodup nodupkey;
by ID Mode; run;
In the last step, how do I ensure that the none-duped record is also the latest event?
Thanks!
data have;
input ID Time :yymmdd10. Mode $ Event $;
cards;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sql;
create table want as
select *
from have
group by id ,mode
having time=max(time)
order by id,time;
quit;
data have;
input ID Time :yymmdd10. Mode $ Event $;
cards;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sql;
create table want as
select *
from have
group by id ,mode
having time=max(time)
order by id,time;
quit;
Hi, Sorry I'm not with SAS. Just wondering, will this statement take care of duplicated combo of ID and mode? Thank you for the prompt reply! I think the first part of Sql arranges the data and the second part takes the latest date?
@lydiawawa Please test the code and let us know. If it doesn;t meet your needs , let's rework to meet your requirement. I can't claim anything from my side. I am hoping you test carefully and diligently.
Sort, then select last in by-group:
data have;
input ID Time :yymmdd10. Mode $ Event $;
format time yymmdd10.;
datalines;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sort data=have; by ID mode time; run;
data want;
set have; by ID mode;
if last.mode;
run;
FWIW you need to be very careful discussing and naming variables related to time, dates and datetime values appropriately when dealing with SAS. Your "time" variable appears to be a date value. The reason is time in SAS values is measured in seconds and dates are measured in days. If you search this forum you find a number of discussion about code not behaving properly that eventually boiled down to misunderstanding this basic SAS measurement difference and why results were much different than expected.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.