I'm having difficulty in extracting unique tasks performed by workers in events arranged by time. The unique combination is defined by ID and Mode. Following dataset mimics the scenario :
ID Time Mode Event
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
Hope for the following result:
ID Time Mode Event
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
87889 20121009 C Closed
87900 20120411 A Closed
I will first sort by time in descending order:
proc sort data=df; by ID descending time; run;
Then I can use sort again to get unique combo by ID and Mode:
proc sort data=df dupout=nodup nodupkey;
by ID Mode; run;
In the last step, how do I ensure that the none-duped record is also the latest event?
Thanks!
data have;
input ID Time :yymmdd10. Mode $ Event $;
cards;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sql;
create table want as
select *
from have
group by id ,mode
having time=max(time)
order by id,time;
quit;
data have;
input ID Time :yymmdd10. Mode $ Event $;
cards;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sql;
create table want as
select *
from have
group by id ,mode
having time=max(time)
order by id,time;
quit;
Hi, Sorry I'm not with SAS. Just wondering, will this statement take care of duplicated combo of ID and mode? Thank you for the prompt reply! I think the first part of Sql arranges the data and the second part takes the latest date?
@lydiawawa Please test the code and let us know. If it doesn;t meet your needs , let's rework to meet your requirement. I can't claim anything from my side. I am hoping you test carefully and diligently.
Sort, then select last in by-group:
data have;
input ID Time :yymmdd10. Mode $ Event $;
format time yymmdd10.;
datalines;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sort data=have; by ID mode time; run;
data want;
set have; by ID mode;
if last.mode;
run;
FWIW you need to be very careful discussing and naming variables related to time, dates and datetime values appropriately when dealing with SAS. Your "time" variable appears to be a date value. The reason is time in SAS values is measured in seconds and dates are measured in days. If you search this forum you find a number of discussion about code not behaving properly that eventually boiled down to misunderstanding this basic SAS measurement difference and why results were much different than expected.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.