BookmarkSubscribeRSS Feed
deleted_user
Not applicable
How do you deduplicate based on most recent date?

PROC SORT data=total;
by week ID;
run;
PROC FREQ DATA = total noprint;
by week;
table ID/ out = ID_DUPS (keep = week ID Count where = (Count > 1)) ;
run;
PROC PRINT DATA = ID_DUPS;
run;
9 REPLIES 9
Flip
Fluorite | Level 6
for the last week within an ID use:

PROC SORT data=total;
by ID week ;
run;

data total;
set total;
by ID week;
if last.ID;
run;
deleted_user
Not applicable
Thanks. How do you deduplicate selecting for the latest date within a week? There's a unique time/date stamp field within each week.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Separate the DATEPART (it's a function in SAS) to create a SAS DATE variable and use it with your BY processing.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
Thanks. Once I get SAS to create a SAS date, how do I ask SAS to identify the most recent date and select for it?
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Refer to your other related post for info on BY GROUP processing.

Scott
deleted_user
Not applicable
Thanks!
deleted_user
Not applicable
Consolidated from related post:



You have a couple of options - one is to perform two sorts with ay BY ID DESCENDING DATE; on the first sort and then followed with a SORT NODUPKEY EQUALS and a BY ID; statement.

The other option is to again sort with ID DESCENDING DATE, and then use a SAS DATA step with a SET and a BY ID; statement -- and use BY GROUP processing with IF FIRST.ID processing to subset your input and only capture the first occurence of ID values.

Scott Barry

Suggested Google advanced search argument on this topic:

data step by group processing site:sas.com
ballardw
Super User
> Thanks. How do you deduplicate selecting for the
> latest date within a week? There's a unique
> time/date stamp field within each week.

Depending on how you're using this:

Proc summary data=(input data) nway;
class id week;
var (variable with date/time stamp as a SAS datetime variable);
output out=(output data set name) max=;
run;

May want to have option (drop= _type_ _freq_) with the output data. The output dataset will have the largest date/time within each combination of ID and WEEK. As a side effect the output data will be sorted by ID and week but no sort is needed beforehand.
Patrick
Opal | Level 21
I hope this example code will point you into the right direction.

The main idea is to use intnx() to align your datestamps to the date of the last day of the week the datestamp value is from (Sunday) and use this additional variable "LastDayInWeek" for sorting.


data have;
id=1;
format DT datetime20.;
DT='19Dec2009 00:00:00'dt;
do while (DT lt '05Jan2010 00:00:00'dt);
output;
DT=sum(DT,ranuni(1)*86400);
end;
run;

proc sql;
create view VSortedByIDAndWeekAndDate as
select *, datepart(intnx('dtweek.2',DT,0,'E')) format=weekdatx. as LastDayInWeek
from have
order by id,LastDayInWeek,DT
;
quit;

data want;
set VSortedByIDAndWeekAndDate;
by id LastDayInWeek DT;
if last.LastDayInWeek then output;
run;

proc print data=want;
run;


HTH
Patrick

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1934 views
  • 0 likes
  • 5 in conversation