- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I am new to sas and I would like some help. I need to delete data that's outdated for the same user. I wanted to keep the record that the user is registered/most recent data. below is a sample data
Userid UserName Registered effective_date expiration_date
123 Mary Jane Y 1/1/2009 1/1/2014
123 Mary Jane N 1/1/2008 1/1/2014
234 John Doe Y 3/21/2010 1/1/2015
245 Jane Doe N 4/1/2007 1/1/2014
653 John Smith N 8/5/2010 1/1/2014
653 John Smith Y 9/14/2011 1/1/2014
900 Mike Adams N 1/1/2012 1/1/2015
This is the desired results
Userid UserName Registered effective_date expiration_date
123 Mary Jane Y 1/1/2009 1/1/2014
234 John Doe Y 3/21/2010 1/1/2015
245 Jane Doe N 4/1/2007 1/1/2014
653 John Smith Y 9/14/2011 1/1/2014
900 Mike Adams N 1/1/2012 1/1/2015
Thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Step one: sort by userid and descending expiration_date
Step two: proc sort nodupkey by userid
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Step one: sort by userid and descending expiration_date
Step two: proc sort nodupkey by userid
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just to clarify further since you are new to SAS and I think effective_date may be what you want in the first sort:
/*Sort the data in the right order */
Proc sort data=have ; by userid descending effective_date; run;
/*Allows proc sort to eliminate the duplicates and keeps the first row it finds (which is why you needed to sort it to begin with)*/
/** this sorts the above data and creates two new datasets: one with the deduped results (have2) and one with the rows that were removed (dups)**/
proc sort data=have out=have2 dup=dups nodupkey; by userid; run;
Hope this helps!
EJ
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To keep it simple, based on the data you have provided try the following:
DATA HAVE;
LENGTH USERNAME $10;
INFILE DATALINES DLM=',';
INPUT USERID USERNAME $ REGISTERED $ EFFECTIVE_DATE:MMDDYY8. EXPIRATION_DATE:MMDDYY8.;
FORMAT EFFECTIVE_DATE EXPIRATION_DATE DATE9.;
DATALINES;
123,MARY JANE,Y,1/1/2009,1/1/2014
123,MARY JANE,N,1/1/2008,1/1/2014
234,JOHN DOE,Y,3/21/2010,1/1/2015
245,JANE DOE,N,4/1/2007,1/1/2014
653,JOHN SMITH,N,8/5/2010,1/1/2014
653,JOHN SMITH,Y,9/14/2011,1/1/2014
900,MIKE ADAMS,N,1/1/2012,1/1/2015
;
RUN;
PROC SORT DATA=HAVE OUT=SORTHAVE;
BY USERID EFFECTIVE_DATE;
RUN;
DATA WANT;
SET SORTHAVE;
BY USERID EFFECTIVE_DATE;
IF LAST.USERID;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all for your help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hi ... though you are a beginner, you can venture into PROC SQL ...
proc sql;
create table new as
select * from old
group userid
having effective_date = max(effective_date);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Should be group by userid
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hi ... GROUP works without the BY (as does ORDER)
I suppose it might not be good practice to leave out the BY
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Mike,
It is good to know. Thank you!