BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
amarikow57
Obsidian | Level 7

Relatively new to SAS so I'll need specifics. I am using SAS University, in case that makes any difference.

 

I have a lot of large data sets I am working with. I would like to do Survival analysis with them. To my understanding, I can use the following to get unique ID's with the latest date:

 

PROC SORT LIBRARY.DATA;
 BY ID DATE;
RUN;

DATA LIBRARY.DATA_DATES (keep = ID DATE);
 SET LIBRARY.DATA;
 BY ID;
 ENDDATE = MAX(DATE);
STARTDATE = MIN(DATE); *only for 2006 data; RUN;

Please let me know if the code is wrong. My dataset is extremely large, so I won't be able to manually check so easily. 

 

Now I want to merge the datasets (Years 2006-2013). When merging, I still want only the newest date, but I only want to keep IDs that are present in 2006 (start period). If I left merge, will the 2006 dates override the 2007-2013 dates? My other thought is to label the MAX DATE for each year (i.e. in own column) and then a separate column from that with the new MAX DATE. However, this doesn't sound the most efficient. Is there a better way to code for this?

 

The end goal with the dates is to use them for follow-up time. That is,

FOLLOWUP = MAX DATE - MIN 2006 DATE

 

I appreciate any help or feedback.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
proc sort data=library.data; /* PROC SORT statement needs a DATA= option */
by id date;
run;

data library.data_dates;
set library.data (keep = id date);
by id;
format startdate enddate yymmdd10.;
retain startdate;
if first.id then startdate = date;
if last.id; /* will output only the last observation per id group */
enddate = date;
drop date;
run;

No need to shout at SAS, it will work when being talked to in normal voice.

 

In PROC SQL, it looks like this:

proc sql;
create table library.data_dates as
  select
    id,
    min(date) as startdate format=yymmdd10.
    max(date) as enddate format = yymmdd10.
  from library.dates
  group by id
;
quit;

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

Assuming your data are already sorted by id/date, then:

 

data want;
  merge library.data_dates (where=(Year(date)=2006)  in=in2006)
        library.data_dates ;
  by id;
  if in2006=1 and last.id=1;
run;

Let's say there are 5 obs from 2006 and 20 obs overall for a given id.  The first 4 obs from 2006 will be paired with the first 4 obs overall.  The last obs from 2006 will be paired with ALL the remaining overall obs.  So for the last obs, the IN2006 dummy will still be on, and the data will be output.

 

Also, when I say "paired" please note that the data values from 2006 will always be overwritten by the equivalent data variables from the overall - because the overall data set is to the right of the 2006 data subset in the merge statement.    Only if there were certain variables dropped in the second data set reference would that variable value from the first reference prevail in the output.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User
proc sort data=library.data; /* PROC SORT statement needs a DATA= option */
by id date;
run;

data library.data_dates;
set library.data (keep = id date);
by id;
format startdate enddate yymmdd10.;
retain startdate;
if first.id then startdate = date;
if last.id; /* will output only the last observation per id group */
enddate = date;
drop date;
run;

No need to shout at SAS, it will work when being talked to in normal voice.

 

In PROC SQL, it looks like this:

proc sql;
create table library.data_dates as
  select
    id,
    min(date) as startdate format=yymmdd10.
    max(date) as enddate format = yymmdd10.
  from library.dates
  group by id
;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1232 views
  • 2 likes
  • 3 in conversation