Hi Experts,
I have dataset loaded to CAS (Job runs every morning) which has structure (Sample data given below). I want to calculate days to first Activity for each ID and then aggregate these days (Average days) for each manager (Not in the sample data). To do that I need to create a new variable(Column) which keeps only First Activity_Date (First date) without creating new or aggregating dataset as I am aggregating later. Please advise or provide steps for Prepare-data level or Explore & Visualize level, either is fine.
Sample data:
ID | Created_Date | Activity_Date |
A001 | 29-Jun-20 | 5-Jul-20 |
A001 | 29-Jun-20 | 8-Aug-20 |
A001 | 29-Jun-20 | 1-Jul-20 |
A001 | 29-Jun-20 | 15-Jan-21 |
A005 | 1-Jan-21 | |
AD11 | 5-Aug-19 | 9-Feb-20 |
AD11 | 5-Aug-19 | 10-Feb-20 |
AD11 | 5-Aug-19 | 11-Feb-20 |
AD11 | 5-Aug-19 | 12-Feb-20 |
AD11 | 5-Aug-19 | 9-Jan-21 |
AD11 | 5-Aug-19 | 6-Sep-20 |
AD11 | 5-Aug-19 | 29-Feb-20 |
AD11 | 5-Aug-19 | 15-Sep-19 |
AD11 | 5-Aug-19 | 15-Sep-19 |
AD11 | 5-Aug-19 | 9-Jan-21 |
BC09 | 1-Jan-21 | 10-Apr-21 |
BC09 | 1-Jan-21 | 5-Feb-21 |
FDC1 | 10-May-21 | |
XY01 | 3-Apr-20 | 23-May-21 |
Output Desired:
ID | Created_Date | Activity_Date | FirstActivityDate | DaysToFirstActivity |
A001 | 29-Jun-20 | 1-Jul-20 | 1-Jul-20 | 2 |
A001 | 29-Jun-20 | 5-Jul-20 | ||
A001 | 29-Jun-20 | 8-Aug-20 | ||
A001 | 29-Jun-20 | 15-Jan-21 | ||
A005 | 1-Jan-21 | |||
AD11 | 5-Aug-19 | 15-Sep-19 | 15-Sep-19 | 41 |
AD11 | 5-Aug-19 | 15-Sep-19 | ||
AD11 | 5-Aug-19 | 9-Feb-20 | ||
AD11 | 5-Aug-19 | 10-Feb-20 | ||
AD11 | 5-Aug-19 | 11-Feb-20 | ||
AD11 | 5-Aug-19 | 12-Feb-20 | ||
AD11 | 5-Aug-19 | 29-Feb-20 | ||
AD11 | 5-Aug-19 | 6-Sep-20 | ||
AD11 | 5-Aug-19 | 9-Jan-21 | ||
AD11 | 5-Aug-19 | 9-Jan-21 | ||
BC09 | 1-Jan-21 | 5-Feb-21 | 5-Feb-21 | 35 |
BC09 | 1-Jan-21 | 10-Apr-21 | ||
FDC1 | 10-May-21 | |||
XY01 | 3-Apr-20 | 23-May-21 | 23-May-21 | 415 |
Thank you very much in advance.
-Hitesh
In a data step, this is simple. Assuming your data are grouped by id, then:
data want;
set have;
by id notsorted;
if first.id then do;
firstactivitydate=activitydate;
daystofirstactivity=firstactivitydate-created_date;
end;
format firstactivitydate date9. ;
run;
The BY statement tells SAS to expect the data to be physically grouped by ID, and creates dummy variable (first.id) indicating whether the record in hand is the first for a given ID. That's the only time the DO statements kick in. All the other records will have missing values for the new variables.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.
Find more tutorials on the SAS Users YouTube channel.