I have a dataset with repeated IDs and dates. I would first like to substract each following row from the first date by ID. Then I would like to add the obtained differences in each row with the one above. Any help is appreciated.
Have:
ID | Date |
1 | 15-May-14 |
1 | 20-May-14 |
1 | 25-May-14 |
2 | 14-Jul-17 |
2 | 24-Jul-17 |
3 | 18-Dec-18 |
Want:
ID | Date | Diff in days | Sum of diff |
1 | 15-May-14 | 0 | 0 |
1 | 20-May-14 | 5 | 5 |
1 | 25-May-14 | 10 | 15 |
2 | 14-Jul-17 | 0 | 0 |
2 | 24-Jul-17 | 10 | 10 |
3 | 18-Dec-18 | 0 | 0 |
First thing is to provide example data in the form of a data step that replicates your data. If your date values are not SAS date values, i.e. numeric type with a date format attached then the first step is to create dates.
Note: Bad things come to people that habitually use 2-digit years. Look up Y2K if not familiar with the term.
Something like:
data have; input ID date anydtdte.; format date date9.; datalines; 1 15-May-14 1 20-May-14 1 25-May-14 2 14-Jul-17 2 24-Jul-17 3 18-Dec-18 ;
I have to guess that 25 May is the 25th day in May not a value in the year 1925.
Next, your show difference is not the value compared to the previous date. 20 May to 25 May would be 5 days, not 10. 10 Days in your example would be the difference from the FIRST date, not the previous, for an id. So which is the, the previous date or the first date?
If the previous date then:
data want; set have; by id; dd = dif(date);
retain cumdif; if first.id then do; dd=0; cumdif=0; end; cumdif = sum(cumdif,dd); run;
If the data is not sorted by ID but grouped by it, then you would add NOTSORTED to the BY Id statement.
The by group creates automatic variables of First and Last for each variable on the By statement that have numeric values of 1 when true (is the first or last of a by group) and 0 when false so you can test the values to reset variables at the boundaries.
Retain keeps the value of a variable across data step boundaries.
If instead of previous you actually meant the First date, then the cumulative interpretation would be different. You would retain the value of the first date and compare with each date.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.