Dear SAS community,
I got stuck with the following question, and unfortunately I was not able to find any kind of solution up to now:
I have two datasets. The first one contains a unique ID and a date variable of the day when an event happened (=an accident). E.g.;
ID ACCIDENT_DATE
1 2020/11/05
2 2018/01/20
3 2022/07/10
Dataset two contains multiple rows of the ID variable (many of them not in dataset one), a financial transaction date, and the amount of money spend for a medical service, e.g.:
ID TRANSACTION_DATE MONEY_VALUE
1 2020/11/08 100
1 2020/11/20 500
1 2020/12/08 250
1 2021/01/08 600
1 2021/11/02 400
2 2018/01/22 500
2 2018/04/01 100
2 2018/08/11 900
3 2022/07/14 1000
3 2022/08/23 300
3 2022/10/10 200
3 2022/11/02 800
3 2023/01/11 500
3 2022/02/22 200
Now, the task would be to join dataset two on dataset one, by summing up the amount of money for different time intervals, meaning e.g. money spend up to 30 days after the accident date, 60 days after the accident, and so on... then saving the results into new variables.
Therefore, the output should look like this:
ID ACCIDENT_DATE COSTS_UNTIL_30_DAYS COSTS_UNTIL_90_DAYS COSTS_UNTIL_180_MONTHS COSTS_UNTIL_360_MONTHS
1 2020/11/05 ??? ??? ??? ???
2 2018/01/20 ??? ??? ??? ???
3 2022/07/10 ??? ??? ??? ???
This is the "easy" example.
It would be perfectly, if it would be also possible to take additionally different types of transactions into account, while summing up the costs, e.g.:
Dataset two:
ID TRANSACTION_DATE MONEY_VALUE TRANSACTION_TYPE
1 2020/11/08 100 A
1 2020/11/20 500 B
1 2020/12/08 250 A
1 2021/01/08 600 C
1 2021/11/02 400 A
[...]
with the result
ID ACCIDENT_DATE A_COSTS_UNTIL_30_DAYS A_COSTS_UNTIL_90_DAYS [...] B_COSTS_UNTIL_30_DAYS B_COSTS_UNTIL_90_DAYS
1 2020/11/05 ??? ??? [...] ??? ???
2 2018/01/20 ??? ??? [...] ??? ???
3 2022/07/10 ??? ??? [...] ??? ???
Can anyone help?
I would be very happy and grateful for any kind of hint or solution 😊
Best regrads
Lars
Below how you can get such sum variables by accident date and transaction type.
data accidents;
infile datalines truncover dsd;
input ID ACCIDENT_DATE :yymmdd10.;
format ACCIDENT_DATE date9.;
datalines;
1,2020/11/05
2,2018/01/20
3,2022/07/10
;
data claims;
infile datalines truncover dsd;
input ID TRANSACTION_DATE :yymmdd10. MONEY_VALUE TRANSACTION_TYPE :$1.;
format TRANSACTION_DATE date9.;
datalines;
1,2020/11/08,100,A
1,2020/11/20,500,B
1,2020/12/08,250,A
1,2021/01/08,600,C
1,2021/11/02,400,A
2,2018/01/22,500,X
2,2018/04/01,100,X
2,2018/08/11,900,X
3,2022/07/14,1000,X
3,2022/08/23,300,X
3,2022/10/10,200,X
3,2022/11/02,800,X
3,2023/01/11,500,X
3,2022/02/22,200,X
;
%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc sql;
/* create table inter as*/
select
a.id,
a.ACCIDENT_DATE,
c.TRANSACTION_TYPE,
sum(
case
when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=30 then c.MONEY_VALUE
else .
end
) as COSTS_UNTIL_30_DAYS,
sum(
case
when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=60 then c.MONEY_VALUE
else .
end
) as COSTS_UNTIL_60_DAYS,
sum(
case
when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=90 then c.MONEY_VALUE
else .
end
) as COSTS_UNTIL_90_DAYS,
sum(c.MONEY_VALUE) as COSTS_ACCRUED_TOTAL
from accidents a inner join claims c
on a.id=c.id
group by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
order by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
;
quit;
options &sv_missing;
You could now transpose this table to get variables by transaction type but I wouldn't do it because you'd end up with a varying number of variables depending on the transaction types in your data - and changing table structures are really hard to work with on a coding level.
You can always use Proc's like Tabulate or Report to present your data in the way you have in mind - but don't create variables for it.
%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc tabulate data=inter;
class id ACCIDENT_DATE TRANSACTION_TYPE;
var COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL;
keylabel sum=' ';
table id*ACCIDENT_DATE*TRANSACTION_TYPE,
(COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
;
table id*ACCIDENT_DATE,
TRANSACTION_TYPE*(COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
;
run;
options &sv_missing;
By 30 days do you mean exactly 30 days or by the same day in the following calendar month? 60 days and 2 months?
What exactly is the range of "and so on"? You show a variable implying 360 months. So there are a large number of possible combinations involved and I wouldn't even try to guess them.
You will have to walk us through examples of exactly how you think that "transaction type" gets used. I can see a couple of possibilities of how to interpret that.
Note that "wide" format you propose for the results is going to be very cumbersome for any actual use.
Please provide example data in the form of data step code pasted into a text box in future questions. Examples below.
This shows one way to get 30, 60 and 90 day totals. Not the only way. Also how many Id and accidents are you talking about? Large data sets may require some additional bits to deal with size of data issues for efficiency.
data accident; input ID ACCIDENT_DATE : yymmdd10.; format ACCIDENT_DATE yymmdd10.; datalines; 1 2020/11/05 2 2018/01/20 3 2022/07/10 ; data helpful; set accident; do days=30, 60, 90 ; lasttransdate = intnx('day',accident_date,days); output; end; format lasttransdate yymmdd10.; run; data transaction; input ID TRANSACTION_DATE :yymmdd10. MONEY_VALUE; format transaction_date yymmdd10.; datalines; 1 2020/11/08 100 1 2020/11/20 500 1 2020/12/08 250 1 2021/01/08 600 1 2021/11/02 400 2 2018/01/22 500 2 2018/04/01 100 2 2018/08/11 900 3 2022/07/14 1000 3 2022/08/23 300 3 2022/10/10 200 3 2022/11/02 800 3 2023/01/11 500 3 2022/02/22 200 ; proc sql ; create table need as select a.*,b.transaction_date, b.money_value from helpful as a left join transaction as b on a.id = b.id where b.transaction_date le a.lasttransdate order by a.id, a.accident_date, a.days ; quit; proc summary data=need nway; class id days; id accident_date; var money_value; output out=sums (drop=_: ) sum=; run;
Once you have data such as this it is easy to reshape as needed for reporting.
Hi ballardw,
thank you for your code! 😊 It helped me a lot too. Patrick posted an answer almost at the same time, which is the solution I think.
Thank you!
Best regards
Lars
Below how you can get such sum variables by accident date and transaction type.
data accidents;
infile datalines truncover dsd;
input ID ACCIDENT_DATE :yymmdd10.;
format ACCIDENT_DATE date9.;
datalines;
1,2020/11/05
2,2018/01/20
3,2022/07/10
;
data claims;
infile datalines truncover dsd;
input ID TRANSACTION_DATE :yymmdd10. MONEY_VALUE TRANSACTION_TYPE :$1.;
format TRANSACTION_DATE date9.;
datalines;
1,2020/11/08,100,A
1,2020/11/20,500,B
1,2020/12/08,250,A
1,2021/01/08,600,C
1,2021/11/02,400,A
2,2018/01/22,500,X
2,2018/04/01,100,X
2,2018/08/11,900,X
3,2022/07/14,1000,X
3,2022/08/23,300,X
3,2022/10/10,200,X
3,2022/11/02,800,X
3,2023/01/11,500,X
3,2022/02/22,200,X
;
%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc sql;
/* create table inter as*/
select
a.id,
a.ACCIDENT_DATE,
c.TRANSACTION_TYPE,
sum(
case
when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=30 then c.MONEY_VALUE
else .
end
) as COSTS_UNTIL_30_DAYS,
sum(
case
when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=60 then c.MONEY_VALUE
else .
end
) as COSTS_UNTIL_60_DAYS,
sum(
case
when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=90 then c.MONEY_VALUE
else .
end
) as COSTS_UNTIL_90_DAYS,
sum(c.MONEY_VALUE) as COSTS_ACCRUED_TOTAL
from accidents a inner join claims c
on a.id=c.id
group by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
order by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
;
quit;
options &sv_missing;
You could now transpose this table to get variables by transaction type but I wouldn't do it because you'd end up with a varying number of variables depending on the transaction types in your data - and changing table structures are really hard to work with on a coding level.
You can always use Proc's like Tabulate or Report to present your data in the way you have in mind - but don't create variables for it.
%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc tabulate data=inter;
class id ACCIDENT_DATE TRANSACTION_TYPE;
var COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL;
keylabel sum=' ';
table id*ACCIDENT_DATE*TRANSACTION_TYPE,
(COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
;
table id*ACCIDENT_DATE,
TRANSACTION_TYPE*(COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
;
run;
options &sv_missing;
Hi Patrick,
thank you very much! That is exactly what I was looking for 😊 It is unbelievable how quickly you could find the solution. I had been trying out different things already for a couple of days... Thank you!
Best regards
Lars
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.