BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ldataminer
Calcite | Level 5

Dear SAS community,

 

I got stuck with the following question, and unfortunately I was not able to find any kind of solution up to now:

I have two datasets. The first one contains a unique ID and a date variable of the day when an event happened (=an accident). E.g.;

ID  ACCIDENT_DATE
1   2020/11/05
2   2018/01/20
3   2022/07/10

Dataset two contains multiple rows of the ID variable (many of them not in dataset one), a financial transaction date, and the amount of money spend for a medical service, e.g.:

ID  TRANSACTION_DATE  MONEY_VALUE
1   2020/11/08        100
1   2020/11/20        500
1   2020/12/08        250
1   2021/01/08        600
1   2021/11/02        400
2   2018/01/22        500
2   2018/04/01        100
2   2018/08/11        900
3   2022/07/14        1000
3   2022/08/23        300
3   2022/10/10        200
3   2022/11/02        800
3   2023/01/11        500
3   2022/02/22        200

Now, the task would be to join dataset two on dataset one, by summing up the amount of money for different time intervals, meaning e.g. money spend up to 30 days after the accident date, 60 days after the accident, and so on... then saving the results into new variables.

Therefore, the output should look like this:

ID  ACCIDENT_DATE  COSTS_UNTIL_30_DAYS  COSTS_UNTIL_90_DAYS  COSTS_UNTIL_180_MONTHS  COSTS_UNTIL_360_MONTHS
1   2020/11/05     ???                  ???                  ???                     ???
2   2018/01/20     ???                  ???                  ???                     ???
3   2022/07/10     ???                  ???                  ???                     ???
 
This is the "easy" example.

It would be perfectly, if it would be also possible to take additionally different types of transactions into account, while summing up the costs, e.g.:

Dataset two:

ID  TRANSACTION_DATE  MONEY_VALUE  TRANSACTION_TYPE
1   2020/11/08        100          A
1   2020/11/20        500          B
1   2020/12/08        250          A
1   2021/01/08        600          C
1   2021/11/02        400          A
[...]

with the result

ID  ACCIDENT_DATE  A_COSTS_UNTIL_30_DAYS  A_COSTS_UNTIL_90_DAYS [...] B_COSTS_UNTIL_30_DAYS  B_COSTS_UNTIL_90_DAYS
1   2020/11/05     ???                    ???                   [...] ???                    ???
2   2018/01/20     ???                    ???                   [...] ???                    ???
3   2022/07/10     ???                    ???                   [...] ???                    ???


Can anyone help?

 

I would be very happy and grateful for any kind of hint or solution 😊

 

Best regrads

Lars

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below how you can get such sum variables by accident date and transaction type.

data accidents;
  infile datalines truncover dsd;
  input ID ACCIDENT_DATE :yymmdd10.;
  format ACCIDENT_DATE date9.;
  datalines;
1,2020/11/05
2,2018/01/20
3,2022/07/10
;

data claims;
  infile datalines truncover dsd;
  input ID TRANSACTION_DATE :yymmdd10. MONEY_VALUE TRANSACTION_TYPE :$1.;
  format TRANSACTION_DATE date9.;
  datalines;
1,2020/11/08,100,A
1,2020/11/20,500,B
1,2020/12/08,250,A
1,2021/01/08,600,C
1,2021/11/02,400,A
2,2018/01/22,500,X
2,2018/04/01,100,X
2,2018/08/11,900,X
3,2022/07/14,1000,X
3,2022/08/23,300,X
3,2022/10/10,200,X
3,2022/11/02,800,X
3,2023/01/11,500,X
3,2022/02/22,200,X
;

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc sql;
/*  create table inter as*/
  select
    a.id,
    a.ACCIDENT_DATE,
    c.TRANSACTION_TYPE,
    sum(
      case
        when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=30 then c.MONEY_VALUE
        else .
        end
        ) as COSTS_UNTIL_30_DAYS,
    sum(
      case
        when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=60 then c.MONEY_VALUE
        else .
        end
        ) as COSTS_UNTIL_60_DAYS,
    sum(
      case
        when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=90 then c.MONEY_VALUE
        else .
        end
        ) as COSTS_UNTIL_90_DAYS,
    sum(c.MONEY_VALUE) as COSTS_ACCRUED_TOTAL
  from accidents a inner join claims c
    on a.id=c.id
  group by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
  order by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
  ;
quit;
options &sv_missing;

Patrick_0-1702567274532.png

You could now transpose this table to get variables by transaction type but I wouldn't do it because you'd end up with a varying number of variables depending on the transaction types in your data - and changing table structures are really hard to work with on a coding level.

You can always use Proc's like Tabulate or Report to present your data in the way you have in mind - but don't create variables for it.

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc tabulate data=inter;
  class id ACCIDENT_DATE TRANSACTION_TYPE;
  var COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL;
  keylabel sum=' ';
  table id*ACCIDENT_DATE*TRANSACTION_TYPE, 
        (COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
        ;
  table id*ACCIDENT_DATE, 
        TRANSACTION_TYPE*(COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
        ;
run;
options &sv_missing;

Patrick_1-1702567347121.png

 

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

By 30 days do you mean exactly 30 days or by the same day in the following calendar month? 60 days  and 2 months?

What exactly is the range of "and so on"? You show a variable implying 360 months. So there are a large number of possible combinations involved and I wouldn't even try to guess them.

 

You will have to walk us through examples of exactly how you think that "transaction type" gets used. I can see a couple of possibilities of how to interpret that.

Note that "wide" format you propose for the results is going to be very cumbersome for any actual use.

 

Please provide example data in the form of data step code pasted into a text box in future questions. Examples below.

This shows one way to get 30, 60 and 90 day totals. Not the only way. Also how many Id and accidents are  you talking about? Large data sets may require some additional bits to deal with size of data issues for efficiency.

data accident;
   input ID  ACCIDENT_DATE : yymmdd10.;
   format ACCIDENT_DATE yymmdd10.;
datalines;
1   2020/11/05
2   2018/01/20
3   2022/07/10
;

data helpful;
  set accident;
   do days=30, 60, 90 ;
      lasttransdate = intnx('day',accident_date,days);
      output;
   end;
   format lasttransdate yymmdd10.;
run;


data transaction;
   input ID  TRANSACTION_DATE :yymmdd10.  MONEY_VALUE;
   format transaction_date yymmdd10.;
datalines;
1   2020/11/08        100
1   2020/11/20        500
1   2020/12/08        250
1   2021/01/08        600
1   2021/11/02        400
2   2018/01/22        500
2   2018/04/01        100
2   2018/08/11        900
3   2022/07/14        1000
3   2022/08/23        300
3   2022/10/10        200
3   2022/11/02        800
3   2023/01/11        500
3   2022/02/22        200
;


proc sql ;
   create table need as
   select a.*,b.transaction_date, b.money_value
   from helpful as a
        left join
        transaction as b
        on a.id = b.id
   where b.transaction_date le a.lasttransdate
   order by a.id, a.accident_date, a.days
   ;
quit;

proc summary data=need nway;
   class id days;
   id accident_date;
   var money_value;
   output out=sums (drop=_: ) sum=;
run;

Once you have data such as this it is easy to reshape as needed for reporting.

ldataminer
Calcite | Level 5

Hi ballardw,

 

thank you for your code! 😊 It helped me a lot too. ‌‌Patrick posted an answer almost at the same time, which is the solution I think. 

 

Thank you!

 

Best regards
Lars

Patrick
Opal | Level 21

Below how you can get such sum variables by accident date and transaction type.

data accidents;
  infile datalines truncover dsd;
  input ID ACCIDENT_DATE :yymmdd10.;
  format ACCIDENT_DATE date9.;
  datalines;
1,2020/11/05
2,2018/01/20
3,2022/07/10
;

data claims;
  infile datalines truncover dsd;
  input ID TRANSACTION_DATE :yymmdd10. MONEY_VALUE TRANSACTION_TYPE :$1.;
  format TRANSACTION_DATE date9.;
  datalines;
1,2020/11/08,100,A
1,2020/11/20,500,B
1,2020/12/08,250,A
1,2021/01/08,600,C
1,2021/11/02,400,A
2,2018/01/22,500,X
2,2018/04/01,100,X
2,2018/08/11,900,X
3,2022/07/14,1000,X
3,2022/08/23,300,X
3,2022/10/10,200,X
3,2022/11/02,800,X
3,2023/01/11,500,X
3,2022/02/22,200,X
;

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc sql;
/*  create table inter as*/
  select
    a.id,
    a.ACCIDENT_DATE,
    c.TRANSACTION_TYPE,
    sum(
      case
        when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=30 then c.MONEY_VALUE
        else .
        end
        ) as COSTS_UNTIL_30_DAYS,
    sum(
      case
        when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=60 then c.MONEY_VALUE
        else .
        end
        ) as COSTS_UNTIL_60_DAYS,
    sum(
      case
        when c.TRANSACTION_DATE-a.ACCIDENT_DATE <=90 then c.MONEY_VALUE
        else .
        end
        ) as COSTS_UNTIL_90_DAYS,
    sum(c.MONEY_VALUE) as COSTS_ACCRUED_TOTAL
  from accidents a inner join claims c
    on a.id=c.id
  group by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
  order by a.id, a.ACCIDENT_DATE, c.TRANSACTION_TYPE
  ;
quit;
options &sv_missing;

Patrick_0-1702567274532.png

You could now transpose this table to get variables by transaction type but I wouldn't do it because you'd end up with a varying number of variables depending on the transaction types in your data - and changing table structures are really hard to work with on a coding level.

You can always use Proc's like Tabulate or Report to present your data in the way you have in mind - but don't create variables for it.

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
proc tabulate data=inter;
  class id ACCIDENT_DATE TRANSACTION_TYPE;
  var COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL;
  keylabel sum=' ';
  table id*ACCIDENT_DATE*TRANSACTION_TYPE, 
        (COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
        ;
  table id*ACCIDENT_DATE, 
        TRANSACTION_TYPE*(COSTS_UNTIL_30_DAYS COSTS_UNTIL_60_DAYS COSTS_UNTIL_60_DAYS COSTS_ACCRUED_TOTAL)
        ;
run;
options &sv_missing;

Patrick_1-1702567347121.png

 

 

 

ldataminer
Calcite | Level 5

Hi Patrick,

 

thank you very much! That is exactly what I was looking for 😊 It is unbelievable how quickly you could find the solution. I had been trying out different things already for a couple of days... Thank you!

 

Best regards

Lars

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 525 views
  • 2 likes
  • 3 in conversation