Solved: Re: subtracting the most recent results from the previous ones

Gieorgie · Posted 11-02-2021 02:57 AM

After adding the value, I would like to subtract the results of the previous code run from the current one and this is best demonstrated in proc reports. Always with previous with new one . But when he tries to do this, I don't know why, but he gets very strange results.This is my output which i get

POLICY_VINTAGE	NUMBER_POLICY	Run_Date	diff_value
2021-01	77	1	95
2021-01	77	2	77

This is what i would like

POLICY_VINTAGE	NUMBER_POLICY	Run_Date	diff_value
2021-01	77	1	0
2021-01	77	2	0

PROC SQL;     
create table PolisyEnd as 
    select distinct       
     t4.spr_NRB as NRB
     ,intnx('month',datepart(t1.PRP_END_DATE),0,'b') format=yymmd7. as POLICY_VINTAGE,
case
	when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. < 0 THEN 'W' 
    when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. >30 then 'U' 
	when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. >= 30 THEN 'A' 
	when datepart(t1.PRP_END_DATE) IS NULL THEN 'NO INFO' 

	end as POLIS

from 
    cmz.WMDTZDP_BH t1
left join 
        (select distinct kontr_id,obj_oid from cmz.BH_D_ZAB_X_ALOK_&thismonth) t2 
on t2.obj_oid = t1.obj_oid

left join 
		(select distinct data_danych, kontr_id, kre_nrb from dm.BH_WMDTKRE_&thismonth) t3
on t3.kontr_id = t2.kontr_id
 
left join 
		(select distinct spr_NRB, spr_STATUS from _mart.mart_kred) t4
on t4.spr_NRB = t3.kre_nrb 


where datepart(t1.date_danych) between '5Aug2019'd and &gv_date_dly. and t1.Actual = "T"
and t4.spr_STATUS  ="A"


;

quit;
PROC SQL;   
   create table POLICY_VINTAGE_WEEKLY as
   select 
   POLICY_VINTAGE
  ,count(NRB) as NUMBER_POLICY
  ,today() as Run_Date format weeku.
   from PolisyEnd
   where POLIS="W"
   group by 
   POLICY_VINTAGE
;

Quit;
proc append base=_work.policy_vintage_weekly data=policy_vintage_weekly force;
run;

data _work.policy_vintage_weekly;
  set _work.policy_vintage_weekly; 
by  run_date;
diff_value= lag(number_policy);
if first.run_date then diff_value =.;
run;

Kurt_Bremser · Posted 11-02-2021 04:25 AM

Since you need to (physically) read the whole existing dataset first to get the last values for all groups, there is no difference between calculating the difference before appending or after. But you can save yourself the necessary sort by using an interleave instead of appending:

proc sql;   
create table policy_vintage_weekly as
  select 
    policy_vintage
    ,count(NRB) as number_policy
    ,today() as Run_Date format weeku.
  from PolisyEnd
  where POLIS = "W"
  group by policy_vintage
;
quit;

data policy_vintage_weekly_all;
set
  _work.policy_vintage_weekly (where=(run_date ne today()))
  policy_vintage_weekly
;
by policy_vintage run_date;
run;

%if &syscc. = 0
%then %do;

data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
run;

%end;

Your existing dataset needs to be sorted by policy_vintage and run_date once, after that, the order will always be kept.

The final step that replaces the existing dataset is made error-proof by wrapping it in the %IF.

A SET with multiple datasets and a BY does an "interleave". All datasets are read synchronously, and observations are selected according to the BY. The WHERE= dataset option on the existing dataset makes sure that you can run the step repeatedly on a given day without creating unwanted duplicates.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 11-02-2021 03:04 AM

You need to sort by policy_vintage and run_date, and run your data step by policy_vintage.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Gieorgie · Posted 11-02-2021 03:16 AM

Thanks Kurt ! You always help me here! can you tell me if it is possible to automate this date step so that it always subtracts the result from the latest run_date from the previous one

Kurt_Bremser · Posted 11-02-2021 04:25 AM

Since you need to (physically) read the whole existing dataset first to get the last values for all groups, there is no difference between calculating the difference before appending or after. But you can save yourself the necessary sort by using an interleave instead of appending:

proc sql;   
create table policy_vintage_weekly as
  select 
    policy_vintage
    ,count(NRB) as number_policy
    ,today() as Run_Date format weeku.
  from PolisyEnd
  where POLIS = "W"
  group by policy_vintage
;
quit;

data policy_vintage_weekly_all;
set
  _work.policy_vintage_weekly (where=(run_date ne today()))
  policy_vintage_weekly
;
by policy_vintage run_date;
run;

%if &syscc. = 0
%then %do;

data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
run;

%end;

Your existing dataset needs to be sorted by policy_vintage and run_date once, after that, the order will always be kept.

The final step that replaces the existing dataset is made error-proof by wrapping it in the %IF.

A SET with multiple datasets and a BY does an "interleave". All datasets are read synchronously, and observations are selected according to the BY. The WHERE= dataset option on the existing dataset makes sure that you can run the step repeatedly on a given day without creating unwanted duplicates.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Gieorgie · Posted 11-02-2021 06:16 AM

Thank you so much. If I wanted to add diff_value to proc reports to show how the values have changed from the last to the current ?

proc report data=_work.policy_vintage_weekly;
column run_date policy_vintage,number_policy,diff_value;
define run_date / group;
define policy_vintage / across;
define number_policy / mean;
define diff_value / ?;
run;

Kurt_Bremser · Posted 11-02-2021 07:19 AM

You don't need REPORT for the report in your initial question, that is just a PROC PRINT:

/* make up date */
data policy_vintage_weekly;
input
  policy_vintage :$7.
  run_date
  number_policy
;
datalines;
2021-01 1 77
2021-01 2 77
;

/* calculate diff value */
data _policy_vintage_weekly;
set policy_vintage_weekly;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
run;

/* print report */
proc print data=_policy_vintage_weekly noobs;
var policy_vintage number_policy run_date diff_value;
run;

Result:

policy_vintage	number_policy	run_date	diff_value
2021-01	77	1	.
2021-01	77	2	0

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Gieorgie · Posted 11-02-2021 07:49 AM

That's what I meant! Amazing! I would like to ask you one more point. If I wanted to add a mean column where it would calculate the average of number_policy no matter what the run_date is for each vintage policy.?

When im tryign do like a below code i received.

POLICY_VINTAGE	NUMBER_POLICY	Run_Date	diff_value	mean_value
2021-10	660	1-2-1960	436	660
2021-10	660	1-3-1960	660	660
2021-10	1006	11-2-2021		1006

data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
mean_value=mean(number_policy);
run;
%end;
/* print report */
proc print data=_work.policy_vintage_weekly noobs;
var policy_vintage number_policy run_date diff_value mean_value;
run;

Kurt_Bremser · Posted 11-02-2021 09:03 AM

So you want the mean for all entries of a given policy_vintage? The same value in every line, or something like a cumulative/rolling mean?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Gieorgie · Posted 11-02-2021 09:22 AM

all entries of a given policy_vintage eg. average for 2021.10

Kurt_Bremser · Posted 11-02-2021 09:26 AM

@Gieorgie wrote:

all entries of a given policy_vintage eg. average for 2021.10

I understand that, but where do you want to put it? The same overall value, displayed in every row / line of the report, or some kind of rolling stuff, or only on the last line of a group?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Gieorgie · Posted 11-02-2021 09:18 AM

I saw one thing doing here, maybe I got it wrong.

POLICY_VINTAGE	NUMBER_POLICY	Run_Date	diff_value
2021-09	483	1-2-1960	99
2021-09	483	1-3-1960	0
2021-09	436	11-2-2021	-47
2021-10	660	1-2-1960	224
2021-10	660	1-3-1960	0
2021-10	1006	11-2-2021	346

I mean that policy_vintage should make a diff for each, not in a row, e.g.And is it possible to show the results of only the previous and current run_date?

POLICY_VINTAGE	NUMBER_POLICY	Run_Date	diff_value
2021-09	483	1-2-1960	0
2021-09	483	1-3-1960	0
2021-09	436	11-2-2021	-47
2021-10	660	1-2-1960	0
2021-10	660	1-3-1960	0
2021-10	1006	11-2-2021	346

Kurt_Bremser · Posted 11-02-2021 09:28 AM

Change the condition for when to un-set the lagged value:

data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if not last.policy_vintage then diff_value = .;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Gieorgie · Posted 11-02-2021 10:11 AM

I changed but still see same problem Doing calculations between different policy_vintage. I mean that it should do between the same, i.e. only in 2021.10

2011-09	1	2021-W44-03	0
2011-10	5	1960-W00-07	4

Kurt_Bremser · Posted 11-03-2021 05:14 AM

If you have policy_vintage groups with only one entry, extend the condition:

if first.policy_vintage or not last.policy_vintage then diff_value = .;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Registration is open

SAS Training: Just a Click Away