After adding the value, I would like to subtract the results of the previous code run from the current one and this is best demonstrated in proc reports. Always with previous with new one . But when he tries to do this, I don't know why, but he gets very strange results.This is my output which i get
POLICY_VINTAGE | NUMBER_POLICY | Run_Date | diff_value | |
2021-01 | 77 | 1 | 95 | |
2021-01 | 77 | 2 | 77 |
This is what i would like
POLICY_VINTAGE | NUMBER_POLICY | Run_Date | diff_value | |
2021-01 | 77 | 1 | 0 | |
2021-01 | 77 | 2 | 0 |
PROC SQL;
create table PolisyEnd as
select distinct
t4.spr_NRB as NRB
,intnx('month',datepart(t1.PRP_END_DATE),0,'b') format=yymmd7. as POLICY_VINTAGE,
case
when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. < 0 THEN 'W'
when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. >30 then 'U'
when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. >= 30 THEN 'A'
when datepart(t1.PRP_END_DATE) IS NULL THEN 'NO INFO'
end as POLIS
from
cmz.WMDTZDP_BH t1
left join
(select distinct kontr_id,obj_oid from cmz.BH_D_ZAB_X_ALOK_&thismonth) t2
on t2.obj_oid = t1.obj_oid
left join
(select distinct data_danych, kontr_id, kre_nrb from dm.BH_WMDTKRE_&thismonth) t3
on t3.kontr_id = t2.kontr_id
left join
(select distinct spr_NRB, spr_STATUS from _mart.mart_kred) t4
on t4.spr_NRB = t3.kre_nrb
where datepart(t1.date_danych) between '5Aug2019'd and &gv_date_dly. and t1.Actual = "T"
and t4.spr_STATUS ="A"
;
quit;
PROC SQL;
create table POLICY_VINTAGE_WEEKLY as
select
POLICY_VINTAGE
,count(NRB) as NUMBER_POLICY
,today() as Run_Date format weeku.
from PolisyEnd
where POLIS="W"
group by
POLICY_VINTAGE
;
Quit;
proc append base=_work.policy_vintage_weekly data=policy_vintage_weekly force;
run;
data _work.policy_vintage_weekly;
set _work.policy_vintage_weekly;
by run_date;
diff_value= lag(number_policy);
if first.run_date then diff_value =.;
run;
Since you need to (physically) read the whole existing dataset first to get the last values for all groups, there is no difference between calculating the difference before appending or after. But you can save yourself the necessary sort by using an interleave instead of appending:
proc sql;
create table policy_vintage_weekly as
select
policy_vintage
,count(NRB) as number_policy
,today() as Run_Date format weeku.
from PolisyEnd
where POLIS = "W"
group by policy_vintage
;
quit;
data policy_vintage_weekly_all;
set
_work.policy_vintage_weekly (where=(run_date ne today()))
policy_vintage_weekly
;
by policy_vintage run_date;
run;
%if &syscc. = 0
%then %do;
data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
run;
%end;
Your existing dataset needs to be sorted by policy_vintage and run_date once, after that, the order will always be kept.
The final step that replaces the existing dataset is made error-proof by wrapping it in the %IF.
A SET with multiple datasets and a BY does an "interleave". All datasets are read synchronously, and observations are selected according to the BY. The WHERE= dataset option on the existing dataset makes sure that you can run the step repeatedly on a given day without creating unwanted duplicates.
You need to sort by policy_vintage and run_date, and run your data step by policy_vintage.
Since you need to (physically) read the whole existing dataset first to get the last values for all groups, there is no difference between calculating the difference before appending or after. But you can save yourself the necessary sort by using an interleave instead of appending:
proc sql;
create table policy_vintage_weekly as
select
policy_vintage
,count(NRB) as number_policy
,today() as Run_Date format weeku.
from PolisyEnd
where POLIS = "W"
group by policy_vintage
;
quit;
data policy_vintage_weekly_all;
set
_work.policy_vintage_weekly (where=(run_date ne today()))
policy_vintage_weekly
;
by policy_vintage run_date;
run;
%if &syscc. = 0
%then %do;
data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
run;
%end;
Your existing dataset needs to be sorted by policy_vintage and run_date once, after that, the order will always be kept.
The final step that replaces the existing dataset is made error-proof by wrapping it in the %IF.
A SET with multiple datasets and a BY does an "interleave". All datasets are read synchronously, and observations are selected according to the BY. The WHERE= dataset option on the existing dataset makes sure that you can run the step repeatedly on a given day without creating unwanted duplicates.
Thank you so much. If I wanted to add diff_value to proc reports to show how the values have changed from the last to the current ?
proc report data=_work.policy_vintage_weekly; column run_date policy_vintage,number_policy,diff_value; define run_date / group; define policy_vintage / across; define number_policy / mean; define diff_value / ?; run;
You don't need REPORT for the report in your initial question, that is just a PROC PRINT:
/* make up date */
data policy_vintage_weekly;
input
policy_vintage :$7.
run_date
number_policy
;
datalines;
2021-01 1 77
2021-01 2 77
;
/* calculate diff value */
data _policy_vintage_weekly;
set policy_vintage_weekly;
by policy_vintage;
diff_value = dif(number_policy);
if first.policy_vintage then diff_value = .;
run;
/* print report */
proc print data=_policy_vintage_weekly noobs;
var policy_vintage number_policy run_date diff_value;
run;
Result:
policy_vintage number_policy run_date diff_value 2021-01 77 1 . 2021-01 77 2 0
That's what I meant! Amazing! I would like to ask you one more point. If I wanted to add a mean column where it would calculate the average of number_policy no matter what the run_date is for each vintage policy.?
When im tryign do like a below code i received.
POLICY_VINTAGE | NUMBER_POLICY | Run_Date | diff_value | mean_value |
2021-10 | 660 | 1-2-1960 | 436 | 660 |
2021-10 | 660 | 1-3-1960 | 660 | 660 |
2021-10 | 1006 | 11-2-2021 | 1006 |
data _work.policy_vintage_weekly; set policy_vintage_weekly_all; by policy_vintage; diff_value = dif(number_policy); if first.policy_vintage then diff_value = .; mean_value=mean(number_policy); run; %end; /* print report */ proc print data=_work.policy_vintage_weekly noobs; var policy_vintage number_policy run_date diff_value mean_value; run;
So you want the mean for all entries of a given policy_vintage? The same value in every line, or something like a cumulative/rolling mean?
all entries of a given policy_vintage eg. average for 2021.10
@Gieorgie wrote:
all entries of a given policy_vintage eg. average for 2021.10
I understand that, but where do you want to put it? The same overall value, displayed in every row / line of the report, or some kind of rolling stuff, or only on the last line of a group?
I saw one thing doing here, maybe I got it wrong.
POLICY_VINTAGE | NUMBER_POLICY | Run_Date | diff_value |
2021-09 | 483 | 1-2-1960 | 99 |
2021-09 | 483 | 1-3-1960 | 0 |
2021-09 | 436 | 11-2-2021 | -47 |
2021-10 | 660 | 1-2-1960 | 224 |
2021-10 | 660 | 1-3-1960 | 0 |
2021-10 | 1006 | 11-2-2021 | 346 |
I mean that policy_vintage should make a diff for each, not in a row, e.g.And is it possible to show the results of only the previous and current run_date?
POLICY_VINTAGE | NUMBER_POLICY | Run_Date | diff_value |
2021-09 | 483 | 1-2-1960 | 0 |
2021-09 | 483 | 1-3-1960 | 0 |
2021-09 | 436 | 11-2-2021 | -47 |
2021-10 | 660 | 1-2-1960 | 0 |
2021-10 | 660 | 1-3-1960 | 0 |
2021-10 | 1006 | 11-2-2021 | 346 |
Change the condition for when to un-set the lagged value:
data _work.policy_vintage_weekly;
set policy_vintage_weekly_all;
by policy_vintage;
diff_value = dif(number_policy);
if not last.policy_vintage then diff_value = .;
run;
I changed but still see same problem Doing calculations between different policy_vintage. I mean that it should do between the same, i.e. only in 2021.10
2011-09 | 1 | 2021-W44-03 | 0 |
2011-10 | 5 | 1960-W00-07 | 4 |
If you have policy_vintage groups with only one entry, extend the condition:
if first.policy_vintage or not last.policy_vintage then diff_value = .;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.