I have a dataset with nine different dates.
Every time the job is launched, if the dataset already contains nine different dates, it deletes the least recent date, ensuring that there are always nine dates:
%macro versioning(FLOW=,dt_rif=); proc sql noprint; select count(distinct data_rif) into : num_ver from &FLOW ; quit;
proc sql noprint; select count(*) into : num_obs from &FLOW where data_rif=&dt_rif; quit;
%if &num_ver = 9 && &num_obs=0 %then %do;
proc sql noprint; select min(data_rif) into : data_rif_to_del from &FLOW; quit; data &FLOW; set &FLOW; where data_rif ne &data_rif_to_del; run; %end; %mend;
The issue is that the number of historical dates is no longer fixed at 9; it can vary. For example, it could be 5 or 12. How can I modify the code to determine which dates (&data_rif_to_del) need to be deleted?
Because your requirements are only partially clear to me, let me propose a much simpler way to go about this. Start with sorting your incoming data:
proc sort data=&FLOW;
by descending dt_rif;
run;
Then select how many distinct dates you would like:
%let n_select = 10;
Then because the data is sorted from highest to lowest, select the first 10 dates you encounter:
data want;
set &FLOW;
by descending dt_rif;
if first.dt_rif then date_count + 1;
if date_count > &n_select then stop;
drop date_count;
run;
This may do exactly what you want. But if not, it should give us a way to discuss what you would like to achieve here.
Something along the line of below should work for you.
data have;
format dt_var date9.;
do dt_var=today()-11 to today();
output;
output;
end;
run;
%macro versioning(inds=,outds=, dt_rif=,keep_n_dist_dates=999999);
%if %nrbquote(&outds) = %nrbquote() %then %let outds=&inds;
data _null_;
if 0 then set &inds(keep=&dt_rif);
dcl hash h1(dataset:"&inds", ordered:'d');
h1.defineKey("&dt_rif");
h1.defineData("&dt_rif");
h1.defineDone();
call symputx('n_dist_dates',h1.num_items,'l');
h1.output(dataset:'work.__dist_dates');
run;
%if &n_dist_dates>&keep_n_dist_dates %then
%do;
data _null_;
set work.__dist_dates(firstobs=&keep_n_dist_dates obs=&keep_n_dist_dates);
call symputx("min_dt_keep",&dt_rif,'l');
run;
data &outds;
set &inds;
where &dt_rif >= &min_dt_keep;
run;
%end;
%else %if %nrbquote(&inds) ne %nrbquote(&outds) %then
%do;
data &outds;
set &inds;
run;
%end;
proc datasets lib=work nolist nowarn;
delete __dist_dates;
quit;
%mend;
%versioning(inds=have, dt_rif=dt_var, keep_n_dist_dates=5);
proc print data=have;
run;
Thanks Patrick,
but the input dataset can have multiple records with the same date.
For example
dt_rif | Value_1 | Value_2 | Desc_1 |
31mar2023 | 13 | 4343 | a |
31mar2023 | 13 | 34324 | b |
31mar2023 | 233 | 34324 | c |
31mar2023 | 99 | 546 | Desc_1 |
30apr2023 | 54 | 565645 | e |
30apr2023 | 323 | 4354 | f |
31may2023 | 3434 | 45 | g |
31may2023 | 234 | 3454 | h |
31may2023 | 43 | 3455 | y |
.. | .. | .. | .. |
The current value of the different dt_rif is 9. However, they now want to modify this value.
If they change from 9 to 10, the job must remove the oldest dates so that the output dataset contains the last ten distinct dates.
Because your requirements are only partially clear to me, let me propose a much simpler way to go about this. Start with sorting your incoming data:
proc sort data=&FLOW;
by descending dt_rif;
run;
Then select how many distinct dates you would like:
%let n_select = 10;
Then because the data is sorted from highest to lowest, select the first 10 dates you encounter:
data want;
set &FLOW;
by descending dt_rif;
if first.dt_rif then date_count + 1;
if date_count > &n_select then stop;
drop date_count;
run;
This may do exactly what you want. But if not, it should give us a way to discuss what you would like to achieve here.
@rakeon3 wrote:
Thanks Patrick,
but the input dataset can have multiple records with the same date.
....
The current value of the different dt_rif is 9. However, they now want to modify this value.
If they change from 9 to 10, the job must remove the oldest dates so that the output dataset contains the last ten distinct dates.
@rakeon3 That's what the macro I've shared is doing. The HAVE dataset created for testing contains already two rows per date.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.