BookmarkSubscribeRSS Feed
learnsas2
Calcite | Level 5

I have a library that have tables with name as "tablename_YYYYMM".I want to write a script which loops through all the folders on the disk and deletes tables older than the years specified. Table deletion should be done on the date implied by the filename (dataset name)and individual querying of the dataset is not required. the deleted dataset name and deletion time is stored. I have attached the tables structure. I have the below code which deletes tables older than 4 years but what if I want to delete some tables older than 4 years and some tables older than 7 years. I have attached a sample dataset structure present in the library. So suppose i want to delete tables starting with "data_yyyymm" which are older than 7 years and rest of the tables older than 4 years. how to do it?

DATA work.deleted;
 length libname $15. time 8;
 format time datetime20.;
 RETAIN libname memname time;
 SET contents(KEEP=LIBNAME MEMNAME) end=eof;
 by memname;
 where length(scan(memname, -1, '_')) = 6 and  input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -4, 'b');
  if first.memname; 
    time = datetime();
 if _n_ = 1 then call execute('proc sql;');
 call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
 if eof then do;
 call execute('quit;');
 end;
 RUN;
12 REPLIES 12
PaigeMiller
Diamond | Level 26

... but what if I want to delete some tables older than 4 years and some tables older than 7 years.

 

We can't program this. It isn't clear what you mean by this. Please explain further. Please give examples. By the way, some of us, like me, refuse to — or cannot — download attachments, and so the data you have (or a representative portion of it) ought to be included in your message as text, rather than as an attachment.

--
Paige Miller
learnsas2
Calcite | Level 5

data_202011
data_201312
data_201401
data_201702
data_201805
trial_202011
trial_201902
trial_201111
trial_201703
dummy_202101
dummy_202005
dummy_201806
dummy_201709

The datasets are in the above format. i want to delete tables older than 7 years whose name starts with "data_yyyymm" and rest of the tables will be deleted older than 4 years

PaigeMiller
Diamond | Level 26

@learnsas2 wrote:

 

The datasets are in the above format. i want to delete tables older than 7 years whose name starts with "data_yyyymm" and rest of the tables will be deleted older than 4 years


This is simply adding in the proper IF statements to your logic, depending on whatever the first "word" of the file name is.

--
Paige Miller
learnsas2
Calcite | Level 5
can you show me how? i tried but it didn't work
PaigeMiller
Diamond | Level 26

Hi. Saying you "tried", and not providing any other information, makes it difficult to help. Show us what you tried — show us the code. If there are errors, also show us the ENTIRE log. Please post code in the window that appears when you click on the "little running man" icon and please post the log in the window that appears when you click on the </> icon.

--
Paige Miller
learnsas2
Calcite | Level 5

i used the below code

PROC CONTENTS DATA=work._ALL_ OUT=work.contents NOPRINT;
RUN;

/*Sorting the contents dataset by memname*/
proc sort data=work.contents nodupkey;
by memname;
run;

/*Deleting datasets based on period and storing details in the work library*/

DATA work.deleted;
length libname $15.;
format deletion_time datetime20.;
RETAIN libname memname deletion_time;
SET work.contents(KEEP=LIBNAME MEMNAME) end=eof;
by memname;
where length(scan(memname, -1, '_')) = 6;
deletion_time = datetime();
if find(memname,'data','i') ge 1 then do;
if first.memname;
if _n_ = 1 then call execute('proc sql;');
call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -7, 'b');'));
if eof then do;
call execute('quit;');
end;
end;
else do;
if first.memname;
if _n_ = 1 then call execute('proc sql;');
call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -4, 'b');'));
if eof then do;
call execute('quit;');
end;
end;

RUN;

 

and the log has the below errors
228 PROC CONTENTS DATA=work._ALL_ OUT=work.contents NOPRINT;
229 RUN;

NOTE: The data set WORK.CONTENTS has 128 observations and 41 variables.
NOTE: Compressing data set WORK.CONTENTS decreased size by 0.00 percent.
Compressed is 2 pages; un-compressed would require 2 pages.
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds


230
231 /*Sorting the contents dataset by memname*/
232 proc sort data=work.contents nodupkey;
233 by memname;
234 run;

NOTE: There were 128 observations read from the data set WORK.CONTENTS.
NOTE: SAS sort was used.
NOTE: 107 observations with duplicate key values were deleted.
NOTE: The data set WORK.CONTENTS has 21 observations and 41 variables.
NOTE: Compressing data set WORK.CONTENTS increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds


235
236 /*Deleting datasets based on period and storing details in the work library*/
237
238 DATA work.deleted;
239 length libname $15.;
240 format deletion_time datetime20.;
241 RETAIN libname memname deletion_time;
242 SET work.contents(KEEP=LIBNAME MEMNAME) end=eof;
243 by memname;
244 where length(scan(memname, -1, '_')) = 6;
245 deletion_time = datetime();
246 if find(memname,'data','i') ge 1 then do;
247 if first.memname;
248 if _n_ = 1 then call execute('proc sql;');
249 call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
250 call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year',
-------------------------- -----------------------
49 49
-
388
-
76
250 ! today(), -7, 'b');'));
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release.
Inserting white space between a quoted string and the succeeding identifier is
recommended.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

251 if eof then do;
252 call execute('quit;');
253 end;
254 end;
255 else do;
256 if first.memname;
257 if _n_ = 1 then call execute('proc sql;');
258 call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
259 call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year',
-------------------------- -----------------------
49 49
-
388
-
76
259 ! today(), -4, 'b');'));
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release.
Inserting white space between a quoted string and the succeeding identifier is
recommended.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

260 if eof then do;
261 call execute('quit;');
262 end;
263 end;
264
265 RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DELETED may be incomplete. When this step was stopped there were 0
observations and 3 variables.
WARNING: Data set WORK.DELETED was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.08 seconds
cpu time 0.09 seconds

PaigeMiller
Diamond | Level 26

Repeating: Please post code in the window that appears when you click on the "little running man" icon and please post the log in the window that appears when you click on the </> icon.

--
Paige Miller
learnsas2
Calcite | Level 5
PROC CONTENTS DATA=work._ALL_ OUT=work.contents NOPRINT;
RUN;

/*Sorting the contents dataset by memname*/
proc sort data=work.contents nodupkey;
by memname;
run;

/*Deleting datasets based on period and storing details in the work library*/

DATA work.deleted;
length libname $15.;
format deletion_time datetime20.;
RETAIN libname memname deletion_time;
SET work.contents(KEEP=LIBNAME MEMNAME) end=eof;
by memname;
where length(scan(memname, -1, '_')) = 6;
deletion_time = datetime();
if find(memname,'data','i') ge 1 then do;
if first.memname;
if _n_ = 1 then call execute('proc sql;');
call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -7, 'b');'));
if eof then do;
call execute('quit;');
end;
end;
else do;
if first.memname;
if _n_ = 1 then call execute('proc sql;');
call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -4, 'b');'));
if eof then do;
call execute('quit;');
end;
end;

RUN;
228 PROC CONTENTS DATA=work._ALL_ OUT=work.contents NOPRINT;
229 RUN;

NOTE: The data set WORK.CONTENTS has 128 observations and 41 variables.
NOTE: Compressing data set WORK.CONTENTS decreased size by 0.00 percent.
Compressed is 2 pages; un-compressed would require 2 pages.
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds


230
231 /*Sorting the contents dataset by memname*/
232 proc sort data=work.contents nodupkey;
233 by memname;
234 run;

NOTE: There were 128 observations read from the data set WORK.CONTENTS.
NOTE: SAS sort was used.
NOTE: 107 observations with duplicate key values were deleted.
NOTE: The data set WORK.CONTENTS has 21 observations and 41 variables.
NOTE: Compressing data set WORK.CONTENTS increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds


235
236 /*Deleting datasets based on period and storing details in the work library*/
237
238 DATA work.deleted;
239 length libname $15.;
240 format deletion_time datetime20.;
241 RETAIN libname memname deletion_time;
242 SET work.contents(KEEP=LIBNAME MEMNAME) end=eof;
243 by memname;
244 where length(scan(memname, -1, '_')) = 6;
245 deletion_time = datetime();
246 if find(memname,'data','i') ge 1 then do;
247 if first.memname;
248 if _n_ = 1 then call execute('proc sql;');
249 call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
250 call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year',
-------------------------- -----------------------
49 49
-
388
-
76
250 ! today(), -7, 'b');'));
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release.
Inserting white space between a quoted string and the succeeding identifier is
recommended.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

251 if eof then do;
252 call execute('quit;');
253 end;
254 end;
255 else do;
256 if first.memname;
257 if _n_ = 1 then call execute('proc sql;');
258 call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
259 call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year',
-------------------------- -----------------------
49 49
-
388
-
76
259 ! today(), -4, 'b');'));
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release.
Inserting white space between a quoted string and the succeeding identifier is
recommended.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

260 if eof then do;
261 call execute('quit;');
262 end;
263 end;
264
265 RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DELETED may be incomplete. When this step was stopped there were 0
observations and 3 variables.
WARNING: Data set WORK.DELETED was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.08 seconds
cpu time 0.09 seconds
Amir
PROC Star

Hi @learnsas2,

 

Try changing the surrounding single quotes to double quotes for the third catx() parameter, so that the single quotes in the scan() and intnx() functions can be interpreted correctly, e.g., try using:

 

call execute (catx(' ', 'where', "input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -4, 'b');"));

 

Kind regards,

Amir.

Tom
Super User Tom
Super User

This condition 

if find(memname,'data','i') ge 1 then do;

Is also not the same as your requirement that the name start with DATA.  This will also find datasets names XXDATA_YYYYYY.  Is that ok?  If you want DATA to be first part of the name, like DATAXX_YYYYYY then test if the location it is found is 1 only.

 

This statement is not needed.

if first.memname;

Since you used PROC SORT with NODUPKEY to reduce the data to one observation per MEMNAME already.

 

This statement is placed in the wrong location. 

if _n_ = 1 then call execute('proc sql;');

If should be before the IF/THEN DO block.  Otherwise it will not run when the first dataset is not one that meets that IF condition.  The same goes for the 

if eof then do;
call execute('quit;');
end;

That should be placed outside of the conditional blocks.

 

This statement makes no sense at all

call execute (catx(' ', 'where', 'input(scan(memname, -1, '_'),? yymmn6.) < intnx('year', today(), -7, 'b');'));

The data step that is running it is already the dataset that has the variable MEMNAME you appear to want to test.  Why are you trying to generate some SQL code to run later?  Why not just test the name of the dataset you are already seeing and use that to decide whether or not you need to generate the DROP statement?

 

I cannot tell if you want to delete dataset that have the YYYYMM stamp on their names that are more than 7 years old or more than 4 years old.  Here is code to do it for those that are more than 7 years old.

DATA work.deleted;
  SET work.contents(KEEP=LIBNAME MEMNAME) end=eof;
  where length(scan(memname, -1, '_')) = 6;
  where also find(memname,'data','i') ge 1;
  where also not missing(input(scan(memname, -1, '_'),?yymmn6.));
  where also input(scan(memname, -1, '_'),?yymmn6.) < intnx('year', today(), -7, 'b');
  deletion_time = datetime();
  format deletion_time datetime20.;
  if _n_ = 1 then call execute('proc sql;');
  call execute(catx(' ', 'drop table', catx('.', libname, memname), ';'));
  if eof then call execute('quit;');
RUN;

So the where clause is filtering dataset names with:

  • Six character suffix after an _
  • Have the string DATA in them
  • The six character suffix is a valid YYYYMM string
  • The year of the suffix is more than seven years ago

 

Note your test using the INTNX() with the "B" option would be easier to implement if you only read the YEAR from the name.

 where also input(scan(memname, -1, '_'),?4.) < year(today())-7 ;

 

ballardw
Super User

You still have a lot of imprecision in your details. "tables older than 7 years" is not clearly defined. If you mean the year in the NAME you should state so. But it might be that are actually requesting details about data sets created more than 7 years ago and SOME of the names would be excluded because, as an example Data_200310 (I know, not in your list but if your list is exhaustive it would be so easy to program explicitly I am assuming you have many more data sets) was actually created in 2020 and is not actually 7 years old.

 

AGE versus NAME without explicitly stating the relationship is a business logic issue. Files have creation and modification dates. Proc Contents or Datasets will report them. You can query  sashelp.vtable (or Dictionary.tables in Proc SQL) to see creation date and last modified dates. So we don't know with your description if this step is needed or not.

Reeza
Super User

Solution from StackOverflow

 

*make fake data;
data demo_201501;
    set sashelp.class;

data random_201803;
    set sashelp.class;

data help_202004;
    set sashelp.class;
run;

*get list of tables;

data _table_list;
    set sashelp.vtable;
    where libname='WORK' and substr(memname, 1, 1) ne "_";
    cutoff_date_data=put(intnx('year', today(), -7, 'b'), yymmn6.);
    cutoff_date_rest=put(intnx('year', today(), -4, 'b'), yymmn6.);
    date_file=scan(memname, 2, "_");

    if (scan(memname, 1, "_") = "DATA" and date_file < cutoff_date_data) 
    OR (scan(memname, 1, "_") ne "DATA" date_file < cutoff_date_rest);

    keep libname memname nobs cutoff_date date_file;
run;

*delete via proc datasets;
data _null_;
    set _table_list end=eof;

    if _n_=1 then
        do;
            call execute('proc datasets lib=work nolist nodetails;
delete');
        end;
    call execute(" "||memname||" ");

    if eof then
        call execute(";run;quit;");
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 769 views
  • 1 like
  • 6 in conversation