
The DATA= option on the PROC DELETE statement accepts numbered-range data set lists, like have1-have3, but not lists using the colon suffix wildcard, like have: .  Can we enhance PROC DELETE to allow data set name lists defined with the colon wildcard?  Note that the DELETE statement in PROC DATASETS does support data set name lists made with the colon suffix.
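
For reference, here is a minimal sketch of the two behaviors described above. The data set names (have1-have3) and the variable x are only illustrative, and the example assumes a SAS release in which PROC DELETE is documented and accepts numbered range lists.

```sas
/* Create three small test data sets in WORK (illustrative names). */
data have1 have2 have3;
  x = 1;
run;

/* Numbered range list: accepted by PROC DELETE. */
proc delete data=have1-have3;
run;

/* Recreate the test data sets. */
data have1 have2 have3;
  x = 1;
run;

/* Colon (name prefix) list: accepted by the DELETE statement */
/* in PROC DATASETS, but not by DATA= on PROC DELETE.         */
proc datasets library=work nolist;
  delete have: ;
quit;
```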

 

See this thread:  https://communities.sas.com/t5/SAS-Programming/How-to-drop-tables-with-same-prefix-witout-a-cicle/td...

8 Comments
DianeOlson
SAS Employee

Hi, Tom.

 

Thanks for your suggestion!

 

I'm the developer for PROC DATASETS and have worked on PROC DELETE along the way as well. There are some things that PROC DATASETS does well and other things you need to count on PROC DELETE for, and vice versa. You probably know that PROC DATASETS creates an in-memory list of your SAS library contents when it initializes. Because it has that information, the DELETE statement in PROC DATASETS can easily process a colon suffix wildcard. PROC DELETE deletes specific files as instructed and has no listing of the library's contents, because obtaining that listing takes a fair amount of time for large directories. As a result, PROC DELETE can't do something open-ended like the colon suffix wildcard. It can do a numbered list because it can create all the names between the first number specified and the last number specified, and then try to delete all of those. Different tools for different situations. There is a paper that talks about some of this if you are interested: Optimize Your Delete 
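
To illustrate why a numbered list needs no library listing, here is a rough macro sketch of the kind of name expansion described above. It is not PROC DELETE's actual implementation; the macro name expand_range and its parameters are hypothetical and exist only for this example.

```sas
/* Hypothetical helper: generate prefix1 ... prefixN purely from the */
/* two endpoints, without looking at what the library contains.      */
%macro expand_range(prefix, first, last);
  %local i;
  %do i = &first %to &last;
    &prefix&i
  %end;
%mend expand_range;

/* Write the generated names to the log. */
%put %expand_range(have, 1, 3);
```

A colon list such as have: has no endpoints to generate from, so the matching names can only come from a listing of the library.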

 

Thanks again,
Diane

Tom
Super User

Does your response mean that every place where support for member name lists is implemented, a new copy of the logic to expand the list needs to be hard-coded into the individual procedure?  That seems strange to me, and different from how the use of the colon in variable lists appears to be implemented.  Each statement in a data step that needs to support variable lists doesn't have to build its own logic. The variable list specification is just replaced with the list of zero or more variables that match it, and the statement works exactly as it did before when given a normal space-delimited list of variable names.  How is the member list processing that much different? 

 

Perhaps the real place where the change needs to happen is outside of PROC DELETE and in the general process that expands member name lists.

DianeOlson
SAS Employee

Variable lists are much easier. They come from reading the data set. Member lists come from an operating system call.

Quentin
Super User

Thanks for this explanation @DianeOlson , it's very helpful information!

 

I'm just curious: if you consider a SET statement with a colon modifier, e.g.:

11   data want ;
12     set sashelp.prd:  ;
13   run ;

NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: There were 11520 observations read from the data set SASHELP.PRDSAL3.
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set WORK.WANT has 36000 observations and 14 variables.

Does the SET statement do an OS call to build the list of data sets, or does it use a dictionary table?  

 

Regardless, I would assume the SET statement only does this lookup when there is actually a colon-modified data set name.  I would think PROC DELETE could do something similar, i.e. "if there is a colon-modified name, then look up the names of all the data sets to see which fit the pattern."

 

So we might pay a small performance cost when we choose to use a colon modifier.  
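
In the meantime, the same "look up the names, then delete" idea can be sketched in user code. This is only one possible workaround under stated assumptions: the HAVE prefix, the WORK library, the macro variable dslist, and the macro name delete_matches are all illustrative, and the query against DICTIONARY.TABLES pays the library-listing cost itself.

```sas
/* Build a space-delimited list of WORK data sets whose names */
/* start with HAVE (member names are stored in uppercase).    */
proc sql noprint;
  select catx('.', libname, memname)
    into :dslist separated by ' '
    from dictionary.tables
    where libname = 'WORK'
      and memtype = 'DATA'
      and memname like 'HAVE%';
quit;

/* Call PROC DELETE only when the query found at least one match. */
%macro delete_matches;
  %if &sqlobs > 0 %then %do;
    proc delete data=&dslist;
    run;
  %end;
%mend delete_matches;

%delete_matches
```

The check on SQLOBS guards against an empty match, in which case dslist would not be created and should not be passed to PROC DELETE.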

 

The colon modifier is so handy, it would be nice to have it for PROC DELETE.  (Although admittedly, most of my code still uses PROC DATASETS, because I started programming in the 'gap years' when PROC DELETE was undocumented... : )

 

That's a handy paper you linked to.  But I shudder at the thought of a library with thousands of members.

DianeOlson
SAS Employee

@Quentin  Hi there, Quentin.  I didn't know the answer to your question, so I went to look at the DATA step code. It uses the operating system call to get the list of names, and yes, it only does that for the colon modifier, it appears.

 

I really appreciate that you shudder at a library with thousands of members, as I did! Unfortunately, that is becoming more and more common. Perhaps it has to do with so much more data being available these days. Performance is the major complaint about PROC DATASETS, particularly with large libraries.

 

I appreciate your post. I'll send email to the PROC DELETE developer so that she is aware.

 

Quentin
Super User

Thanks again @DianeOlson .  As a long time SAS fan, there's so much mystique around benefits of being a SAS employee (the campus, the m&m's, all the other reasons SAS is constantly on the best place to work lists... )  But reading your note, all I can think is how cool it would be to be someone who could say "I went to look at the DATA step code..."

 

Not that it would actually do me any good (since I don't know C, or whatever it's written in now), but just the thought is exciting. : )  Thanks for the follow-up. 

ChrisHemedinger
Community Manager
Status changed to: Not Planned

Great discussion on this! As noted, this specific enhancement isn't in the plans, but the ideas have made it to the developers for future consideration.

ChrisNZ
Tourmaline | Level 20

I understand why the idea might not be desirable.

In turn, that makes me wonder why the colon syntax was recently introduced for the SET statement.

Why was it decided that this improvement was good for SET, when the discussion above explains why it is bad for PROC DELETE?