Help using Base SAS procedures

Use of same dataset

Contributor
Posts: 74

Hi,


Data temp;
set input_file;
run;

Data Temp;
Set Temp;
if sal < 1000 then flag=1;
run;

I know this works, but is this recommended?

Please let me know your valuable comments.
Thanks in advance.
SAS Super FREQ
Posts: 8,743

Re: Use of same dataset

Hi:
I know that a LOT of folks do this and if you're a programmer who NEVER makes mistakes or logic errors, then this is an OK thing to do. But, I've been programming in SAS for over 20 years and I never do this. Here's why:
1) I used to work for lawyers, and one thing they never wanted was for their statistical expert to be on the stand and be asked why the same named file was being used for both INPUT and OUTPUT.
2) If there's any logic error in the code, you have just overwritten the original file with possible errors. Unless you have a backup of the original WORK.TEMP or INPUT_FILE, you will have trouble recreating the data or fixing the problem you just introduced.
3) It is harder to explain to beginners what is going on, and therefore only more experienced SAS programmers can maintain the program.
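
A minimal sketch of the alternative, assuming you are willing to give the output its own name (TEMP_FLAGGED is a made-up name for illustration, not something from the original post). WORK.TEMP is only read here, so a logic error in this step cannot damage it:
[pre]
data temp_flagged;                /* hypothetical output name */
   set temp;                      /* WORK.TEMP is read, never rewritten */
   if sal < 1000 then flag = 1;   /* same logic as the original post */
run;
[/pre]
If the flag logic turns out to be wrong, you can fix the step and rerun it against the untouched WORK.TEMP.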

With the code you show, simple as it is, is it probably OK to have this?
[pre]
data temp;
set temp;
[/pre]

Sure, but the minute you get tempted to add more logic or more data manipulation to the program, I'd do something different. In fact, there's NO reason why you couldn't set the FLAG variable when you read in the INPUT_FILE, like this:
[pre]
Data temp;
set input_file;
if sal < 1000 then flag = 1;
run;

[/pre]

Understanding a bit more about how SAS data steps operate and how to make your programs more efficient would be a good learning exercise. There are a LOT of user group papers and documentation tutorials on how SAS works.

This particular program is rather inefficient
[pre]
Data temp; /* creates work.temp from work.input_file */
set input_file;
run;

Data Temp;
Set Temp; /* REREADS work.temp a second time */
if sal < 1000 then flag=1;
run;

[/pre]

because you are creating WORK.TEMP in the first step and then REREADING WORK.TEMP in order to set the FLAG variable. That is probably not a big deal efficiency-wise if you only have a few hundred observations, but if you have hundreds of thousands or millions of obs, it is not a good idea.

I know that other folks have differing opinions about the construction you're using. And I'm sure you'll hear about them all!

cynthia
Trusted Advisor
Posts: 2,113

Re: Use of same dataset

Contrary to Cynthia, I routinely reuse dataset names in the WORK library. They are going to disappear at the end of the batch job or interactive session anyway, so I find the risk of data loss to be negligible.

We also work with datasets that have millions of rows in them, so it is easy to run out of disk space if one is not careful in its management. Reusing the dataset name is generally easier than inserting a lot of PROC DATASETS to explicitly delete the unneeded files.
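
For comparison, the explicit cleanup would look roughly like this (a sketch only; TEMP stands in for whichever WORK dataset is no longer needed):
[pre]
proc datasets library=work nolist;   /* NOLIST suppresses the directory listing */
   delete temp;                      /* remove WORK.TEMP to free disk space */
quit;
[/pre]
One of these after every intermediate step adds up quickly, which is why reusing the name is less typing.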

In agreement with Cynthia, I do NOT write over permanent library files unless I think about it and make it a conscious decision. Usually that occurs in "freshening" an analysis file from an operational store (e.g. Oracle).

Doc Muhlbaier
Duke
PROC Star
Posts: 7,363

Re: Use of same dataset

I'll agree with Cynthia for two principal reasons. First, if you're importing large datasets (e.g., a file that takes 2 or more hours to import but only minutes to analyze), why risk having to repeat the initial time-consuming part?

And second, with permanent files, some shops depend on the files' date/time stamps. Re-saving the files changes those dates and times and can cause everyone a lot of unnecessary aggravation.

As for Doc's concern, you can always delete files that are no longer needed.

Art
Super Contributor
Posts: 3,174

Re: Use of same dataset

Additional performance considerations with this post/thread, when applicable:

1) use of the WHERE statement,
2) use of SAS views.
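
A rough sketch of both ideas, reusing the dataset and variable names from the original post (TEMP_V is a hypothetical view name). Note the assumptions: WHERE subsets the rows instead of flagging them, and a missing SAL also satisfies sal < 1000, just as it does in the original IF statement.
[pre]
/* 1) WHERE: only the qualifying rows are passed to the step */
proc means data=input_file;
   where sal < 1000;
   var sal;
run;

/* 2) DATA step view: stores the instructions, not a second copy of the data */
data temp_v / view=temp_v;        /* hypothetical view name */
   set input_file;
   if sal < 1000 then flag = 1;
run;

/* the view executes only when something reads it */
proc print data=temp_v;
run;
[/pre]
Whether either one helps depends on how many times the result is read; a view that is read repeatedly re-executes its DATA step on every read.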

Scott Barry
SBBWorks, Inc.