ren2010
Obsidian | Level 7
Hi,


Data temp;
set input_file;
run;

Data Temp;
Set Temp;
if sal < 1000 then flag=1;
run;

I know this works, but is it recommended?

Please let me know your valuable comments.
Thanks in advance.
4 REPLIES
Cynthia_sas
SAS Super FREQ
Hi:
I know that a LOT of folks do this and if you're a programmer who NEVER makes mistakes or logic errors, then this is an OK thing to do. But, I've been programming in SAS for over 20 years and I never do this. Here's why:
1) I used to work for lawyers and one thing they never wanted was for their statistical expert to be on the stand and get asked about why the same named file was being used for both INPUT and OUTPUT.
2) If there's any logic error in the code, you have just overwritten the original file with possibly erroneous data. Unless you have a backup of the original WORK.TEMP or INPUT_FILE, you will have trouble recreating the data or fixing the problem you just introduced.
3) It is harder to explain to beginners what is going on, so only more experienced SAS programmers can maintain the program.
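
To illustrate point 2, here is a minimal sketch of the same flagging step written to a new data set, so that WORK.TEMP survives a logic error. The name TEMP_FLAGGED is made up for this example.
[pre]
data temp_flagged;   /* hypothetical output name; WORK.TEMP is left untouched */
set temp;
if sal < 1000 then flag = 1;
run;
[/pre]
If the logic turns out to be wrong, you can fix the step and rerun it without having to recreate WORK.TEMP.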

With the code you show, simple as it is, is it probably OK to have this?
[pre]
data temp;
set temp;
[/pre]

Sure, but the minute you get tempted to add more logic or more data manipulation to the program, I'd do something different. In fact, there's NO reason why you couldn't set the FLAG variable when you read in the INPUT_FILE, like this:
[pre]
Data temp;
set input_file;
if sal < 1000 then flag = 1;
run;

[/pre]
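
One thing to watch for in a step like this (a sketch, assuming SAL is numeric): FLAG stays missing when the condition is false, and a missing SAL compares as less than 1000, so those rows get FLAG=1 too. If you'd rather have a 0/1 flag that ignores missing salaries, something along these lines may work:
[pre]
data temp;
set input_file;
/* a comparison evaluates to 1 (true) or 0 (false);  */
/* missing(sal) screens out rows with no salary      */
flag = (not missing(sal)) and (sal < 1000);
run;
[/pre]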

Understanding a bit more about how SAS data steps operate and how to make your programs more efficient would be a good learning exercise. There are a LOT of user group papers and documentation tutorials on how SAS works.

This particular program is rather inefficient
[pre]
Data temp; /* creates work.temp from work.input_file */
set input_file;
run;

Data Temp;
Set Temp; /* REREADS work.temp a second time */
if sal < 1000 then flag=1;
run;

[/pre]

because you are creating WORK.TEMP in the first step and then REREADING WORK.TEMP in order to set the FLAG variable. Probably not a big deal efficiency-wise if you only have a few hundred observations, but if you have hundreds of thousands or millions of obs, not a good idea.

I know that other folks have differing opinions about the construction you're using. And I'm sure you'll hear about them all!

cynthia
Doc_Duke
Rhodochrosite | Level 12
Contrary to Cynthia, I routinely reuse dataset names in the WORK library. They are going to disappear at the end of the batch job or interactive session anyway, so I find the risk of data loss to be negligible.

We also work with datasets that have millions of rows in them, so it is easy to run out of disk space if one is not careful in its management. Reusing the dataset name is generally easier than inserting a lot of PROC DATASETS to explicitly delete the unneeded files.

In agreement with Cynthia, I do NOT write over permanent library files unless I think about it and make it a conscious decision. Usually that occurs in "freshening" an analysis file from an operational store (e.g. Oracle).

Doc Muhlbaier
Duke
art297
Opal | Level 21
I'll agree with Cynthia for two principal reasons. First, if you're importing large datasets (e.g., a file that takes 2 or more hours to import but only minutes to analyze), why risk having to repeat the initial time-consuming part?

And second, with permanent files, some shops depend on the files' date/time stamps. Re-saving the files changes those dates and times and can cause everyone a lot of unnecessary aggravation.

As for Doc's concern, you can always delete files that are no longer needed.
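
For example, a step along these lines (a sketch, using the WORK.TEMP name from the original post) removes an intermediate data set once it is no longer needed:
[pre]
proc datasets library=work nolist;   /* NOLIST suppresses the directory listing in the log */
delete temp;
quit;
[/pre]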

Art
sbb
Lapis Lazuli | Level 10
Additional performance considerations with this post/thread, when applicable:

1) use of the WHERE statement,
2) use of SAS views.
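
As a rough sketch of both ideas, using the data set and variable names from the original post (LOW_SAL and TEMP_V are made-up names): a WHERE statement subsets rows as they are read, which only fits if you want just the low-salary rows, and a DATA step view stores the logic rather than a second copy of the data.
[pre]
/* 1) WHERE: only rows satisfying sal < 1000 are read into the step */
data low_sal;             /* hypothetical output name */
set input_file;
where sal < 1000;
flag = 1;
run;

/* 2) View: nothing is copied until the view is read */
data temp_v / view=temp_v;   /* hypothetical view name */
set input_file;
if sal < 1000 then flag = 1;
run;

proc print data=temp_v;   /* the view executes here */
run;
[/pre]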

Scott Barry
SBBWorks, Inc.
