edasdfasdfasdfa
Quartz | Level 8

Hello,

Can we use the same data set name throughout our code (across different blocks), and will the program know we are referring to the same data set, or do we need a new name for each new block?

 

 

Kurt_Bremser
Super User

A dataset is a dataset is a dataset and will have the same name for the time of its existence.

What exactly is your (perceived) issue? Show code and point out where you have doubts or questions.

edasdfasdfasdfa
Quartz | Level 8

Sorry for any confusion.

 

I just mean that I tend to give a data set a new name each time I start a new block (assuming there are significant additions in that new block). I have had the habit of reading the old data set with SET and giving the new block a new name. I'm just asking whether I can use the same name throughout, even when I start new blocks.

Kurt_Bremser
Super User

Define "block". In SAS parlance, a "block" is a set of statements that together constitute a logical entity, usually a do/end block in a data step.

SHOW CODE AND POINT OUT YOUR QUESTION POINTS.

edasdfasdfasdfa
Quartz | Level 8

data banktransactions;
infile
input
run;

Can I now refer to it again after the RUN statement, like data banktransactions; ??

Do I need a SET statement?

Kurt_Bremser
Super User

This very rudimentary piece of pseudo-code seems to read data from an external file into a dataset, so a SET statement is not needed there. If you want to read the resulting dataset in a follow-up DATA step, a SET, MERGE or MODIFY statement is needed.
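A minimal sketch of that distinction, assuming made-up data: the variables account, amount and is_debit are hypothetical, and DATALINES stands in for the external file so the example is self-contained.

```sas
/* Step 1: read raw records. No SET is needed because the source is raw
   data, not an existing data set. DATALINES stands in for an external file. */
data banktransactions;
  infile datalines dlm=',';
  input account $ amount;   /* hypothetical variables */
  datalines;
A100,250
B200,-75
;
run;

/* Step 2: a follow-up DATA step must name its input explicitly with SET */
data banktransactions;
  set banktransactions;
  is_debit = (amount < 0);  /* hypothetical derived variable: 1 if negative */
run;
```

Without the SET statement, the second step would not read the existing rows; it would simply replace the data set with whatever the step itself creates.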

ChrisHemedinger
Community Manager

I think you want to know whether it's possible (and maybe good practice) to operate on the same data set over many different steps.  

It's possible, yes.

 

data mydata;
 infile...;
 input ...;
 /* do work */
run;

data mydata;
 set mydata; /* bring in existing data file */
 /* do more work */
run;

data mydata;
 set mydata; /* bring in existing data file */
 /* do EVEN MORE work */
run;

As to whether it's a good idea...well, it depends. With each step you are reading the data again, so it may be more efficient to combine steps. And during development you might make mistakes and alter your data in ways you didn't intend. If that happens, you don't have an intermediate version to go back to, so you have to start over by rerunning the first step.

Reeza
Super User

@edasdfasdfasdfa wrote:

data banktransactions;
infile
input
run;

Can I now refer to it again after the RUN statement, like data banktransactions; ??

Do I need a SET statement?

You do not need a SET statement but you cannot just refer to your data set as 'data banktransactions'.

 

When you want to use the data set, you refer to it using the libraryName.dataSetName notation. If the library is WORK, you can omit the library portion. You can use that name in a DATA statement, a SET statement, or any other statement that accepts a data set name.
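As a small illustration of the naming rule, here is a sketch that assumes the default setup, where one-level names resolve to WORK (no USER library assigned). The data set name have is only an example.

```sas
/* One-level name on the DATA statement: stored as WORK.HAVE by default */
data have;
  set sashelp.class;   /* two-level name: library SASHELP, data set CLASS */
run;

/* These two steps read the same data set */
proc print data=have;
run;

proc print data=work.have;
run;
```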

 

Using the same name over and over is not recommended because it makes it harder to debug your code. My development process is to write my code in steps and then once it's working add it back to my main data step. 

 

*import or make fake data;
data have;
set sashelp.class;
run;

*summarize data set;
proc means data=have;
run;

proc freq data=have;
run;

*add a new variable BMI;
data have2;
    set have;
    *calculate BMI;
    BMI = weight / (height**2) * 703;
run;

*categorize that variable;
data have3;
    set have2;
    length category $20;

    if BMI < 18 then
        category = 'Under Weight';
    else if 18 <= BMI < 25 then
        category = 'Normal';
    else if 25 <= BMI < 30 then
        category = 'Over Weight';
    else if BMI >= 30 then
        category = 'Obese';
run;

That would be my first stab. Once I was sure the calculation and the categories were working, I would then modify my code to be:

*add a new variable BMI and categorize it;
data have2;
    set have;
    length category $20;

    *calculate BMI;
    BMI = weight / (height**2) * 703;

    if BMI < 18 then
        category = 'Under Weight';
    else if 18 <= BMI < 25 then
        category = 'Normal';
    else if 25 <= BMI < 30 then
        category = 'Over Weight';
    else if BMI >= 30 then
        category = 'Obese';
run;

 

ballardw
Super User

If you mean that you generate code similar to this:

data one;
   infile filename <fileoptions>;
   input <variables>;
run;

data two;
   set one;
   newvar = <some calculation>;
run;

data three;
   set two;
   othervar = <some calculation>;
run;

data four;
   set three;
   nowthatvar = <more calculations>;
run;

What I would suggest is that, after you have tested the data produced by the step that creates data set two, you move the calculations into the data step that creates data set one. Then, as you test each bit of code, move it to the earliest practical step.

 

It may be that the changes should all be moved into the data two step instead, especially if you are replacing values in existing variables (data cleaning, standardizing or recoding, perhaps). That way, if data step two does have a problem, data set one is still available to restart from the same point.

 

If any of these extra data steps only do Format or Label assignments then really they should be in an earlier data step.
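As a rough sketch of that last point, a LABEL or FORMAT assignment can sit in the same step that creates the variable rather than in a step of its own. The BMI formula is borrowed from the earlier reply; the label text and the 5.1 format are only illustrative choices.

```sas
data have2;
  set sashelp.class;
  BMI = weight / (height**2) * 703;
  label BMI = 'Body mass index';   /* illustrative label text */
  format BMI 5.1;                  /* illustrative display format */
run;
```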

