DATA Step, Macro, Functions and more

Create new variable without creating new data

Reply
Contributor
Posts: 31

Create new variable without creating new data

I have a data set with hundreds of variables and millions of observations. I need to constantly modify this data (about 30 GB). For example, creating a new variable by take sum of two existing variables, etc. I used the following data step:

 

data mydata;

  set mydata;

  new_var=var1+var2;

run;

 

My understanding is that even the data has the same name, but SAS is creating a new data based on the original one. This will take longer time every time since the data size is large. I am wondering if there is other more efficient ways to add new variable to existing data. 

 

Super User
Posts: 23,754

Re: Create new variable without creating new data

Not really, once you're creating new variables it will recreate the data. SQL seems like it doesn't using insert, but it still does as well. 

 

What type of efficiency are you trying to gain here? Are you trying to speed up your processing to reduce wait time while programming or your time?

Contributor
Posts: 31

Re: Create new variable without creating new data

want to reduce waiting time Smiley Happy

Super User
Posts: 23,754

Re: Create new variable without creating new data

I don't know enough to do that Smiley Sad But the approaches I use:

 

1. hash tables for joins, assuming you have enough memory

2. Limit the number of obs when testing code, I'll usually only work with about 1-2 million observations, enough to have all the things I'm likely to encounter. If you use that, be very careful with sorts - especially ones without OUT data sets. It will create a data set with only 1000000 records. 

option obs=1000000;

3. Make sure  to do multiple steps together rather than have 5 different data steps that each create a single variable.

4. Create indexes to make sure the usage afterwards is quick. 

 


sasecn wrote:

want to reduce waiting time Smiley Happy


 

Contributor
Posts: 31

Re: Create new variable without creating new data

Thanks for the ideas!

Super User
Posts: 13,566

Re: Create new variable without creating new data


sasecn wrote:

I have a data set with hundreds of variables and millions of observations. I need to constantly modify this data (about 30 GB). For example, creating a new variable by take sum of two existing variables, etc. I used the following data step:

 

 


You might look into WHY you have to "constantly modify" existing data. Doing so is often indicative of poor process planning.

Are the new variables used frequently or just once?

 

If for a specific analysis task it is often preferable to extract only the variables needed and then add the variables just for that task. Such as:

 

Data analysistask;

   set mydata (keep= var1 var2 <other variables needed for the specific task>);

   new_var = sum(var1, var2); /* or var1+var2 if you know why there is a difference between SUM and +*/

run;

and use the analysistask set for the next steps.

Another approach instead of creating a huge data set might be to create a view from the existing data. The view basically has the code to create stuff from the current version of the data set and trades execution time for storage space

Data analysistask /view=analysistask;

   set mydata (keep= var1 var2 <other variables needed for the specific task>);

   new_var = sum(var1, var2); /* or var1+var2 if you know why there is a difference between SUM and +*/

run;

 

Creates a reference that will read the current set mydata.

Then something like

 

Proc Reg data=analysistask;

<proc options>

run;

in effect reads Mydata, adds the new_var and then uses the result in the procedure. Which will be slower to execute but doesn't have the opportunity to overwrite existing data.

 

BTW habitual use of the structure:

Data mydata;

   set mydata;

  <modification code>

;

will eventually lead to case where one or more variables has values you do not expect and you will have a lot of fun trying to recover data.

Contributor
Posts: 31

Re: Create new variable without creating new data

Thanks for the reply! Good thoughts!

Ask a Question
Discussion stats
  • 6 replies
  • 151 views
  • 1 like
  • 3 in conversation