BookmarkSubscribeRSS Feed
taupirho
Calcite | Level 5

I currently use the following code to read a list of variable names

 

data list_of_tables ;
infile datalines;
input TABLE_NAME & $40.;
datalines;
TABLE_1
TABLE_2
;
run;

Lets say I had a variable number of TABLE_N's to read in like this ie in one run  it would be 3, the next run it might be 5 etc ... . Is there a way to do this?

19 REPLIES 19
PaigeMiller
Diamond | Level 26

I feel like I'm missing the point of your question. If you want to read in five table names, you would use this code. So what am I not understanding?

 


data list_of_tables ;
infile datalines;
input TABLE_NAME & $40.;
datalines;
TABLE_1
TABLE_2
TABLE_3
TABLE_4
TABLE_5
;
run;
--
Paige Miller
taupirho
Calcite | Level 5

Let's say I had  large datasets that I wanted to divide up into smaller datasets and then process each new sub-set of data. On one run the sub-division might give me 3 sub-sets, TABLE_1, TABLE_2, TABLE_3 . On another run it might give me 100 sub-sets, TABLE_1-100  etc ...

PaigeMiller
Diamond | Level 26

Are you trying to say that this is part of some larger activity, where sometimes there are 100 datasets and sometimes there are 600 datasets and sometimes there are 7 dataset, and you want this splitting to happen automatically? How would you know how many datasets there are? What are the rules to do the splitting?

--
Paige Miller
taupirho
Calcite | Level 5

Something like that. There are multiple datasets as input. Each input dataset is to be split up depending on its size.

 

Input dataset1 might end up being split up into 5 sub datasets, input dataset2 might be split up into 10 sub-datasets.

 

At the end of each input dataset splitting I'll know how many sub-sets it was split into. How do I then cycle through and get the name of each sub-set into a variable that I can use to process it.

PaigeMiller
Diamond | Level 26

@taupirho wrote:

Something like that. There are multiple datasets as input. Each input dataset is to be split up depending on its size.

 

Input dataset1 might end up being split up into 5 sub datasets, input dataset2 might be split up into 10 sub-datasets.

 

At the end of each input dataset splitting I'll know how many sub-sets it was split into. How do I then cycle through and get the name of each sub-set into a variable that I can use to process it.


So you know how many sub-sets it was split into, but how would any SAS code that we write know?

--
Paige Miller
andreas_lds
Jade | Level 19

And why do you have to split the dataset at all?

Splitting datasets requires additional code for splitting, even more code to process the fragments and another amount of code to add everything together at the end.

PaigeMiller
Diamond | Level 26

@andreas_lds wrote:

And why do you have to split the dataset at all?

Splitting datasets requires additional code for splitting, even more code to process the fragments and another amount of code to add everything together at the end.


I agree with this, usually you would not want to do this.

 

The exception where you would want to do this is if you are parallelizing some very time consuming task, and you can run parts of this task on different remote computers simultaneously. So, @taupirho, are you in this situation, or not?

--
Paige Miller
taupirho
Calcite | Level 5

We are doing some numerical and frequency analysis a series of big datasets. A small data set would be 2.5 Gb and going up to 500Gb.

When I tried our analysis code on the one of the small datasets I got a memory error

 

Table XYZ (Number 1) Frequency Analysis began at Runtime 10:36:48. Rundate 11/07/2019.
Table XYZ (Number 1) has 537 Fields for Frequency Analysis
ERROR: PROC SUMMARY was terminated prematurely due to a memory shortage. Adjusting MEMSIZE and/or SUMSIZE may allow normal
       processing to complete.
ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.SUMM1 may be incomplete.  When this step was stopped there were 0 observations and 2 variables.

 

So one thought is to split up the larger datasets into smaller ones and do the analysis on each of the sub-sets

PaigeMiller
Diamond | Level 26

@taupirho wrote:

We are doing some numerical and frequency analysis a series of big datasets. A small data set would be 2.5 Gb and going up to 500Gb.

When I tried our analysis code on the one of the small datasets I got a memory error

 

Table XYZ (Number 1) Frequency Analysis began at Runtime 10:36:48. Rundate 11/07/2019.
Table XYZ (Number 1) has 537 Fields for Frequency Analysis
ERROR: PROC SUMMARY was terminated prematurely due to a memory shortage. Adjusting MEMSIZE and/or SUMSIZE may allow normal
       processing to complete.
ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.SUMM1 may be incomplete.  When this step was stopped there were 0 observations and 2 variables.

 

So one thought is to split up the larger datasets into smaller ones and do the analysis on each of the sub-sets


I consider this to be a valid reason to split up the dataset, but you still need to explain how you know how many variables you have, and how they will be split. 

 

PLEASE GIVE A REALISTIC EXAMPLE.

--
Paige Miller
Tom
Super User Tom
Super User

If you want to supply a list of variable names that you can use in SAS code it might be easier to supply them in a macro variable.

%let varlist=
 AGE
 HEIGHT
 WEIGHT
;
proc summary data=sashelp.class ;
 var &varlist;
run;

If want to get a list of variables in an existing dataset then you can ask SAS to tell you.

proc contents data=have out=contents; run;

If you want to put a list of names into a macro variable from a set of names in a dataset then you can use the INTO clause of SQL query.

select name into :varlist separated by ' '
from contents;

 

taupirho
Calcite | Level 5

Can you create a variable list "on the fly". Something like (forgive the syntax as I'm a newbie)

 

%let myvar=;

 

%do i 1 to 5;

myvar = myvar || "table_" &i.;

 

%let varlist = &myvar.;

Tom
Super User Tom
Super User

If you want a variable list where the names end in numbers then no need to list them all.  Just use a variable list in your code.

proc means data=have;
  var table_1 - table_5 ;
run;
PaigeMiller
Diamond | Level 26

If @Tom's reply doesn't quite answer your question, then please give a realistic example of the situation and what you expect.

--
Paige Miller
taupirho
Calcite | Level 5

Ok, I want to split some datasets into a number of smaller sub-sets. The number of sub-sets will vary depending on the size of the input dataset.

 

I then want to set up a loop for each  sub-set and process it. I don't know in advance how many sub-sets there will be, only after the splitting has happened. In psuedocode,

 

Split dataset1 

   Produces  5 sub-sets say

   for i = 1 to 5

       process subset_i

 

 

Split dataset2

   Produces  100 sub-sets say

   for x = 1 to 100

       process subset_x

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 1302 views
  • 1 like
  • 5 in conversation