I currently use the following code to read a list of variable names
data list_of_tables ; infile datalines; input TABLE_NAME & $40.; datalines; TABLE_1 TABLE_2 ; run;
Lets say I had a variable number of TABLE_N's to read in like this ie in one run it would be 3, the next run it might be 5 etc ... . Is there a way to do this?
I feel like I'm missing the point of your question. If you want to read in five table names, you would use this code. So what am I not understanding?
data list_of_tables ;
infile datalines;
input TABLE_NAME & $40.;
datalines;
TABLE_1
TABLE_2
TABLE_3
TABLE_4
TABLE_5
;
run;
Let's say I had large datasets that I wanted to divide up into smaller datasets and then process each new sub-set of data. On one run the sub-division might give me 3 sub-sets, TABLE_1, TABLE_2, TABLE_3 . On another run it might give me 100 sub-sets, TABLE_1-100 etc ...
Are you trying to say that this is part of some larger activity, where sometimes there are 100 datasets and sometimes there are 600 datasets and sometimes there are 7 dataset, and you want this splitting to happen automatically? How would you know how many datasets there are? What are the rules to do the splitting?
Something like that. There are multiple datasets as input. Each input dataset is to be split up depending on its size.
Input dataset1 might end up being split up into 5 sub datasets, input dataset2 might be split up into 10 sub-datasets.
At the end of each input dataset splitting I'll know how many sub-sets it was split into. How do I then cycle through and get the name of each sub-set into a variable that I can use to process it.
@taupirho wrote:
Something like that. There are multiple datasets as input. Each input dataset is to be split up depending on its size.
Input dataset1 might end up being split up into 5 sub datasets, input dataset2 might be split up into 10 sub-datasets.
At the end of each input dataset splitting I'll know how many sub-sets it was split into. How do I then cycle through and get the name of each sub-set into a variable that I can use to process it.
So you know how many sub-sets it was split into, but how would any SAS code that we write know?
And why do you have to split the dataset at all?
Splitting datasets requires additional code for splitting, even more code to process the fragments and another amount of code to add everything together at the end.
@andreas_lds wrote:
And why do you have to split the dataset at all?
Splitting datasets requires additional code for splitting, even more code to process the fragments and another amount of code to add everything together at the end.
I agree with this, usually you would not want to do this.
The exception where you would want to do this is if you are parallelizing some very time consuming task, and you can run parts of this task on different remote computers simultaneously. So, @taupirho, are you in this situation, or not?
We are doing some numerical and frequency analysis a series of big datasets. A small data set would be 2.5 Gb and going up to 500Gb.
When I tried our analysis code on the one of the small datasets I got a memory error
Table XYZ (Number 1) Frequency Analysis began at Runtime 10:36:48. Rundate 11/07/2019.
Table XYZ (Number 1) has 537 Fields for Frequency Analysis
ERROR: PROC SUMMARY was terminated prematurely due to a memory shortage. Adjusting MEMSIZE and/or SUMSIZE may allow normal
processing to complete.
ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.SUMM1 may be incomplete. When this step was stopped there were 0 observations and 2 variables.
So one thought is to split up the larger datasets into smaller ones and do the analysis on each of the sub-sets
@taupirho wrote:
We are doing some numerical and frequency analysis a series of big datasets. A small data set would be 2.5 Gb and going up to 500Gb.
When I tried our analysis code on the one of the small datasets I got a memory error
Table XYZ (Number 1) Frequency Analysis began at Runtime 10:36:48. Rundate 11/07/2019.
Table XYZ (Number 1) has 537 Fields for Frequency Analysis
ERROR: PROC SUMMARY was terminated prematurely due to a memory shortage. Adjusting MEMSIZE and/or SUMSIZE may allow normal
processing to complete.
ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.SUMM1 may be incomplete. When this step was stopped there were 0 observations and 2 variables.
So one thought is to split up the larger datasets into smaller ones and do the analysis on each of the sub-sets
I consider this to be a valid reason to split up the dataset, but you still need to explain how you know how many variables you have, and how they will be split.
PLEASE GIVE A REALISTIC EXAMPLE.
If you want to supply a list of variable names that you can use in SAS code it might be easier to supply them in a macro variable.
%let varlist=
AGE
HEIGHT
WEIGHT
;
proc summary data=sashelp.class ;
var &varlist;
run;
If want to get a list of variables in an existing dataset then you can ask SAS to tell you.
proc contents data=have out=contents; run;
If you want to put a list of names into a macro variable from a set of names in a dataset then you can use the INTO clause of SQL query.
select name into :varlist separated by ' '
from contents;
Can you create a variable list "on the fly". Something like (forgive the syntax as I'm a newbie)
%let myvar=;
%do i 1 to 5;
myvar = myvar || "table_" &i.;
%let varlist = &myvar.;
If you want a variable list where the names end in numbers then no need to list them all. Just use a variable list in your code.
proc means data=have;
var table_1 - table_5 ;
run;
If @Tom's reply doesn't quite answer your question, then please give a realistic example of the situation and what you expect.
Ok, I want to split some datasets into a number of smaller sub-sets. The number of sub-sets will vary depending on the size of the input dataset.
I then want to set up a loop for each sub-set and process it. I don't know in advance how many sub-sets there will be, only after the splitting has happened. In psuedocode,
Split dataset1
Produces 5 sub-sets say
for i = 1 to 5
process subset_i
Split dataset2
Produces 100 sub-sets say
for x = 1 to 100
process subset_x
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.