I had a great prog running in SAS 9.3 for years that substituted a series of quoted strings in a changelist into another program, ran a regression analysis, appended the results to a datafile, and repeated that process for the next string.
The strings are various combinations of subsets of a main database. In a nutshell, the program tests which combinations of subsets give the best (or worst) results.
It worked great with a small string length and limited number of strings.
I greatly expanded the number of strings (from 1,000 to millions) and quickly got this message:
ERROR: The text expression length (65536) exceeds maximum length (65534). The text expression has been truncated to 65534 characters.
What alternatives can I try to get around this limitation?
Thoughts would be appreciated.
So before you where counting the number of lists in your list of lists.
Now you need to count the number of observations in your dataset of lists.
proc sql noprint;
select count(*) into :n
from testcomb2
;
quit;
Store your changelist in a dataset, and use any join method to work with subsets.
Also note that the subsetting if you used will have worse performance than using a where condition.
PS a GREAT program would be written in a way that scales until you simply run out of computing resources like disk space or RAM.
It is never a sustainable solution to store mass data in macro variables. For this, SAS provides datasets.
Assuming you construct a data set with a variable named Iso that holds the values you want to keep then that data step could be replaced with
proc sql; create table data as select b.* from datasetwithchangelistvalues as a left join data1 as b on a.iso = b.iso
where not missing(b.iso) ; run;
And unless you have a terribly feeble computer won't puke on several million values of ISO
Thanks ballardw . And yes, ISO is a field in Data1 and I can certainly call in ISO in your "datasetwithchangelistvalues". But, I am just using the ISO field as a selector to cull the data I want to run through the regression loop (ISO is just one of about 30 variables by a few thousand observations). I don't understand - probably because I have a limited knowledge. How does your suggested code result in the quoted changelist string created in the %macro test; which then gets inserted, one at a time, followed by a regression analysis, and then the regression analysis gets looped for the dataset created by culling the ISOs in the next quoted string value in changelist? Or are you suggesting that the solution is to abandon the entire %macro test; as well as the start of the %MACRO TARGET_ANALYSIS;?
Just do the subsetting differently.
So change the macro TARGET_ANALYSIS
to NOT do this:
DATA REGDATA;
SET REGDATA1;
IF ISO IN (%unquote(&val));
RUN;
And instead do something that is scalable.
Note that it is possible to create a macro (not a macro variable) that would emit a list of values from a dataset. But it is probably gong to be easier to just fix the places where you need to do the subsetting to use the dataset directly instead.
Those are just English words and mean what they sound like.
You are currently trying to stuff a lot of information into a macro variable. You have found out that macro variables only hold 64K bytes. So your solution didn't scale. As the amount of data grew the program could not handle it.
So change to using a DATASET to store the information. Take your %LET statement
%LET changelist
='1','2','3','4','5'
|'1','2','3','4','6'
|'1','2','3','4','7'
;
and change it into a DATA step.
So if the variable ISO is 10 characters long then the data step might look like this:
data changelist ;
length list_number 8 ISO $10 ;
list_number +1 ;
infile cards truncover ;
do until (iso=' ');
input iso @;
if iso ne ' ' then output;
end;
cards;
1 2 3 4 5
1 2 3 4 5 6 7
1 2 3 4
;
Then change your method for selecting what data to use to be based on the DATASET instead of the macro variable.
So the subset just of the first set of ISO values you could use:
...
proc sql ;
create table REGDATA as
select *
from regdata1
where iso in (select iso from changelist where list_number = 1)
;
quit;
Now if you want to select different subsets of the REGDATA you just need to change which value of LIST_NUMBER you select.
You could use the macro variable I that you had before in place of the hardcoded 1 in the code just above.
Note you might be able to eliminate the macro completely and just change your other steps use LIST_NUMBER as a BY variable in their processing.
Tom: thanks. The 1st part (creating the dataset of changelist values) works like a charm.
Here is the code I used that works great:
However, the 2nd part of your suggest code, creating a subset from the larger dataset, failed. Zero were selected.
Perhaps I was not clear, there is no ISO value of 1 2 3 4 or
1 2 3 4 5 6 7. Those are, in that case, four individual ISO values: an ISO of 1, of 2, of 3, and of 4. So the database currently has 10-years of data for each ISO and, there are 38 ISOs (a total of 380 lines).
In reality, the ISO is a 3-character string representing countries within the larger database. For example:
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','ESP','SWE','CHE','USA','GBR'
So when the line
where iso in (select iso from changelist where list_number = 1)
executes, the program should populated it with
In fact, I did it manually and the SQL code you suggested works great. I am calling that a partial solution.The REGDATA just created runs through a regression analysis for that particular combination of countries. I am calling that a partial solution.
However, that is one loop. 2nd loop uses a different combination of countries in the "where iso in" statement (i.e., the string in the 2nd observation in the changelist).
etc, etc until the last combination in the changelist is run. There are millions of combinations, so I can't do that manually.
The sticking point in the program now is to tell SAS to fill in that "where" statement automatically and run the regression with each subsequent combination.... millions of times without hitting that 65534 macro limit.
Instead of making each observation have one value of ISO you have made each observation have a LIST of ISO values.
So you have changed your limit from 64K to the full list of lists to 32K for any single list.
If you will never have a list of ISO values that is longer than 32K then your method might work with less modifications to your original program. Instead of using %SCAN() to pull one list from the larger list of list use CALL SYMPUTX() to push one observation from your dataset.
data _null_;
set changelist ;
where list_number = &i ;
call symputx('changelist',iso);
run;
Then you can continue to use
if iso in (&changelist);
Or, probably more correctly:
where iso in (&changelist);
Otherwise change your data step to read each ISO value, like the code I posted. Just add the DSD option to your INFILE statement.
data changelist ;
length list_number 8 ISO $280 ;
list_number +1 ;
infile cards dsd truncover ;
do until (iso=' ');
input iso @;
if iso ne ' ' then output;
end;
cards;
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','SVN','ESP','SWE','CHE','USA','GBR'|
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','SVN','ESP','SWE','CHE','USA'|
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','SVN','ESP','SWE','CHE','GBR'|
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','SVN','ESP','SWE','USA','GBR'|
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','SVN','ESP','CHE','USA','GBR'|
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','SVN','SWE','CHE','USA','GBR'|
'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN','FIN','FRA','DEU','NZL','NOR','POL','PRT','RUS','SVK','ESP','SWE','CHE','USA','GBR'|
;
Tom, thanks. Indeed, the string will likely never exceed 300 characters, let alone 32K, so making minimal changes is attractive. However, when I tried your code I got an error message when SAS tried to evaluate &i. Note that my dataset of combinations is called testcomb2 and the variable is called combos2 rather than iso.
when I run it, I get the following error:
1 data _null_; set testcomb2; where list_number = &i ; call symputx('changelist',combos2); run;
-
22
1 ! %US13_TARGET_ANALYSIS;
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant,
a datetime constant, a missing value, INPUT, PUT.
NOTE: Line generated by the invoked macro "TEST".
1 data _null_; set testcomb2; where list_number = &i ; call symputx('changelist',combos2); run;
-
76
1 ! %US13_TARGET_ANALYSIS;
ERROR 76-322: Syntax error, statement will be ignored.
WARNING: Apparent symbolic reference I not resolved.
MPRINT(TEST): where list_number = &i ;
ERROR: Syntax error while parsing WHERE clause.
MPRINT(TEST): call symputx('changelist',combos2);
MPRINT(TEST): run;
almost there, it looks like the counter step is not functioning. Note that I dropped the "|" stop delimiter from the character string, since we moved to a dataset.
I get this error message:
13341 %test
MLOGIC(TEST): Beginning execution.
MLOGIC(TEST): %LET (variable name is N)
WARNING: Apparent symbolic reference CHANGELIST not resolved.
WARNING: Apparent symbolic reference CHANGELIST not resolved.
SYMBOLGEN: Macro variable N resolves to 0
MLOGIC(TEST): %DO loop beginning; index variable I; start value is 1; stop value is 0; by value is 1. Loop will not
be executed.
MPRINT(TEST): */
MLOGIC(TEST): Ending execution.
So before you where counting the number of lists in your list of lists.
Now you need to count the number of observations in your dataset of lists.
proc sql noprint;
select count(*) into :n
from testcomb2
;
quit;
YAHTZEE!!!! Much thanks Tom.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.