Solved: Re: Great PROG now useless after hitting 65534 limit - Page 2

texasmfp · Posted 10-23-2019 06:11 PM

Just ran 6546 combinations in 41 minutes (need a faster computer!). I learned a lot from this discussion.

Thanks to all.

Kurt_Bremser · Posted 10-23-2019 02:14 AM

@texasmfp wrote:
Thanks Tom. I have no idea what you just said. Subsetting, scalable are meaningless terms to me. I am a novice.

Subsetting: taking a part out of a greater amount of data, which may be variables within an observation, or a dataset. In SAS parlance, "subsetting" is often used in "subsetting if", which is this construct:

if <condition>;

an "if" statement without a "then". It means that if the condition is not met, the rest of the data step iteration (including any output) is skipped. This is used similar to a where condition, but can use variables created in the data step.

Scalable: means that a process will work for any conceivable amount of data, with no consequences except more resource usage and longer run times. In your case, the process did not scale, because you used a very limited tool (64K is really small nowadays) along the way.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 10-22-2019 07:16 PM

An example using a data set you should have access to.

First create a data set of the variables whose values you want to keep (or exclude about the same difficulty).

This data set will contain a value that I know does not actually exist in the data set I am going to extract from:

data work.names;
    length name  $ 8;
    input name;
datalines;
Alice
Judy
Henry
Jacques
;
run;

Now use proc sql to find the values that have matching names.

data work.names;
    length name  $ 8;
    input name;
datalines;
Alice
Judy
Henry
Jacques
;
run;

If you look at the data set work.want you will find that only records with matching values of Name are in the output.

Which is exactly what your example no longer working data step:

DATA DATA;
SET DATA1;
IF ISO IN (%unquote(&val));
RUN;

does. I admit that Proc Sql is likely to change the order of the records but since your current code doesn't work at all I think that might be a small trade off.

This is one exactly one of the methods to get around the limitation of the macro "list" approach. The data set work.names could contain millions of records.

If there is something related to what goes on inside the macro TARGETANALYSIS then you have to share the code for that macro as saying an approach does not work with out details as to why is pretty pointless.

If you need to call some code for every value then CALL Execute with the data set will do that as well.

texasmfp · Posted 10-22-2019 09:28 PM

"If you need to call some code for every value then CALL Execute with the data set will do that as well."

ballardw, Thanks. I think that sums up the situation as of now. I am interested in understanding the Call execute function.

I may have been unclear in originally explaining the goal and function of my original programming. I have a database with data for 35 countries. Each country is identified by a 3-character field (ISO). For example 'DEU' is Germany. I want to run a regression analysis of all countries, but I also want to determine if I get a better result using only a subset of the 35. Lets say I want to test all combinations of 28 countries from within the master dataset of 35. That is 6.7 million possible combinations (order does not matter). Today, I created a program that generates all 6.7 million combinations as a text string. For example:

'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN',

'FIN','FRA','DEU','NZL','NOR'

My program generating the combination strings is equivalent to your first datastep and, the text string above, would be one row in the

data work.names;

Using your example, I can manually take each of those rows in data.names and insert that string into the PROC SQL, or another one of the suggested datasteps, or my original program in place of the (%unquote(&val);

Doing so will create a subset of data, that runs through the regression analysis. The results are captured in an appending database. That is one loop. I am good on programming all of those steps. However, I can't manually enter the string of combinations 6.7 million times and, my old macro-based change list capped out after just a few hundred lines of those long character string combinations.

The only thing going on in the TARGET_ANALYSIS is 1) creating the subset, 2) regression analysis, 3) creating a database of results, which includes the combination that generated each result for easy identification. In this last step, the %unquote(&val) is invoked again, so that it gets recorded along with the result.

Call Execute is interesting and it may be exactly what I need. Can you tell me more about how I could use call execute to generate the character string values for %unquote(&val) one at a time (followed by the regression analysis) and looping back for each of the 6.7 million rows in the list of combinations? Thanks

ballardw · Posted 10-23-2019 12:27 PM

@texasmfp wrote:
"If you need to call some code for every value then CALL Execute with the data set will do that as well."

ballardw, Thanks. I think that sums up the situation as of now. I am interested in understanding the Call execute function.

I may have been unclear in originally explaining the goal and function of my original programming. I have a database with data for 35 countries. Each country is identified by a 3-character field (ISO). For example 'DEU' is Germany. I want to run a regression analysis of all countries, but I also want to determine if I get a better result using only a subset of the 35. Lets say I want to test all combinations of 28 countries from within the master dataset of 35. That is 6.7 million possible combinations (order does not matter). Today, I created a program that generates all 6.7 million combinations as a text string. For example:

'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN',

'FIN','FRA','DEU','NZL','NOR'

My program generating the combination strings is equivalent to your first datastep and, the text string above, would be one row in the
data work.names;
Using your example, I can manually take each of those rows in data.names and insert that string into the PROC SQL, or another one of the suggested datasteps, or my original program in place of the (%unquote(&val);

Doing so will create a subset of data, that runs through the regression analysis. The results are captured in an appending database. That is one loop. I am good on programming all of those steps. However, I can't manually enter the string of combinations 6.7 million times and, my old macro-based change list capped out after just a few hundred lines of those long character string combinations.

The only thing going on in the TARGET_ANALYSIS is 1) creating the subset, 2) regression analysis, 3) creating a database of results, which includes the combination that generated each result for easy identification. In this last step, the %unquote(&val) is invoked again, so that it gets recorded along with the result.

Call Execute is interesting and it may be exactly what I need. Can you tell me more about how I could use call execute to generate the character string values for %unquote(&val) one at a time (followed by the regression analysis) and looping back for each of the 6.7 million rows in the list of combinations? Thanks

Since you say you have code to generate the combinations then show us that.

And we would NOT be creating any macro variable but an actual Where (or IF though where usually is faster) statement like

if Iso in ('IND' 'KOR' 'TUR' 'AUS' 'AUT' 'BRA' );

Here is an example to creates ONE such statement with a data set you should have that selects 3 names from SASHELP.CLASS and prints on those records.

/* these two steps genarate a data set with all the values of the 
   name variable as a single record to demonstrate
*/
Proc sort data=sashelp.class (obs=3) out=work.class;
   by name;
run;

proc transpose data=work.class out=work.trans (drop=_name_)
   prefix=Name
   ;
   var name ;
run;
/* an example of building a where statement for a proc call*/
data junk;
   set work.trans;
   array n Name: ;
   /* string variable to hold the values
   the name variable is 8 characters so need to add 2 for quotes and a space to separate
   3 values (maximum in this case) so 3*(8+2+1) = 33*/
   length nstr $33;
   do _i_=1 to dim(n);
      nstr= catx(' ',nstr,quote(strip(n[_i_])));
   end;
   call execute ("Title 'Names to use are "||nstr||"';");

   call execute ("Proc print data=sashelp.class noobs; where name in (");
   call execute (nstr);
   call execute ("); run;title;");
run;

Depending on exactly how you go about creating your combinations it may be easy or not to modify the above example.

Note two different ways of using the list with concatenating into a string on the fly in the Title statement or referencing alone in the body. The later leaves the where statement open from the previous call execute line and then closes afterward. The Log shows the generated code.

Or instead of call execute you can use PUT statements to write to a file and %include that file.

Re: Great PROG now useless after hitting 65534 limit

Re: Great PROG now useless after hitting 65534 limit

Re: Great PROG now useless after hitting 65534 limit

Re: Great PROG now useless after hitting 65534 limit

Re: Great PROG now useless after hitting 65534 limit

SAS Innovate 2025: Call for Content