BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
texasmfp
Lapis Lazuli | Level 10

Just ran 6546 combinations in 41 minutes (need a faster computer!).  I learned a lot from this discussion. 

 

Thanks to all.

Kurt_Bremser
Super User

@texasmfp wrote:
Thanks Tom. I have no idea what you just said. Subsetting, scalable are meaningless terms to me. I am a novice.

Subsetting: taking a part out of a greater amount of data, which may be variables within an observation, or a dataset. In SAS parlance, "subsetting" is often used in "subsetting if", which is this construct:

if <condition>;

an "if" statement without a "then". It means that if the condition is not met, the rest of the data step iteration (including any output) is skipped. This is used similar to a where condition, but can use variables created in the data step.

 

Scalable: means that a process will work for any conceivable amount of data, with no consequences except more resource usage and longer run times. In your case, the process did not scale, because you used a very limited tool (64K is really small nowadays) along the way.

ballardw
Super User

An example using a data set you should have access to.

 

First create a data set of the variables whose values you want to keep (or exclude about the same difficulty).

This data set will contain a value that I know does not actually exist in the data set I am going to extract from:

data work.names;
    length name  $ 8;
    input name;
datalines;
Alice
Judy
Henry
Jacques
;
run;

Now use proc sql to find the values that have matching names.

data work.names;
    length name  $ 8;
    input name;
datalines;
Alice
Judy
Henry
Jacques
;
run;

If you look at the data set work.want you will find that only records with matching values of Name are in the output.

Which is exactly what your example no longer working data step:

DATA DATA;
SET DATA1;
IF ISO IN (%unquote(&val));
RUN;

does. I admit that Proc Sql is likely to change the order of the records but since your current code doesn't work at all I think that might be a small trade off.

 

This is one exactly one of the methods to get around the limitation of the macro "list" approach. The data set work.names could contain millions of records.

 

If there is something related to what goes on inside the macro TARGETANALYSIS then you have to share the code for that macro as saying an approach does not work with out details as to why is pretty pointless.

 

If you need to call some code for every value then CALL Execute with the data set will do that as well.

texasmfp
Lapis Lazuli | Level 10

"If you need to call some code for every value then CALL Execute with the data set will do that as well."

 

ballardw,  Thanks.  I think that sums up the situation as of now.  I am interested in understanding the Call execute function. 

 

I may have been unclear in originally explaining the goal and function of my original programming. I have a database with data for 35 countries.  Each country is identified by a 3-character field (ISO).  For example 'DEU' is Germany.  I want to run a regression analysis of all countries, but I also want to determine if I get a better result using only a subset of the 35.  Lets say I want to test all combinations of 28 countries from within the master dataset of 35.  That is 6.7 million possible combinations (order does not matter).  Today, I created a program that generates all 6.7 million combinations as a text string.  For example:

 

'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN',

'FIN','FRA','DEU','NZL','NOR'

 

My program generating the combination strings is equivalent to your first datastep and, the text string above, would be one row in the

data work.names;

 

Using your example, I can manually take each of those rows in data.names and insert that string into the PROC SQL, or another one of the suggested datasteps, or my original program in place of the (%unquote(&val);

 

Doing so will create a subset of data, that runs through the regression analysis.  The results are captured in an appending database.  That is one loop.  I am good on programming all of those steps.   However, I can't manually enter the string of combinations 6.7 million times and, my old macro-based change list capped out after just a few hundred lines of those long character string combinations.

 

The only thing going on in the TARGET_ANALYSIS is 1) creating the subset, 2) regression analysis, 3) creating a database of results, which includes the combination that generated each result for easy identification.  In this last step, the %unquote(&val) is invoked again, so that it gets recorded along with the result.

 

Call Execute is interesting and it may be exactly what I need.  Can you tell me more about how I could use call execute to generate the character string values for  %unquote(&val) one at a time (followed by the regression analysis) and looping back for each of the 6.7 million rows in the list of combinations?  Thanks

 

 

 

 

 

ballardw
Super User

@texasmfp wrote:

"If you need to call some code for every value then CALL Execute with the data set will do that as well."

 

ballardw,  Thanks.  I think that sums up the situation as of now.  I am interested in understanding the Call execute function. 

 

I may have been unclear in originally explaining the goal and function of my original programming. I have a database with data for 35 countries.  Each country is identified by a 3-character field (ISO).  For example 'DEU' is Germany.  I want to run a regression analysis of all countries, but I also want to determine if I get a better result using only a subset of the 35.  Lets say I want to test all combinations of 28 countries from within the master dataset of 35.  That is 6.7 million possible combinations (order does not matter).  Today, I created a program that generates all 6.7 million combinations as a text string.  For example:

 

'IND','KOR','TUR','AUS','AUT','BRA','CAN','CHL','CHN','CZE','DNK','EST','FIN','FRA','DEU','GRC','HUN','ISL','IDN','IRL','ISR','ITA','JPN',

'FIN','FRA','DEU','NZL','NOR'

 

My program generating the combination strings is equivalent to your first datastep and, the text string above, would be one row in the

data work.names;

 

Using your example, I can manually take each of those rows in data.names and insert that string into the PROC SQL, or another one of the suggested datasteps, or my original program in place of the (%unquote(&val);

 

Doing so will create a subset of data, that runs through the regression analysis.  The results are captured in an appending database.  That is one loop.  I am good on programming all of those steps.   However, I can't manually enter the string of combinations 6.7 million times and, my old macro-based change list capped out after just a few hundred lines of those long character string combinations.

 

The only thing going on in the TARGET_ANALYSIS is 1) creating the subset, 2) regression analysis, 3) creating a database of results, which includes the combination that generated each result for easy identification.  In this last step, the %unquote(&val) is invoked again, so that it gets recorded along with the result.

 

Call Execute is interesting and it may be exactly what I need.  Can you tell me more about how I could use call execute to generate the character string values for  %unquote(&val) one at a time (followed by the regression analysis) and looping back for each of the 6.7 million rows in the list of combinations?  Thanks

 


Since you say you have code to generate the combinations then show us that.

And we would NOT be creating any macro variable but an actual Where (or IF though where usually is faster)  statement like

if Iso in ('IND' 'KOR' 'TUR' 'AUS' 'AUT' 'BRA' );

Here is an example to creates ONE such statement with a data set you should have that selects 3 names from SASHELP.CLASS and prints on those records.

/* these two steps genarate a data set with all the values of the 
   name variable as a single record to demonstrate
*/
Proc sort data=sashelp.class (obs=3) out=work.class;
   by name;
run;

proc transpose data=work.class out=work.trans (drop=_name_)
   prefix=Name
   ;
   var name ;
run;
/* an example of building a where statement for a proc call*/
data junk;
   set work.trans;
   array n Name: ;
   /* string variable to hold the values
   the name variable is 8 characters so need to add 2 for quotes and a space to separate
   3 values (maximum in this case) so 3*(8+2+1) = 33*/
   length nstr $33;
   do _i_=1 to dim(n);
      nstr= catx(' ',nstr,quote(strip(n[_i_])));
   end;
   call execute ("Title 'Names to use are "||nstr||"';");

   call execute ("Proc print data=sashelp.class noobs; where name in (");
   call execute (nstr);
   call execute ("); run;title;");
run;
   

Depending on exactly how you go about creating your combinations it may be easy or not to modify the above example.

Note two different ways of using the list with concatenating into a string on the fly in the Title statement or referencing alone in the body. The later leaves the where statement open from the previous call execute line and then closes afterward. The Log shows the generated code.

Or instead of call execute you can use PUT statements to write to a file and %include that file.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 1986 views
  • 2 likes
  • 4 in conversation