I really could use some help in a task I have:
I have an initial global variable:
%let list_predictor= var1 var2;
I also have a database with 200 variables(let's say they are: var1 var2 var3 ... var200).
I have written a macro that returns, for each variable, a database that has a "percent" variable with 10 observations.
I want to be able to add the variables, for which percent>10, to &list_predictor.
If percent <10 then the global variable should retain its anterior form.
For example:
in data "percent_var3" I have the variable "percent" that contains 10 values from 1 to 15.
I see that at least one value > 10 so I want to add "var3" to the list of global variables => &list_predictor = var1 var2 var3
For data "percent_var4" the values are all < 10, so &list_predictor should stay = var1 var2 var3
and so on ...
Considering that percent has 10obs, it is possible that more than 1obs is >10.
If this should be the case I don't need the global variable to recover the name of the variable multiple times.
So a double condition must be introduced: after sorting the database, last.percent and percent>10 should both be valid.
But consider that this code should stay inside of my macro and data step is compromised, since %if ... %then can't function as it usually does.
I really need some help, I have been concentrating on this the entire day and I have remained clueless.
Thank you in advance
Glad you got it working. Sorry, typo in my symput suggestion. There should NOT be an ampersand in the first argument. Should have been:
data _null_;
set percent_var (where=(default_rate>10));
call symput("list_predictor","&list_predictor &varname");
stop;
run;
Idea of that step is take the resolved text string: "&list_predictor &varname" and put in the macro variable List_Predictor.
It's like:
%let list_predictor=&list_predictor &varname;
but doing it with call symput to make it conditional on existence of a record.
But that is a non-traditional use of symput.
Your adaptation of my suggestion looks good. Basically in the macro you are building the big dataset with all of your information on candidate predictors. Often, once you have that sort of dataset, it proves useful for other things. So generally there are many benefits to building such control datasets, in a long and skinny format.
It seems you quite some logic here to take on, and I don't really see the what you are trying to achieve. Perhaps you could tell us more of the underlying requirement?
Just a hunch, but often what this kind of problem occur, it's because the data structure is not optimal. Often transposing data (columns to rows) let do more dynamic programming without complicated macro looping algorithms.
Well I have a database with more than 200 variables and I have computed for each one of them a default rate per quantile. If the default rate is superior to a number I choose I want to keep that variable because it is a good predictor. The default rates are calculated within a macro and each time this macro computes for each variable it creates a "percent_var" table which will be replaced with the default rates of the next variable computed. So no, I don't have 200 tables with 10 obs, each. I only have one that will replace its values every time I compute the macro.
Basically, I want to store all the variables that are good predictors (for which the default rates are high enough) in a single global variable because afterwards I have some more conditions to impose to the selected variables and I don't want to write them by hand, because the database and the list of good predictors will change monthly and I don't want to change the sas program every time I have new data.
So the global variable will help me in the next data step where I get to keep only those that are good predictors.
What exactly do you need help with?
It sounds like you have a process in mind, something like the following.
1. Calculate default rate per quantile per variable
2. Check if default rate is more than set amount
3. Store variables with the default rate > a set amount
2/3 depend on how you do one, so how are you doing that part?
The first 2 steps are done. I only have the third one
Well guys, I have solved it in a not-so elegant manner. Quentin, thank you, I have taken your idea and adapted it:
/* macro */
data percent_var;
length varname $ 20;
set percent_var;
varname="&var";
run;
proc append base=predictors data=percent_var (where=(percent>10)) nowarn force;
run;
/* end macro */
proc sql noprint;
select distinct(varname) into :list_predictors separated by " "
from predictors;
quit;
%put &list_predictors;
Hi,
Agree with @LinusH regarding thinking about the data structures to make this easier.
Are you saying now you have 200 data sets percent_var1 percent_var2... percent_var200 ? And each has 10 records? If you concatenated them all together, could you make a dataset like below:
data have; input VarNum $ Percent; cards; Var1 12 Var1 16 Var1 12 Var1 11 Var1 16 Var2 9 Var2 6 Var2 4 Var2 6 Var2 6 Var3 12 Var3 1 Var3 2 Var3 3 Var3 4 ; run;
If so, then I think you could get your variable list with just:
86 proc sql noprint; 87 select distinct(VarNum) into :listPredictor separated by " " 88 from have 89 where Percent>10 90 ; 91 quit; 92 93 %put &listpredictor; Var1 Var3
HTH,
-Q.
Your idea is good, but doesn't apply here ... Basically I don't have a variable called varnum that contains the name of the actual variable. I only have "default rate" and "Percentile_name". But since I am in a macro and I run it, let's say for "var1" I can use &var1 as a reference for the command I should give for the global variable.
Hi, it feels a bit like I'm shooting in the dark, but I'll shoot again. But agree with Reeza, would be easier to help if you gave more information/ sample data.
It sounds like you have a macro that is looping over the variables in a dataset. (already sounds odd, but let's proceed).
During each macro loop, lets say you have macro variable &varname which is the name of the current variable (predictor) you are testing. And after processing that variable, you end up with a dataset percent_var which has a variable named default_rate and 10 observations. If any record in that dataset has a value of default_rate>10, then you want to insert &varname into your list of interesting predictors. If that's close, in your macro loop you could do something like (untested):
data _null_;
set percent_var (where=(default_rate>10));
call symput("&list_predictor","&list_predictor &varname");
stop;
run;
Idea being that that step would run once for each potential predictor you are testing. If there is a record where default_rate is greater than 10, the call symput statement will run, appeding the name of the predictor (&varname) to your list of interesting predictors (&list_predictor). If there are no records with default_rate>10, then the call symput statement will not execute, because the step will end when the set statement executes.
Quentin, you've perfectly understood what i'm trying to do. Unfortunately, the code you provided isn't working. I get the following error:
ERROR: Symbolic variable name SIREN DEFAULT_RATE must contain only letters, digits, and underscores. /* siren default_rate - are the 2 initial variables in my &list_predictor */
NOTE: Argument 1 to function SYMPUTX at line 11 column 155 is invalid.
percentile_name=0-10 default_rate=1 COUNT=2801 PERCENT=5.3384920332 _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.PERCENT_VAR.
WHERE percent>10;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
It might be because of this: "&list_predictor &varname"
I think it should be written in some other way.
Plus I have read that to be able to use an already existing global variable in a call symput/x routine it should afterwards be specified: list_predictor=resolve('&list_predictor')
I have tried it but doesn't work.
Something is missing.
I'm sure it's a call symputx job, but i'm not very sure how to "tell it what i want to be done"
Although I have incorporated th
Glad you got it working. Sorry, typo in my symput suggestion. There should NOT be an ampersand in the first argument. Should have been:
data _null_;
set percent_var (where=(default_rate>10));
call symput("list_predictor","&list_predictor &varname");
stop;
run;
Idea of that step is take the resolved text string: "&list_predictor &varname" and put in the macro variable List_Predictor.
It's like:
%let list_predictor=&list_predictor &varname;
but doing it with call symput to make it conditional on existence of a record.
But that is a non-traditional use of symput.
Your adaptation of my suggestion looks good. Basically in the macro you are building the big dataset with all of your information on candidate predictors. Often, once you have that sort of dataset, it proves useful for other things. So generally there are many benefits to building such control datasets, in a long and skinny format.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.