I have 2 datasets; an original dataset formatted in SAS (5M x 10000), and an Excel sheet which contains variables' names and their statistics summary (10000 x 7). I want to compute around 5000 single-variable regression models of the form Y=Xi, where i = 1 to 5000. Those 5000 variables are selected by conditioning on particular columns (like if 5 < nLevels < 40 AND nMissing < 10000) in the Excel sheet. My variables' names do not have a common pattern.
I could find the code for computing many single-variable regression models, but I don't know how to list the subset of variables by condition from Excel sheet:
/* 1. transpose from wide (Y, X1 ,...,X10000) to long (varNum VarName Y Value) */ data Long; set Wide; array x [*] x1-x&nCont; /* <== I want this list to be read from Excel sheet
by conditioning */ do varNum = 1 to dim(x); VarName = vname(x[varNum]); /* variable name in char var */ Value = x[varNum]; /* value for each variable for each obs */ output; end; drop x:; run;
/* 2. Sort by BY-group variable */ proc sort data=Long; by VarName; run;
/* 3. Call PROC REG and use BY statement to compute all regressions */ proc reg data=Long noprint outest=PE; by VarName; model Y = Value; quit; /* Look at the results */ proc print data=PE(obs=5); var VarName Intercept Value; run;
I'm very new to SAS and any help would be greatly appreciated!
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
You can very easy build a dynamic list but see the shortcut approaches above to see if any work for you to fix your array statement.
The other method is to query sashelp.vcolumns or dictionary.colums which has the metadata about your data set.
@mh2t wrote:
I have 2 datasets; an original dataset formatted in SAS (5M x 10000), and an Excel sheet which contains variables' names and their statistics summary (10000 x 7). I want to compute around 5000 single-variable regression models of the form Y=Xi, where i = 1 to 5000. Those 5000 variables are selected by conditioning on particular columns (like if 5 < nLevels < 40 AND nMissing < 10000) in the Excel sheet. My variables' names do not have a common pattern.
I could find the code for computing many single-variable regression models, but I don't know how to list the subset of variables by condition from Excel sheet:
/* 1. transpose from wide (Y, X1 ,...,X10000) to long (varNum VarName Y Value) */ data Long; set Wide; array x [*] x1-x&nCont; /* <== I want this list to be read from Excel sheet
by conditioning */ do varNum = 1 to dim(x); VarName = vname(x[varNum]); /* variable name in char var */ Value = x[varNum]; /* value for each variable for each obs */ output; end; drop x:; run;/* 2. Sort by BY-group variable */ proc sort data=Long; by VarName; run;/* 3. Call PROC REG and use BY statement to compute all regressions */ proc reg data=Long noprint outest=PE; by VarName; model Y = Value; quit; /* Look at the results */ proc print data=PE(obs=5); var VarName Intercept Value; run;I'm very new to SAS and any help would be greatly appreciated!
Thank you @Reeza for your response. How can I read variables from Excel sheet in proc sql?
proc sql noprint; select Variable into :MissingVarList separated by ' ' from MissingValues /* <=== should be read from Excel file */ where NMiss > 0; quit; %put &=MissingVarList;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.