11-06-2017 12:09 PM

Dear All,

I would like to generate hundreds of logistic regressions. My data include a dependent variable and hundreds of quantitative variables whose names have the form "_" followed by random numbers. It looks like this:

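| Dependent variable | _123 | _234 | _341 | _234 | ... |
| --- | --- | --- | --- | --- | --- |
| 0 | 23 | 12 | 0 | 1 | |
| 1 | 45 | 48 | 9 | 12 | |
| 1 | 0 | 23 | 6 | 23 | |
| 0 | 89 | 12 | 34 | 7 | |
| 1 | ... | | | | |
| 0 | | | | | |
| 0 | | | | | |
| 1 | | | | | |
| 0 | | | | | |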

So I would like to run hundreds of logistic regressions, one for each of the _XXX variables, perhaps something like the following, but I'm not sure how to put it in a macro:

proc logistic; model dependent variable=_XXX; run;

Appreciate any advice! Thank you so much!!

Accepted Solutions

Solution

11-06-2017 01:25 PM

Posted in reply to wwendy

11-06-2017 12:51 PM

Here's one approach. It relies on using any and all variable names that begin with "_" as an independent variable:

    proc contents data=have noprint out=varnames (keep=name);
    run;

    data _null_;
       set varnames;
       where name =: '_';
       call execute ('proc logistic data=have; model DEPENDENT = ' || name || '; run;');
    run;

You'll need to replace the word DEPENDENT with the actual name of your dependent variable.
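
For readers who specifically want a macro, a variation on the same idea might look like the sketch below. It assumes the data set is WORK.HAVE and that at least one variable name starts with "_". As above, DEPENDENT stands in for the real dependent variable. The macro name `run_all` and the macro variable `varlist` are made up for illustration.

```sas
/* Collect every variable name in WORK.HAVE that starts with "_"   */
/* into one space-separated macro variable (hypothetical: varlist) */
proc sql noprint;
    select name into :varlist separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'HAVE'
      and substr(name, 1, 1) = '_';
quit;

%macro run_all;
    %local i var;
    /* Loop over the names and run one single-predictor model per variable */
    %do i = 1 %to %sysfunc(countw(&varlist, %str( )));
        %let var = %scan(&varlist, &i, %str( ));
        proc logistic data=have;
            model DEPENDENT = &var;   /* replace DEPENDENT with the real name */
        run;
    %end;
%mend run_all;

%run_all
```

Note that with a 0/1 response PROC LOGISTIC models the lower ordered value by default, so you may want to add `(event='1')` after the response name.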

All Replies

Posted in reply to wwendy

11-06-2017 12:22 PM

Don't. Transpose your data and use BY group processing instead. Then all your results can be captured into a single data set as well.
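
A minimal sketch of what the transpose-and-BY approach could look like, under some assumptions: the data set is called `have`, the response is called `dependent_var`, and the predictors are all the variables whose names start with "_". The names `obs_id`, `long`, `varname`, `value`, and `all_estimates` are invented for this illustration.

```sas
/* Add a row identifier so each observation can be stacked by variable */
data have_id;
    set have;
    obs_id = _n_;
run;

/* One output row per (observation, predictor) pair */
proc transpose data=have_id
               out=long (rename=(_name_=varname col1=value));
    by obs_id dependent_var;
    var _:;                      /* all variables whose names start with "_" */
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=long;
    by varname;
run;

/* One logistic regression per original predictor; all parameter */
/* estimates are captured in a single data set                   */
ods output ParameterEstimates=all_estimates;
proc logistic data=long;
    by varname;
    model dependent_var(event='1') = value;
run;
```

The `all_estimates` data set then holds one set of estimates per value of `varname`, which is easier to sort and filter than hundreds of separate printed outputs.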

Posted in reply to wwendy

11-06-2017 12:25 PM - edited 11-06-2017 12:26 PM

Hi @wwendy!

You can easily make a macro for what you are talking about, but if your variables are not numbered sequentially you will have to generate a macro call for each one, like this:

```
data example;
input dependent_var _123 _234 _341;
datalines;
0 23 12 0
1 45 48 9
1 0 23 6
0 89 12 34
;
run;
%macro regression(var);
proc logistic data=example;
model dependent_var = _&var;
run;
%mend;
%regression(123);
%regression(234);
%regression(341);
```

However, if your variables were numbered sequentially, for example from 1 to 100, you could generate the regression code like this:

```
%macro regression(numVars);
    %local i;  /* keep the loop counter local to the macro */
    %do i = 1 %to &numVars;
        proc logistic data=example;
            model dependent_var = _&i;
        run;
    %end;
%mend;
%regression(100);
```

Hope that helps!

Posted in reply to OliviaWright

11-06-2017 01:16 PM

Thank you for your suggestion. Unfortunately my variables are not numbered sequentially, and it would be very time consuming to write the calls one by one, since I have hundreds of them. Do you know how to rename my variables to a sequence? I would really appreciate it!

Posted in reply to wwendy

11-06-2017 01:18 PM

In that case, I think it might be helpful to use @Astounding's suggestion.

Posted in reply to Astounding

11-06-2017 12:56 PM - edited 11-06-2017 01:00 PM

I consider the approach of performing many logistic regressions and picking the best to be extremely suboptimal (and that's ignoring the programming issues stated here) to the point you could easily be misled by the results. Any time you have hundreds of variables, they will be correlated with one another, and this causes logistic regression (and ordinary least squares regression) to provide parameter estimates and predicted values that have HUGE variances, to the point where you can get the wrong sign on a model parameter estimate.

A better approach is to use a modeling method that performs better in the presence of large numbers of correlated variables. That method is called Partial Least Squares regression — in SAS, it is PROC PLS. This method produces a model which is less susceptible to correlation between the variables, and it produces model coefficients and predicted values with much smaller root mean square errors than regression or logistic regression.

--

Paige Miller

Posted in reply to Astounding

11-06-2017 01:19 PM

Thank you for your reply. Unfortunately I would like to have many separate regressions, one generated for each of the _XXX variables.

Since I have hundreds of _XXX variables, I will have hundreds of logistic regressions. Do you have some experience with it?

Really appreciate your help!

Posted in reply to wwendy

11-06-2017 01:24 PM

"I would like to have many separate regressions, one generated for each of the _XXX variables."

PROC PLS makes this unnecessary. It is one model that has ALL of your input variables; the variables that are not predictive of your response will get very low weights, and PLS still produces models with the lower mean square error of parameter estimates that I mentioned above.

And, it's a bazillion (that's a technical term) times easier than doing hundreds of logistic regressions.

--

Paige Miller

Posted in reply to PaigeMiller

11-07-2017 08:00 AM

Can PROC PLS do logistic regression? Can you show me an example?

Posted in reply to Ksharp

11-07-2017 08:01 AM - edited 11-07-2017 08:01 AM

You use a binary response variable.

--

Paige Miller

Posted in reply to PaigeMiller

11-07-2017 08:15 AM

How do you define which level to model? For example:

model sex(event='F')= .....

Posted in reply to PaigeMiller

11-07-2017 08:16 AM

Also, logistic regression uses MLE, but PLS uses OLS.

Can OLS be applied to logistic regression?

Posted in reply to Ksharp

11-07-2017 08:39 AM

PLS is not using OLS. It is using Partial Least Squares, a completely different algorithm.

The response variable takes on values 0 or 1.

Logistic regression is a modeling method that uses continuous x-variables to predict binary (or multinomial) responses. PLS with binary responses is also a modeling method that uses continuous x-variables to predict binary (or multinomial) responses. So far, they are identical. Under the hood, however, they are different algorithms and will not produce the same answers. PLS is less susceptible to the problem of collinearity among the x-variables, and so will produce models that fit better (lower mean square error of regression coefficients and lower mean square error of predicted values).
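
As an illustration only (this code is not from the thread), fitting PLS to a 0/1 response might be sketched as below using SASHELP.CLASS. The indicator `male`, the data set names `class01` and `pls_scores`, the output variable `p_male`, and the choice of two factors are all assumptions made for the example. As far as I know, PROC PLS expects a numeric response, so the character variable `sex` is first recoded.

```sas
/* Recode the character response into a numeric 0/1 indicator */
data class01;
    set sashelp.class;
    male = (sex = 'M');          /* 1 for 'M', 0 otherwise */
run;

/* Partial least squares with the binary indicator as the response */
proc pls data=class01 method=pls nfac=2;
    model male = age height weight;
    output out=pls_scores predicted=p_male;
run;
```

Because the level to model is fixed by how the indicator is coded, there is no `event=` option to set here. The predicted values in `p_male` come from a linear model, so they are scores rather than probabilities and can fall outside the range 0 to 1.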

--

Paige Miller

Posted in reply to PaigeMiller

11-07-2017 08:47 AM

But I can't find any such example in the PROC PLS documentation.

Can you show an example that does logistic regression?

Take SASHELP.CLASS as an example, where I want to model sex='M'.