r/n logistic regression in SAS EM

09-27-2010 04:56 PM

Hi All,

As we know, PROC LOGISTIC accepts binary response data that are grouped:

    proc logistic;
       model r/n = x1 x2;
    run;

Here, n represents the number of trials and r represents the number of events.

Does anyone know if it is possible to run the previous example in EM? Do you know how to do it?

Thanks for your help,

Reynaldo

Accepted Solutions

Monday

The short answer is that you can get essentially the same results, but you need to structure your data somewhat differently. Instead of one row for each observation in your original data, you will have two rows: one containing the number of events and one containing the number of non-events. I'll explain this in more detail below.

The scenario you are describing involves the events/trials syntax of the LOGISTIC (or GENMOD) procedure. Consider the first example data set in the documentation for the GENMOD procedure, which includes a character input variable (drug), an interval input variable (x), the number of events (r), and the number of trials (n) for each observation.

/*** BEGIN GENMOD DOCUMENTATION EXCERPT ***/

The following DATA step creates the data set:

    data drug;
       input drug$ x r n @@;
       datalines;
    A  .1   1  10   A  .23  2  12   A  .67  1   9
    B  .2   3  13   B  .3   4  15   B  .45  5  16   B  .78  5  13
    C  .04  0  10   C  .15  0  11   C  .56  1  12   C  .7   2  12
    D  .34  5  10   D  .6   5   9   D  .7   8  10
    E  .2  12  20   E  .34 15  20   E  .56 13  15   E  .8  17  20
    ;

/*** END GENMOD DOCUMENTATION EXCERPT ***/

which generates data that looks like the following:

| drug | x | r | n |
|------|------|----|----|
| A | 0.1 | 1 | 10 |
| A | 0.23 | 2 | 12 |
| A | 0.67 | 1 | 9 |
| B | 0.2 | 3 | 13 |
| B | 0.3 | 4 | 15 |
| B | 0.45 | 5 | 16 |
| B | 0.78 | 5 | 13 |
| C | 0.04 | 0 | 10 |
| C | 0.15 | 0 | 11 |
| C | 0.56 | 1 | 12 |
| C | 0.7 | 2 | 12 |
| D | 0.34 | 5 | 10 |
| D | 0.6 | 5 | 9 |
| D | 0.7 | 8 | 10 |
| E | 0.2 | 12 | 20 |
| E | 0.34 | 15 | 20 |
| E | 0.56 | 13 | 15 |
| E | 0.8 | 17 | 20 |

You could use the events/trials syntax with the LOGISTIC procedure to analyze this data set as follows:

    proc logistic data=drug;
       class drug;
       model r/n = x drug;
    run;

To analyze this same data set in SAS Enterprise Miner, you would need to take the following steps:

1. Create a new data set containing two rows for each row in the original data set (e.g. append the data set to itself).

2. Replace the value of **r** (# of events) in each duplicate row with the number of nonevents (**n - r**).

3. Rename the column **r** to **freq** (or just specify it as a frequency variable in SAS Enterprise Miner).

4. Add another column named **target** which contains a 1 if the row contains the number of events and a 0 if the row contains the number of nonevents.

5. Drop the column **n** containing the total number of trials (it is not used directly in the analysis).

6. Add the modified data set as an Input Data Source in SAS Enterprise Miner, being sure to specify the following variable information:

   - drug = nominal input variable role
   - x = interval input variable role
   - freq = frequency variable role
   - target = target variable role (1 if an event, 0 if a non-event)
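Steps 1 through 5 could be sketched in a single DATA step along these lines. This is only a sketch: it assumes the DRUG data set created above, and the output data set name DRUG_EM is a hypothetical choice. Each input row is written twice, once as the event row and once as the non-event row, so the rows come out interleaved rather than stacked; the row order should not affect the fit.

```sas
/* Sketch only: assumes WORK.DRUG exists with columns drug, x, r, n. */
/* DRUG_EM is an illustrative name, not part of the original post.   */
data drug_em;
   set drug;

   /* Event row: frequency is the number of events */
   target = 1;
   freq   = r;
   output;

   /* Non-event row: frequency is trials minus events */
   target = 0;
   freq   = n - r;
   output;

   /* r and n are no longer needed once freq and target exist */
   drop r n;
run;
```

Creating the two rows directly avoids appending the data set to itself and then editing the duplicate rows by hand.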

The data used in SAS Enterprise Miner should appear as follows:

| drug | x | freq | target |
|------|------|------|--------|
| A | 0.1 | 1 | 1 |
| A | 0.23 | 2 | 1 |
| A | 0.67 | 1 | 1 |
| B | 0.2 | 3 | 1 |
| B | 0.3 | 4 | 1 |
| B | 0.45 | 5 | 1 |
| B | 0.78 | 5 | 1 |
| C | 0.04 | 0 | 1 |
| C | 0.15 | 0 | 1 |
| C | 0.56 | 1 | 1 |
| C | 0.7 | 2 | 1 |
| D | 0.34 | 5 | 1 |
| D | 0.6 | 5 | 1 |
| D | 0.7 | 8 | 1 |
| E | 0.2 | 12 | 1 |
| E | 0.34 | 15 | 1 |
| E | 0.56 | 13 | 1 |
| E | 0.8 | 17 | 1 |
| A | 0.1 | 9 | 0 |
| A | 0.23 | 10 | 0 |
| A | 0.67 | 8 | 0 |
| B | 0.2 | 10 | 0 |
| B | 0.3 | 11 | 0 |
| B | 0.45 | 11 | 0 |
| B | 0.78 | 8 | 0 |
| C | 0.04 | 10 | 0 |
| C | 0.15 | 11 | 0 |
| C | 0.56 | 11 | 0 |
| C | 0.7 | 10 | 0 |
| D | 0.34 | 5 | 0 |
| D | 0.6 | 4 | 0 |
| D | 0.7 | 2 | 0 |
| E | 0.2 | 8 | 0 |
| E | 0.34 | 5 | 0 |
| E | 0.56 | 2 | 0 |
| E | 0.8 | 3 | 0 |
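To see why this layout carries the same information, it may help to write out the log-likelihood. The notation here is introduced only for illustration: $p_i$ is the modeled event probability for original row $i$, $\mathbf{x}_i$ its inputs, and $\beta$ the coefficient vector. With events/trials input the binomial log-likelihood is

$$
\ell(\beta) = \sum_i \Big[ r_i \log p_i + (n_i - r_i) \log (1 - p_i) \Big] + \text{const},
\qquad \operatorname{logit}(p_i) = \mathbf{x}_i^{\top} \beta ,
$$

where the constant collects the binomial coefficients and does not depend on $\beta$. In the two-row layout, a target = 1 row with frequency $r_i$ contributes $r_i \log p_i$ and a target = 0 row with frequency $n_i - r_i$ contributes $(n_i - r_i) \log (1 - p_i)$, so the sum is the same function of $\beta$ and is maximized by the same estimates.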

If you connect the newly created data source described above to a Regression node and run the flow, you will get approximately (within rounding error) the same results that you would have obtained using the LOGISTIC procedure.
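One way to check the restructured data outside Enterprise Miner is to fit it with PROC LOGISTIC itself, replacing the events/trials syntax with a binary target and a FREQ statement. The sketch below assumes the restructured data is stored in a data set named DRUG_EM (a hypothetical name) with the columns drug, x, freq, and target shown in the table above.

```sas
/* Sketch only: DRUG_EM is an assumed name for the restructured data. */
proc logistic data=drug_em;
   class drug;
   freq freq;                          /* each row counts freq times  */
   model target(event='1') = x drug;   /* model the probability of 1  */
run;
```

The parameter estimates should match those from the `model r/n = x drug;` fit on the original data. Comparing against the Regression node output may additionally require matching the class-variable coding and the target event level used by the node.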

I hope this helps!

Doug
