## r/n logistic regression in SAS EM

Hi All,

As we know, PROC LOGISTIC accepts grouped binary response data through the events/trials syntax:

```
proc logistic;
model r/n=x1 x2;
run;
```

Here, n represents the number of trials and r represents the number of events.

Does anyone know if it is possible to run the previous example in EM? Do you know how to do it?

Reynaldo

Accepted Solutions
Solution
‎08-14-2017 12:55 PM
SAS Employee
Posts: 179

## Re: r/n logistic regression in SAS EM

The short answer is that you can get essentially the same results, but you need to structure your data somewhat differently. Instead of having only one row of data for each observation in your original data, you will have two rows of data: one row containing the number of events and one row containing the number of non-events. I'll explain this in more detail below:
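As a sketch of why the two-row layout should give the same fit, write $p_i$ for the modeled event probability in original row $i$ (my notation, not from the documentation), and assume the Regression node maximizes a frequency-weighted binary log-likelihood. The grouped log-likelihood is

$$
\ell_{\text{grouped}}(\beta) = \sum_i \left[ \log \binom{n_i}{r_i} + r_i \log p_i + (n_i - r_i) \log (1 - p_i) \right]
$$

while the two-row version, with frequencies $r_i$ (target = 1) and $n_i - r_i$ (target = 0), is

$$
\ell_{\text{freq}}(\beta) = \sum_i \left[ r_i \log p_i + (n_i - r_i) \log (1 - p_i) \right]
$$

The two differ only by the binomial coefficient term, which does not depend on $\beta$, so both are maximized by the same parameter estimates.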

The scenario you are describing involves using the events/trials syntax of the LOGISTIC (or GENMOD) procedure. Consider the first example data set in the documentation for the GENMOD procedure, which includes a character input variable (drug), an interval input variable (x), the number of events (r), and the number of trials (n) for each observation.

/*** BEGIN GENMOD DOCUMENTATION EXCERPT ***/

The following DATA step creates the data set:

```
data drug;
input drug $ x r n @@;
datalines;
A  .1   1  10   A  .23  2  12   A  .67  1   9
B  .2   3  13   B  .3   4  15   B  .45  5  16   B  .78  5  13
C  .04  0  10   C  .15  0  11   C  .56  1  12   C  .7   2  12
D  .34  5  10   D  .6   5   9   D  .7   8  10
E  .2  12  20   E  .34 15  20   E  .56 13  15   E  .8  17  20
;
```

/*** END GENMOD DOCUMENTATION EXCERPT ***/

which generates data that looks like the following:

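| drug | x | r | n |
|------|------|----|----|
| A | 0.1 | 1 | 10 |
| A | 0.23 | 2 | 12 |
| A | 0.67 | 1 | 9 |
| B | 0.2 | 3 | 13 |
| B | 0.3 | 4 | 15 |
| B | 0.45 | 5 | 16 |
| B | 0.78 | 5 | 13 |
| C | 0.04 | 0 | 10 |
| C | 0.15 | 0 | 11 |
| C | 0.56 | 1 | 12 |
| C | 0.7 | 2 | 12 |
| D | 0.34 | 5 | 10 |
| D | 0.6 | 5 | 9 |
| D | 0.7 | 8 | 10 |
| E | 0.2 | 12 | 20 |
| E | 0.34 | 15 | 20 |
| E | 0.56 | 13 | 15 |
| E | 0.8 | 17 | 20 |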

You could use the events/trials syntax with the LOGISTIC procedure to analyze this data set as follows:

```
proc logistic data=drug;
class drug;
model r/n = x drug;
run;
```

To analyze this same data set in SAS Enterprise Miner, you would need to take the following steps:

1. Create a new data set containing two rows for each row in the original data set (e.g. append the data set to itself).

2. Replace the value of r (# of events) in each duplicate row with the number of nonevents (n - r)

3. Rename the column r to freq (or just specify it as a frequency variable in SAS Enterprise Miner)

4. Add another column named target which contains a 1 if the row contains the number of events and a 0 if the row contains the number of nonevents.

5. Drop the column n containing the total number of trials (it is not used directly in the analysis).

6. Add the modified data set as an Input Data Source in SAS Enterprise Miner being sure to specify the following variable information:

drug = nominal input variable role

x = interval input variable role

freq = frequency variable role

target = target variable role (1 if an event, 0 if a non-event)
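Steps 1 through 5 can be sketched as a single DATA step. This is only a minimal sketch: the output data set name drug_em and the flag is_event are names made up for illustration, and it assumes the drug data set created earlier.

```sas
/* Stack the drug data set on top of itself (step 1).            */
/* is_event is 1 for rows read from the first copy, 0 otherwise. */
data drug_em;
   set drug(in=is_event) drug;
   if is_event then do;
      target = 1;        /* row carries the number of events     */
      freq   = r;
   end;
   else do;
      target = 0;        /* row carries the number of nonevents  */
      freq   = n - r;
   end;
   drop r n;             /* r is now held in freq; n is not used */
run;
```

Reading the data set twice in one SET statement keeps the event rows first and the nonevent rows second, but the row order should not matter to the model.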

The data used in SAS Enterprise Miner should appear as follows:

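| drug | x | freq | target |
|------|------|------|--------|
| A | 0.1 | 1 | 1 |
| A | 0.23 | 2 | 1 |
| A | 0.67 | 1 | 1 |
| B | 0.2 | 3 | 1 |
| B | 0.3 | 4 | 1 |
| B | 0.45 | 5 | 1 |
| B | 0.78 | 5 | 1 |
| C | 0.04 | 0 | 1 |
| C | 0.15 | 0 | 1 |
| C | 0.56 | 1 | 1 |
| C | 0.7 | 2 | 1 |
| D | 0.34 | 5 | 1 |
| D | 0.6 | 5 | 1 |
| D | 0.7 | 8 | 1 |
| E | 0.2 | 12 | 1 |
| E | 0.34 | 15 | 1 |
| E | 0.56 | 13 | 1 |
| E | 0.8 | 17 | 1 |
| A | 0.1 | 9 | 0 |
| A | 0.23 | 10 | 0 |
| A | 0.67 | 8 | 0 |
| B | 0.2 | 10 | 0 |
| B | 0.3 | 11 | 0 |
| B | 0.45 | 11 | 0 |
| B | 0.78 | 8 | 0 |
| C | 0.04 | 10 | 0 |
| C | 0.15 | 11 | 0 |
| C | 0.56 | 11 | 0 |
| C | 0.7 | 10 | 0 |
| D | 0.34 | 5 | 0 |
| D | 0.6 | 4 | 0 |
| D | 0.7 | 2 | 0 |
| E | 0.2 | 8 | 0 |
| E | 0.34 | 5 | 0 |
| E | 0.56 | 2 | 0 |
| E | 0.8 | 3 | 0 |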

If you connect the newly created data source described above to a Regression node and run the flow, you will get approximately (within rounding error) the same results that you would have obtained using the LOGISTIC procedure.
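If you want to sanity-check the restructured data before bringing it into Enterprise Miner, one option is to fit it with PROC LOGISTIC and a FREQ statement, then compare the estimates with the events/trials run. The sketch below assumes the restructured data set is named drug_em (a hypothetical name) and has the columns drug, x, freq, and target described above.

```sas
/* Fit the two-row layout using freq as the frequency variable. */
/* event='1' models the probability that target equals 1.       */
proc logistic data=drug_em;
   class drug;
   freq freq;
   model target(event='1') = x drug;
run;
```

The parameter estimates should agree with the events/trials fit. I would not assume every other table matches line for line; for example, rows with a frequency of 0 are not used, so the observation counts can be reported differently.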

I hope this helps!

Doug
