
r/n logistic regression in SAS EM

Occasional Contributor rbm

Hi All,

As we know, PROC LOGISTIC accepts grouped binary response data:

proc logistic;
model r/n=x1 x2;
run;

Here, n represents the number of trials and r represents the number of events.

Does anyone know whether it is possible to run the previous example in EM, and if so, how?

Thanks for your help,
Reynaldo

Accepted Solution
08-14-2017 12:55 PM
SAS Employee

Re: r/n logistic regression in SAS EM

The short answer is that you can get essentially the same results, but you need to structure your data somewhat differently. Instead of one row for each observation in your original data, you will have two rows: one containing the number of events and one containing the number of non-events. I'll explain this in more detail below:

 

The scenario you are describing uses the events/trials syntax of the LOGISTIC (or GENMOD) procedure. Consider the first example data set in the documentation for the GENMOD procedure, which includes a character input variable (drug), an interval input variable (x), the number of events (r), and the number of trials (n) for each observation.

 

/*** BEGIN GENMOD DOCUMENTATION EXCERPT ***/

 

The following DATA step creates the data set:

data drug;
   input drug$ x r n @@;
   datalines;
A  .1   1  10   A  .23  2  12   A  .67  1   9
B  .2   3  13   B  .3   4  15   B  .45  5  16   B  .78  5  13
C  .04  0  10   C  .15  0  11   C  .56  1  12   C  .7   2  12
D  .34  5  10   D  .6   5   9   D  .7   8  10
E  .2  12  20   E  .34 15  20   E  .56 13  15   E  .8  17  20
;

  

/*** END GENMOD DOCUMENTATION EXCERPT ***/

 

This DATA step generates data that look like the following:

 

drug  x     r   n
A     0.1   1   10
A     0.23  2   12
A     0.67  1   9
B     0.2   3   13
B     0.3   4   15
B     0.45  5   16
B     0.78  5   13
C     0.04  0   10
C     0.15  0   11
C     0.56  1   12
C     0.7   2   12
D     0.34  5   10
D     0.6   5   9
D     0.7   8   10
E     0.2   12  20
E     0.34  15  20
E     0.56  13  15
E     0.8   17  20

 

You could use the events/trials syntax with the LOGISTIC procedure to analyze this data set as follows:

 

proc logistic data=drug;
   class drug;
   model r/n = x drug;
run;

 

To analyze this same data set in SAS Enterprise Miner, you would need to take the following steps:

1. Create a new data set containing two rows for each row in the original data set (for example, by appending the data set to itself).

2. In each duplicate row, replace the value of r (# of events) with the number of nonevents (n - r).

3. Rename the column r to freq (or just specify it as a frequency variable in SAS Enterprise Miner).

4. Add another column named target that contains a 1 if the row contains the number of events and a 0 if the row contains the number of nonevents.

5. Drop the column n containing the number of trials (it is not used directly in the analysis).

6. Add the modified data set as an Input Data Source in SAS Enterprise Miner, being sure to specify the following variable information:

      drug = Input role, Nominal level
      x = Input role, Interval level
      freq = Frequency role
      target = Target role, Binary level (1 if an event, 0 if a non-event)
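Steps 1 through 5 can be carried out in a single DATA step. This is a minimal sketch that assumes the DRUG data set created by the GENMOD excerpt above; DRUG_EM and IS_EVENT are hypothetical names chosen for illustration. Listing DRUG twice in the SET statement concatenates the data set with itself (step 1), and the IN= flag marks rows from the first copy as the event rows. It computes freq directly rather than renaming r, which has the same effect as steps 2 and 3.

```sas
/* Steps 1-5: build the two-rows-per-observation layout from DRUG.     */
/* DRUG_EM and IS_EVENT are hypothetical names.                        */
data drug_em;
   /* Step 1: listing DRUG twice appends it to itself; IS_EVENT is 1   */
   /* for rows read from the first copy and 0 for the second copy.     */
   set drug(in=is_event) drug;
   if is_event then do;
      freq   = r;        /* step 3: event rows use r as the frequency  */
      target = 1;        /* step 4: 1 = row holds the number of events */
   end;
   else do;
      freq   = n - r;    /* step 2: duplicate rows hold n - r          */
      target = 0;        /* step 4: 0 = row holds the nonevents        */
   end;
   drop r n;             /* steps 3 and 5: keep drug, x, freq, target  */
run;

/* The listing should match the table of EM data in this post. */
proc print data=drug_em noobs;
run;
```

Because the first copy is read in full before the second, all event rows come first, followed by all nonevent rows. The resulting data set is what you would register as the Input Data Source in step 6.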

 

The data used in SAS Enterprise Miner should appear as follows:

 

drug  x     freq  target
A     0.1   1     1
A     0.23  2     1
A     0.67  1     1
B     0.2   3     1
B     0.3   4     1
B     0.45  5     1
B     0.78  5     1
C     0.04  0     1
C     0.15  0     1
C     0.56  1     1
C     0.7   2     1
D     0.34  5     1
D     0.6   5     1
D     0.7   8     1
E     0.2   12    1
E     0.34  15    1
E     0.56  13    1
E     0.8   17    1
A     0.1   9     0
A     0.23  10    0
A     0.67  8     0
B     0.2   10    0
B     0.3   11    0
B     0.45  11    0
B     0.78  8     0
C     0.04  10    0
C     0.15  11    0
C     0.56  11    0
C     0.7   10    0
D     0.34  5     0
D     0.6   4     0
D     0.7   2     0
E     0.2   8     0
E     0.34  5     0
E     0.56  2     0
E     0.8   3     0
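If you want to check the restructured data in code before moving to SAS Enterprise Miner, one option is to fit the same model with PROC LOGISTIC and a FREQ statement. This sketch assumes the table above is stored in a data set named DRUG_EM (a hypothetical name). The EVENT='1' response option makes the procedure model the probability that target = 1, which corresponds to what the events/trials syntax models.

```sas
/* Binary-response fit on the reshaped data. Each row is weighted by   */
/* FREQ, so it stands for FREQ identical observations.                 */
/* Assumes DRUG_EM (hypothetical name) holds the table shown above.    */
proc logistic data=drug_em;
   class drug;
   freq freq;                          /* frequency variable = counts  */
   model target(event='1') = x drug;   /* model Pr(target = 1)         */
run;
```

The parameter estimates from this run should agree with those from the events/trials run on DRUG. The reported observation counts may differ, since one run has 18 grouped rows and the other has 36 expanded rows, two of which have a frequency of 0 and contribute nothing to the fit.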

 

If you connect the newly created data source described above to a Regression node and run the flow, you will get approximately (within rounding error) the same results that you would have obtained using the LOGISTIC procedure. 
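A short derivation sketches why the two layouts should agree. Write p_i for the modeled event probability of original row i, with covariate vector x_i and parameter vector beta (notation introduced here, not from the post). The frequency-weighted log likelihood of the two-row layout differs from the events/trials log likelihood only by the binomial coefficients, which do not depend on beta, so both are maximized by the same estimates.

```latex
p_i = \frac{1}{1 + \exp(-\mathbf{x}_i^\top \boldsymbol{\beta})}

% events/trials (r/n) layout: one binomial row per observation
\ell_{r/n}(\boldsymbol{\beta})
  = \sum_i \left[ \log\binom{n_i}{r_i} + r_i \log p_i + (n_i - r_i)\log(1 - p_i) \right]

% two-row layout: event row (freq = r_i, target = 1),
% nonevent row (freq = n_i - r_i, target = 0)
\ell_{\mathrm{freq}}(\boldsymbol{\beta})
  = \sum_i \left[ r_i \log p_i + (n_i - r_i)\log(1 - p_i) \right]

\ell_{r/n}(\boldsymbol{\beta}) - \ell_{\mathrm{freq}}(\boldsymbol{\beta})
  = \sum_i \log\binom{n_i}{r_i} \quad \text{(constant in } \boldsymbol{\beta}\text{)}
```

Under this reasoning, any remaining differences between the two sets of results would come from numerical convergence or rounding rather than from the model itself.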

 

I hope this helps!

Doug
