rbm
Hi All,

As we know, PROC LOGISTIC accepts binary response data in grouped (events/trials) form:

proc logistic;
model r/n=x1 x2;
run;

Here, n represents the number of trials and r represents the number of events.

Does anyone know whether it is possible to run the previous example in SAS Enterprise Miner (EM)? If so, how?

Thanks for your help,
Reynaldo
Accepted Solutions
DougWielenga
SAS Employee

The short answer is that you can get essentially the same results, but you need to structure your data somewhat differently. Instead of one row for each observation in your original data, you will have two rows: one containing the number of events and one containing the number of nonevents. I'll explain this in more detail below:

 

The scenario you are describing involves using the events/trials syntax of the LOGISTIC (or GENMOD) procedure.  Consider the first example data set in the documentation for the GENMOD procedure which includes a character input variable (drug), an interval input variable (x), the number of events (r), and the number of trials (n) for each observation.   

 

/*** BEGIN GENMOD DOCUMENTATION EXCERPT ***/

 

The following DATA step creates the data set:

data drug;
   input drug$ x r n @@;
   datalines;
A  .1   1  10   A  .23  2  12   A  .67  1   9
B  .2   3  13   B  .3   4  15   B  .45  5  16   B  .78  5  13
C  .04  0  10   C  .15  0  11   C  .56  1  12   C  .7   2  12
D  .34  5  10   D  .6   5   9   D  .7   8  10
E  .2  12  20   E  .34 15  20   E  .56 13  15   E  .8  17  20
;

  

/*** END GENMOD DOCUMENTATION EXCERPT ***/

 

which generates data that looks like the following:

 

drug x r n
A 0.1 1 10
A 0.23 2 12
A 0.67 1 9
B 0.2 3 13
B 0.3 4 15
B 0.45 5 16
B 0.78 5 13
C 0.04 0 10
C 0.15 0 11
C 0.56 1 12
C 0.7 2 12
D 0.34 5 10
D 0.6 5 9
D 0.7 8 10
E 0.2 12 20
E 0.34 15 20
E 0.56 13 15
E 0.8 17 20

 

You could use the events/trials syntax with the LOGISTIC procedure to analyze this data set as follows:

 

proc logistic data=drug;
   class drug;
   model r/n = x drug;
run;
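
Since the answer notes that GENMOD accepts the same events/trials syntax, here is a minimal sketch of the equivalent GENMOD call, assuming the drug data set created above. GENMOD does not default to a binomial model, so the distribution and link are stated explicitly. Note that GENMOD and LOGISTIC use different default CLASS parameterizations, so the drug coefficients may be expressed differently even though the fitted model is the same.

```sas
/* Sketch: same events/trials model fitted with GENMOD instead of LOGISTIC */
proc genmod data=drug;
   class drug;
   /* r = number of events, n = number of trials */
   model r/n = x drug / dist=bin link=logit;
run;
```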

 

To analyze this same data set in SAS Enterprise Miner, you would need to take the following steps:

1. Create a new data set containing two rows for each row in the original data set (e.g. append the data set to itself).

2. Replace the value of r (# of events) in each duplicate row with the number of nonevents (n - r)

3. Rename the column r to freq (or just specify it as a frequency variable in SAS Enterprise Miner)

4. Add another column named target which contains a 1 if the row contains the number of events and a 0 if the row contains the number of nonevents.  

5. Drop the column n containing the total number of trials (it is not used directly in the analysis)

6. Add the modified data set as an Input Data Source in SAS Enterprise Miner being sure to specify the following variable information:

      drug = nominal input variable role

            x = interval input variable role 

        freq = frequency variable role

     target = target variable role (1 if an event, 0 if a non-event)
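
Steps 1 through 5 can be sketched in a single DATA step rather than by appending the data set to itself. This is only one way to do it, assuming the drug data set created above; the output name drug_em is hypothetical. Each input row is written out twice, once as an event row and once as a nonevent row.

```sas
/* Sketch: expand each events/trials row into an event row and a nonevent row */
data drug_em;                 /* drug_em is a hypothetical name */
   set drug;

   target = 1;                /* event row */
   freq   = r;                /* frequency = number of events */
   output;

   target = 0;                /* nonevent row */
   freq   = n - r;            /* frequency = number of nonevents */
   output;

   drop r n;                  /* no longer needed once freq and target exist */
run;
```

For the 18 input rows this yields 36 rows. The event and nonevent rows come out interleaved rather than stacked, which should not matter for the analysis because row order does not affect the fit.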

 

The data used in SAS Enterprise Miner should appear as follows:

 

drug x freq target
A 0.1 1 1
A 0.23 2 1
A 0.67 1 1
B 0.2 3 1
B 0.3 4 1
B 0.45 5 1
B 0.78 5 1
C 0.04 0 1
C 0.15 0 1
C 0.56 1 1
C 0.7 2 1
D 0.34 5 1
D 0.6 5 1
D 0.7 8 1
E 0.2 12 1
E 0.34 15 1
E 0.56 13 1
E 0.8 17 1
A 0.1 9 0
A 0.23 10 0
A 0.67 8 0
B 0.2 10 0
B 0.3 11 0
B 0.45 11 0
B 0.78 8 0
C 0.04 10 0
C 0.15 11 0
C 0.56 11 0
C 0.7 10 0
D 0.34 5 0
D 0.6 4 0
D 0.7 2 0
E 0.2 8 0
E 0.34 5 0
E 0.56 2 0
E 0.8 3 0

 

If you connect the newly created data source described above to a Regression node and run the flow, you will get approximately (within rounding error) the same results that you would have obtained using the LOGISTIC procedure. 
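
If you want to confirm the equivalence before building the flow, one option is to fit the restructured data in PROC LOGISTIC itself. The sketch below assumes the two-row layout listed above has been saved as a data set hypothetically named drug_em. The parameter estimates should agree with the events/trials fit, because under either layout each original row contributes r*log(p) + (n - r)*log(1 - p) to the log-likelihood.

```sas
/* Sketch: fit the restructured data with a frequency variable */
/* drug_em is a hypothetical name for the freq/target layout   */
proc logistic data=drug_em;
   class drug;
   freq freq;                          /* each row counts freq times */
   model target(event='1') = x drug;   /* model the probability of target = 1 */
run;
```

Rows with a frequency of 0 (for example, the event row for drug C at x = 0.04) are not used in the fit, which is consistent with that group having no events.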

 

I hope this helps!

Doug
