Most, if not all, binary response models accept data in "Events/Trials" format. In SAS, these models can be fit with PROC LOGISTIC, PROC GLIMMIX, and PROC BGLIMM. For example, the events could be the number of runners who cross the finish line, and the trials the total number of runners.
Sometimes it is useful to represent the same data in binary format instead, where each trial gets a row of its own with a 1/0 (or yes/no) event flag. I wanted to do this so that I could merge by-observation predictions into a corresponding binary table, a la this example.
/*This data is in events-trials format*/
data bytrials;
   input Lot $ Event_count Total;
   datalines;
A 1 6
B 3 6
C 0 6
D 5 6
;
/*Flag yes events in a YesEvents data set*/
data YesEvents;
   set bytrials;
   do Yes_Flag=1 to Event_count;
      output;
   end;
run;
/*Flag nonevents in a NoEvents data set*/
data NoEvents;
   set bytrials;
   do No_Flag=1 to Total-Event_count;
      output;
   end;
run;
/*Append YesEvents and NoEvents and create Binary Event Flag*/
data byevent;
   set YesEvents NoEvents;
   if Yes_Flag > 0 then
      Event=1;
   if No_Flag > 0 then
      Event=0;
   drop Yes_Flag No_Flag Event_count Total;
run;
/*Evaluate outcome by sorting and printing*/
proc sort data=byevent;
   by lot;
run;
proc print data=byevent;
run;
While this code is not simple or elegant, it did get me to the binary data set I needed. Hopefully it will prove useful for others searching for the same thing. Comments with improvements are most welcome!
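One quick way to confirm that the binary table matches the original counts is to cross-tabulate it. This is only a minimal sketch, assuming the BYEVENT data set created above (with the variables Lot and Event) is still available in the WORK library:

```sas
/* Cross-tabulate Lot by Event to compare against the events-trials input. */
/* Each Lot should total 6 rows, and the Event=1 count should equal        */
/* Event_count from BYTRIALS (for example, 1 for Lot A and 5 for Lot D).   */
proc freq data=byevent;
   tables Lot*Event / norow nocol nopercent;
run;
```

Lot C has no events, so it appears only in the Event=0 column.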
Are the intermediate data sets needed for some purpose other than producing the final BYEVENT data set?
If not, this may be a more maintainable approach:
data byevent (drop=event_count total i);
   set bytrials;
   do i=1 to total;
      if i<=event_count then event=1;
      else event=0;
      output;
   end;
run;
I don't see a need for the YESEVENTS and NOEVENTS data sets, since they can be trivially reproduced by applying a WHERE filter to the BYEVENT data set (i.e., where event=1; or where event=0;).
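As a rough illustration of that filtering, assuming BYEVENT has already been created with a numeric EVENT variable (the output data set names YES_ONLY and NO_ONLY are made up for this sketch):

```sas
/* Filter on the fly inside a procedure */
proc print data=byevent;
   where event=1;
run;

/* Or materialize the subsets only if they are really needed */
data yes_only;
   set byevent(where=(event=1));
run;

data no_only;
   set byevent(where=(event=0));
run;
```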
This technique is known as "unrolling the data." A similar technique can be used when there is a FREQ variable in a data set, but you want to unroll the data so that each subject appears on a row by itself.
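A small sketch of the FREQ-variable case follows. The data set GROUPED and its variables Group and Freq are hypothetical, invented only to illustrate the idea:

```sas
/* Hypothetical input: one row per group, with a frequency variable */
data grouped;
   input Group $ Freq;
   datalines;
X 2
Y 3
;

/* Write each row Freq times so that every subject gets its own row */
data unrolled (drop=Freq i);
   set grouped;
   do i = 1 to Freq;
      output;
   end;
run;
```

Here UNROLLED ends up with five rows: two for group X and three for group Y.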
In addition to the reasons you mention, unrolling the data can be useful if you are trying to obtain a subsample of the data. You can also use the unrolled data for a bootstrap analysis or another resampling technique.
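For instance, PROC SURVEYSELECT can draw from the unrolled data. The following is only a sketch: it assumes the 24-row BYEVENT data set from earlier in the thread, and the sample size, number of replicates, seed values, and output data set names are arbitrary choices for illustration.

```sas
/* Simple random subsample of 8 rows, without replacement */
proc surveyselect data=byevent out=subsample
                  method=srs n=8 seed=12345;
run;

/* Bootstrap resamples: 100 replicates, each drawn with replacement */
/* and the same size as the input; OUTHITS writes one row per draw  */
proc surveyselect data=byevent out=bootsamples
                  method=urs samprate=1 reps=100 outhits seed=54321;
run;
```

The replicates in BOOTSAMPLES are identified by the Replicate variable, so a later analysis can be run with a BY Replicate statement.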