## Proc logistic, how to get the observations with ties?

Regular Contributor
Posts: 196


Hi,

I built a logistic model and the percentage of tied pairs is about 24%. How can I identify the observations involved in ties, so that I can analyse them?

Super User
Posts: 10,205

## Re: Proc logistic, how to get the observations with ties?

How do you define ties here? Do you mean observations that have the same values in all variables?

Regular Contributor
Posts: 196

## Re: Proc logistic, how to get the observations with ties?

No, I mean the section of the logistic output that reports concordant, discordant and tied pairs. These are calculated by comparing the predicted probabilities within pairs of observations where one has target 1 and the other has target 0.

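For reference, the percentages in that table are taken over all pairs of observations with different responses. Writing $N_1$ and $N_0$ for the numbers of events and non-events, and $n_c$, $n_d$, $n_t$ for the numbers of concordant, discordant and tied pairs:

$$
t = N_1 N_0 = n_c + n_d + n_t, \qquad
\%\text{Concordant} = 100\,\frac{n_c}{t}, \quad
\%\text{Discordant} = 100\,\frac{n_d}{t}, \quad
\%\text{Tied} = 100\,\frac{n_t}{t}.
$$

For example, the REMISSION data used later in this thread has 27 observations split 9/18 between the two response levels, so $t = 9 \times 18 = 162$, and 5 tied pairs give $100 \times 5/162 \approx 3.09\%$.
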
Super User
Posts: 20,727

## Re: Proc logistic, how to get the observations with ties?

1. Get the scored data set with the predicted probabilities (see the OUTPUT statement).
2. Run a PROC FREQ to generate a Ctable.
3. Extract the ties.

Regular Contributor
Posts: 196

## Re: Proc logistic, how to get the observations with ties?

PROC FREQ to generate a Ctable? Can you please explain that?

Posts: 1,125

## Re: Proc logistic, how to get the observations with ties?


Hi @munitech4u,

Here is an example:

Let's use the REMISSION dataset from the PROC LOGISTIC documentation as a basis.

```sas
/* Add an ID to identify observations */
data Remission;
  set Remission;
  id=_n_;
run; /* 27 obs. */

/* Run an arbitrary logistic regression,
   write predicted probabilities to dataset PRED */
proc logistic data=Remission;
  model remiss(event='1')=li;
  output out=pred p=p;
run;

/* Create dataset TIES with "tied" pairs of IDs */
proc sql;
  create table ties as
  select a.id as id1, b.id as id2
  from pred a, pred b
  where a.id<b.id & a.remiss ne b.remiss & a.p=b.p;
quit; /* 5 obs. */
```
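To make the self-join condition concrete, here is the same selection as a small Python sketch (illustration only, not SAS; the `(id, response, p)` tuple layout, the function name `tied_pairs` and the sample values are hypothetical). Like the SQL, it visits every pair once and compares probabilities exactly, which is why this approach gets expensive as the data grows.

```python
from itertools import combinations

def tied_pairs(obs):
    """Return (id1, id2) for pairs with different responses and equal
    predicted probability, mirroring the SQL condition
    a.id < b.id & a.remiss ne b.remiss & a.p = b.p."""
    result = []
    # Sorting by id first makes combinations() yield pairs with id1 < id2
    for (id_a, r_a, p_a), (id_b, r_b, p_b) in combinations(
            sorted(obs, key=lambda o: o[0]), 2):
        if r_a != r_b and p_a == p_b:
            result.append((id_a, id_b))
    return result

# Hypothetical toy data: (id, response, predicted probability)
obs = [(1, 1, 0.8), (2, 0, 0.8), (3, 0, 0.3),
       (4, 1, 0.3), (5, 1, 0.8), (6, 0, 0.5)]
print(tied_pairs(obs))  # [(1, 2), (2, 5), (3, 4)]
```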

Alternatively, you could create a dataset with all relevant pairs:

```sas
/* Create dataset PAIRS with all pairs of IDs considered in output table
   "Association of Predicted Probabilities and Observed Responses" */
proc sql;
  create table pairs as
  select a.id as id1, b.id as id2, a.p as p1, b.p as p2,
         a.remiss as r1, b.remiss as r2,
         /* Classify each pair; table-qualified names avoid relying on the aliases */
         case when (a.remiss=1 & b.remiss=0 & a.p>b.p) | (a.remiss=0 & b.remiss=1 & a.p<b.p) then 'Concordant'
              when (a.remiss=1 & b.remiss=0 & a.p<b.p) | (a.remiss=0 & b.remiss=1 & a.p>b.p) then 'Discordant'
              else 'Tied' end as assoc
  from pred a, pred b
  where a.id<b.id & a.remiss ne b.remiss;
quit; /* 162 obs. */

proc freq data=pairs;
  tables assoc;
run;
```

Result:

```
                                       Cumulative    Cumulative
assoc         Frequency     Percent     Frequency      Percent
---------------------------------------------------------------
Concordant         136       83.95           136        83.95
Discordant          21       12.96           157        96.91
Tied                 5        3.09           162       100.00
```

This corresponds to the table "Association of Predicted Probabilities and Observed Responses" in Output 72.1.2 of the PROC LOGISTIC documentation.
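The classification in the CASE expression can be mirrored in a few lines of Python, which may help when checking the logic on a toy example (illustration only, not SAS; the tuple layout, the function names `classify_pairs` and `percentages`, and the sample values are hypothetical):

```python
from collections import Counter
from itertools import combinations

def classify_pairs(obs):
    """Count concordant, discordant and tied pairs.

    obs: list of (id, response, p) with response in {0, 1}.
    Only pairs with different responses are counted, as in the
    WHERE clause a.remiss ne b.remiss."""
    counts = Counter()
    for (_, r1, p1), (_, r2, p2) in combinations(obs, 2):
        if r1 == r2:
            continue  # same response: not part of the association table
        # Orient the pair so p_event belongs to the observation with response 1
        p_event, p_nonevent = (p1, p2) if r1 == 1 else (p2, p1)
        if p_event > p_nonevent:
            counts["Concordant"] += 1
        elif p_event < p_nonevent:
            counts["Discordant"] += 1
        else:
            counts["Tied"] += 1
    return counts

def percentages(counts):
    """Percent of all counted pairs, rounded like the PROC FREQ output."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}

# Hypothetical toy data: (id, response, predicted probability)
obs = [(1, 1, 0.8), (2, 0, 0.8), (3, 0, 0.3),
       (4, 1, 0.3), (5, 1, 0.8), (6, 0, 0.5)]
counts = classify_pairs(obs)
print(dict(counts))        # {'Tied': 3, 'Concordant': 4, 'Discordant': 2}
print(percentages(counts))
```

Because the comparison is symmetric once the pair is oriented by response, the `a.id<b.id` ordering in the SQL only serves to count each pair once.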

(Edit: just improved layout)

Regular Contributor
Posts: 196

## Re: Proc logistic, how to get the observations with ties?

Thanks, but do you recommend running it on a dataset as large as 4 million observations?

Posts: 1,125

## Re: Proc logistic, how to get the observations with ties?


> munitech4u wrote:
> Thanks, but do you recommend running it on a dataset as large as 4 million?

No. With 4 million observations, the self-join would have to examine on the order of n²/2 pairs (about 8 trillion), so I would choose a different approach:

```sas
/* "Blow up" the test dataset and add an ID to identify observations */
data Remission;
  set Remission;
  do i=1 to 148149;
    id=(_n_-1)*148149+i;
    output;
  end;
  drop i;
run; /* 4000023 obs. */

/* Run an arbitrary logistic regression,
   write predicted probabilities to dataset PRED */
proc logistic data=Remission;
  model remiss(event='1')=li;
  output out=pred p=p;
run;

/* Select "tied" observations */
proc sql;
  create table tied_obs(drop=_level_) as
  select *
  from pred
  group by p
  having count(distinct remiss)>1;
quit; /* 1185192 obs. */
```

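This has the additional advantage that dataset TIED_OBS contains all other variables from dataset PRED, so you can start your analysis immediately. Note that TIED_OBS contains observations, not pairs: every observation whose predicted probability is shared by at least one observation with the other response value.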

Edit: Simplified the HAVING condition (count(*)>1 was redundant).
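The GROUP BY idea can be sketched in Python to show why it scales (illustration only, not SAS; the tuple layout, the function names `tied_observations` and `percent_tied`, and the sample values are hypothetical). Ties can only occur inside a group of equal predicted probabilities, and a group with n1 events and n0 non-events contributes exactly n1 * n0 tied pairs, so one pass over the data gives both the tied observations and the tied-pair count. Like the SQL, it compares probabilities exactly; if your PROC LOGISTIC release groups predicted probabilities into bins before computing the association statistics (check the MODEL statement options, e.g. BINWIDTH=, in your release's documentation), exact comparison may find a different number of ties than the table reports.

```python
from collections import defaultdict

def tied_observations(obs):
    """Return (tied_obs, n_tied_pairs), mirroring
    GROUP BY p HAVING COUNT(DISTINCT remiss) > 1."""
    groups = defaultdict(list)
    for o in obs:
        groups[o[2]].append(o)  # group by exact predicted probability
    tied_obs = []
    n_tied_pairs = 0
    for members in groups.values():
        n1 = sum(1 for _, r, _ in members if r == 1)
        n0 = len(members) - n1
        if n1 > 0 and n0 > 0:  # both response values present
            tied_obs.extend(members)
            n_tied_pairs += n1 * n0  # every event/non-event pair here ties
    return tied_obs, n_tied_pairs

def percent_tied(obs):
    """Tied pairs as a percentage of all N1 * N0 event/non-event pairs."""
    _, n_tied = tied_observations(obs)
    n_events = sum(1 for _, r, _ in obs if r == 1)
    n_pairs = n_events * (len(obs) - n_events)
    return 100 * n_tied / n_pairs

# Hypothetical toy data: (id, response, predicted probability)
obs = [(1, 1, 0.8), (2, 0, 0.8), (3, 0, 0.3),
       (4, 1, 0.3), (5, 1, 0.8), (6, 0, 0.5)]
tied, n_tied = tied_observations(obs)
print([o[0] for o in tied], n_tied)  # [1, 2, 5, 3, 4] 3
```

On this toy data the tied-pair count (3) agrees with a brute-force pairwise check, but the work grows roughly linearly with the number of observations instead of quadratically.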
