Proc logistic, how to get the observations with ti...

03-17-2016 03:49 AM

Hi,

I built a logistic model, and about 24% of the pairs are ties. How can I identify the observations that are involved in ties, so that I can analyse them?

03-17-2016 05:33 AM

How do you define ties? Do the observations have the same value in all the variables?

03-17-2016 06:58 AM

No, I am talking about the section of the logistic model output that reports the concordant, discordant and tied pairs. These are calculated from pairs of predicted probabilities, one observation with target 1 and one with target 0.

03-17-2016 07:32 AM

- Get the scored data set with the predicted probabilities (see the OUTPUT statement).
- Run a PROC FREQ to generate Ctable.
- Extract ties...

03-17-2016 09:10 AM

PROC FREQ to run ctable? Can you please explain that?

03-17-2016 08:23 AM - edited 03-17-2016 08:37 AM

Hi @munitech4u,

Here is an example:

Let's take dataset REMISSION from the PROC LOGISTIC documentation as a basis.

```
/* Add an ID to identify observations */
data Remission;
  set Remission;
  id=_n_;
run; /* 27 obs. */

/* Run an arbitrary logistic regression,
   write predicted probabilities to dataset PRED */
proc logistic data=Remission;
  model remiss(event='1')=li;
  output out=pred p=p;
run;

/* Create dataset TIES with "tied" pairs of IDs */
proc sql;
  create table ties as
  select a.id as id1, b.id as id2
  from pred a, pred b
  where a.id<b.id & a.remiss ne b.remiss & a.p=b.p;
quit; /* 5 obs. */
```
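If you want the observations themselves rather than pairs of IDs, a minimal sketch could pull every row of PRED whose ID appears on either side of a tied pair. The dataset name TIED_OBS_SMALL is just a placeholder I made up; PRED and TIES are the datasets created above.

```sas
/* Sketch: keep each observation that takes part in at least one tied pair.
   TIED_OBS_SMALL is a hypothetical name; PRED and TIES come from the steps above. */
proc sql;
  create table tied_obs_small as
  select *
  from pred
  where id in (select id1 from ties
               union
               select id2 from ties);
quit;
```

UNION removes duplicate IDs, so an observation that is tied with several others still appears only once.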

Alternatively, you could create a dataset with all relevant pairs:

```
/* Create dataset PAIRS with all pairs of IDs considered in output table
   "Association of Predicted Probabilities and Observed Responses" */
proc sql;
  create table pairs as
  select a.id as id1, b.id as id2,
         a.p as p1, b.p as p2,
         a.remiss as r1, b.remiss as r2,
         /* The CASE refers to the source columns directly,
            so it does not depend on the aliases defined above */
         case when a.remiss=1 & b.remiss=0 & a.p>b.p
                 | a.remiss=0 & b.remiss=1 & a.p<b.p then 'Concordant'
              when a.remiss=1 & b.remiss=0 & a.p<b.p
                 | a.remiss=0 & b.remiss=1 & a.p>b.p then 'Discordant'
              else 'Tied'
         end as assoc
  from pred a, pred b
  where a.id<b.id & a.remiss ne b.remiss;
quit; /* 162 obs. */

proc freq data=pairs;
  tables assoc;
run;
```

Result:

| assoc | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Concordant | 136 | 83.95 | 136 | 83.95 |
| Discordant | 21 | 12.96 | 157 | 96.91 |
| Tied | 5 | 3.09 | 162 | 100.00 |

This corresponds to table "Association of Predicted Probabilities and Observed Responses" in Output 72.1.2 (see link above).
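To compare these counts with what PROC LOGISTIC itself reports, one option is to capture the association table in a dataset. This is only a sketch: it assumes the ODS table behind "Association of Predicted Probabilities and Observed Responses" is named Association (as I recall from the ODS table names in the documentation), and ASSOC_STATS is a name I chose for illustration.

```sas
/* Sketch: save the association table from PROC LOGISTIC as a dataset.
   Assumption: the ODS table name is Association.
   ASSOC_STATS is a hypothetical dataset name. */
ods output Association=assoc_stats;
proc logistic data=Remission;
  model remiss(event='1')=li;
run;

/* Print it next to the PROC FREQ result for comparison */
proc print data=assoc_stats;
run;
```

If the name is different in your release, ODS TRACE ON before the PROC LOGISTIC step writes the actual table names to the log.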

(Edit: just improved layout)

03-17-2016 09:09 AM

Thanks, but do you recommend running it on a dataset as large as 4 million?

03-17-2016 09:52 AM - edited 03-17-2016 10:01 AM

munitech4u wrote:

> Thanks, but do you recommend running it on a dataset as large as 4 million?

No, given this new information I would choose a different approach:

```
/* "Blow up" the test dataset and add an ID to identify observations */
data Remission;
  set Remission;
  do i=1 to 148149;
    id=(_n_-1)*148149+i;
    output;
  end;
  drop i;
run; /* 4000023 obs. */

/* Run an arbitrary logistic regression,
   write predicted probabilities to dataset PRED */
proc logistic data=Remission;
  model remiss(event='1')=li;
  output out=pred p=p;
run;

/* Select "tied" observations */
proc sql;
  create table tied_obs(drop=_level_) as
  select *
  from pred
  group by p
  having count(distinct remiss)>1;
quit; /* 1185192 obs. */
```

This has the additional advantage that you have the other variables from dataset PRED in dataset TIED_OBS, so you can start your analysis immediately.
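As a possible first step of that analysis, here is a rough sketch that counts the tied pairs per predicted probability without building the pairs: within one value of p, every event is tied with every non-event, so the number of tied pairs is the product of the two counts. It assumes REMISS is coded 0/1 as in the example; TIE_GROUPS, N_EVENT, N_NONEVENT and TIED_PAIRS are names I made up.

```sas
/* Sketch: number of tied pairs per distinct predicted probability.
   Assumptions: REMISS is numeric 0/1; TIE_GROUPS and its column names
   are hypothetical. */
proc sql;
  create table tie_groups as
  select p,
         sum(remiss=1) as n_event,     /* observations with remiss=1 at this p */
         sum(remiss=0) as n_nonevent,  /* observations with remiss=0 at this p */
         calculated n_event * calculated n_nonevent as tied_pairs
  from pred
  group by p
  having calculated n_event>0 and calculated n_nonevent>0
  order by tied_pairs desc;
quit;
```

Sorting by TIED_PAIRS puts the probabilities that contribute most to the ties at the top, which may be a reasonable place to start looking.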

Edit: Simplified HAVING condition: count(*)>1 was redundant.