turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2016 07:49 AM - edited 05-09-2016 05:44 AM

Dear SAS/IML support team,

I am using SAS 9.3.

I am conducting simulations to compare methods for cutoff estimation. One is kernel density estimation where I use the intersection of 2 pdfs as a cutoff. For these I want to estimate sensitivity and specificity or area under the curve.

I am unable to make the code run over e.g. a 1000 simulations. Thus, I need to construct some kind of loop or by statement around it, so it gives me the output "by sampledID".

The second issue I am facing is that each sample has a different cutoff, therefore I have a second data set with the cutoffs by sampleID, which at the moment I need to enter manually.

To sum it up, I have 2 data sets, 1 with sampleID 1 to x, the values and densities for each sample from the proc kde output, and a 2. data set with the cutoffs and sampleID.

Is there any possibility to manage that with SAS?

I would like to create an output data set with sensitivity/specificity by sampleID.

I would appreciate any hint you can give me or whether you think it is even possible to conduct.

Kind regards,

Tim

Accepted Solutions

Solution

05-09-2016
05:43 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2016 08:47 AM

To answer your question, yes it is possible. The main idea is to read both data sets into IML at the same time. You can then loop over each sampleID. For each sampleID you get the cutoff value and the subset of simulated values that correspond to the current sampleiD. Then compute the TN and TP values for that sample, using that cutoff.

The following program shows you one way structure the program:

```
proc iml;
start TrapIntegral(x,y);
N = nrow(x);
dx = x[2:N] - x[1:N-1];
meanY = ( y[2:N] + y[1:N-1] )/2;
return( dx` * meanY );
finish;
/* read in cutoff limits */
use Res_kernel;
read all var {sampleID cutoff_kernel};
close Res_kernel;
/** read in kernel density estimate **/
use kernel_final;
read all var {marker0 density0 marker1 density1};
close kernel_final;
numSamples = nrow(cutoff_kernel);
TN = j(numSamples, 1); /* allocate array to hold results */
TP = j(numSamples, 1); /* allocate array to hold results */
do i = 1 to nrow(cutoff_kernel); /* for each sampleID */
cutoff = cutoff_kernel[i]; /* cutoff value for this sampleID */
firstObs = (i-1)*1000 + 1; /* first obs in sample */
lastObs = i*1000; /* last obs in sample */
obs = firstObs:lastObs; /* subset for sampleID = i */
/* true negatives */
x = marker0[obs];
y = density0[obs];
idx = loc(x < cutoff);
TN[i] = TrapIntegral(x[idx],y[idx]);
/* true positives */
x = marker1[obs];
y = density1[obs];
idx = loc(x >= cutoff);
TP[i] = TrapIntegral(x[idx],y[idx]);
end;
print SampleID TN TP;
```

Incidentally, you should add ODS GRAPHICS OFF to the top of your program. That will prevent unnecessary plots from being created when you run the program interactively. (In SAS 9.4, PROC KDE creates a histogram for each BY group.)

You might want to read the article "The area under a density estimate curve" for a simpler way to compute the area. The simpler way is to estimate the empirical CDF directly from the data, rather than integrate a density estimate.

Also, I notice that you are using the difference between two density estimates to compute the cutoff value. I didn't study your code closely, but I wanted to point out that in general, two KDEs will intersect in multiple places. I'm not sure whether that is relevant to you. In general, finding the difference between density estimates involves thinking about some subtle statistical issues. You might want to read the article "The difference of density estimates: When does it make sense?"

All Replies

Solution

05-09-2016
05:43 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2016 08:47 AM

To answer your question, yes it is possible. The main idea is to read both data sets into IML at the same time. You can then loop over each sampleID. For each sampleID you get the cutoff value and the subset of simulated values that correspond to the current sampleiD. Then compute the TN and TP values for that sample, using that cutoff.

The following program shows you one way structure the program:

```
proc iml;
start TrapIntegral(x,y);
N = nrow(x);
dx = x[2:N] - x[1:N-1];
meanY = ( y[2:N] + y[1:N-1] )/2;
return( dx` * meanY );
finish;
/* read in cutoff limits */
use Res_kernel;
read all var {sampleID cutoff_kernel};
close Res_kernel;
/** read in kernel density estimate **/
use kernel_final;
read all var {marker0 density0 marker1 density1};
close kernel_final;
numSamples = nrow(cutoff_kernel);
TN = j(numSamples, 1); /* allocate array to hold results */
TP = j(numSamples, 1); /* allocate array to hold results */
do i = 1 to nrow(cutoff_kernel); /* for each sampleID */
cutoff = cutoff_kernel[i]; /* cutoff value for this sampleID */
firstObs = (i-1)*1000 + 1; /* first obs in sample */
lastObs = i*1000; /* last obs in sample */
obs = firstObs:lastObs; /* subset for sampleID = i */
/* true negatives */
x = marker0[obs];
y = density0[obs];
idx = loc(x < cutoff);
TN[i] = TrapIntegral(x[idx],y[idx]);
/* true positives */
x = marker1[obs];
y = density1[obs];
idx = loc(x >= cutoff);
TP[i] = TrapIntegral(x[idx],y[idx]);
end;
print SampleID TN TP;
```

Incidentally, you should add ODS GRAPHICS OFF to the top of your program. That will prevent unnecessary plots from being created when you run the program interactively. (In SAS 9.4, PROC KDE creates a histogram for each BY group.)

You might want to read the article "The area under a density estimate curve" for a simpler way to compute the area. The simpler way is to estimate the empirical CDF directly from the data, rather than integrate a density estimate.

Also, I notice that you are using the difference between two density estimates to compute the cutoff value. I didn't study your code closely, but I wanted to point out that in general, two KDEs will intersect in multiple places. I'm not sure whether that is relevant to you. In general, finding the difference between density estimates involves thinking about some subtle statistical issues. You might want to read the article "The difference of density estimates: When does it make sense?"