Posted 05-03-2016 07:49 AM
(707 views)

Dear SAS/IML support team,

I am using SAS 9.3.

I am conducting simulations to compare methods for cutoff estimation. One is kernel density estimation where I use the intersection of 2 pdfs as a cutoff. For these I want to estimate sensitivity and specificity or area under the curve.

I am unable to make the code run over, e.g., 1000 simulations. Thus, I need to construct some kind of loop or BY statement around it, so that it gives me the output "by sampleID".

The second issue I am facing is that each sample has a different cutoff, therefore I have a second data set with the cutoffs by sampleID, which at the moment I need to enter manually.

To sum it up, I have 2 data sets: the first contains sampleID 1 to x plus the values and densities for each sample from the PROC KDE output, and the second contains the cutoffs and sampleID.

Is there any possibility to manage that with SAS?

I would like to create an output data set with sensitivity/specificity by sampleID.

I would appreciate any hint you can give me or whether you think it is even possible to conduct.

Kind regards,

Tim

Accepted Solutions

To answer your question, yes it is possible. The main idea is to read both data sets into IML at the same time. You can then loop over each sampleID. For each sampleID you get the cutoff value and the subset of simulated values that correspond to the current sampleID. Then compute the TN and TP values for that sample, using that cutoff.

The following program shows you one way to structure the program:

```
proc iml;
/* trapezoidal rule: integrate y over x (column vectors, x ascending) */
start TrapIntegral(x,y);
   N = nrow(x);
   dx    = x[2:N] - x[1:N-1];
   meanY = ( y[2:N] + y[1:N-1] )/2;
   return( dx` * meanY );
finish;

/* read in cutoff limits: one row per sampleID */
use Res_kernel;
read all var {sampleID cutoff_kernel};
close Res_kernel;

/* read in kernel density estimates, stacked in the same sampleID order */
use kernel_final;
read all var {marker0 density0 marker1 density1};
close kernel_final;

numSamples = nrow(cutoff_kernel);
nPts = 1000;                     /* rows per sample; assumes exactly 1000 for every sample */
TN = j(numSamples, 1, 0);        /* allocate array to hold results */
TP = j(numSamples, 1, 0);        /* allocate array to hold results */
do i = 1 to numSamples;          /* for each sampleID */
   cutoff   = cutoff_kernel[i];  /* cutoff value for this sampleID */
   firstObs = (i-1)*nPts + 1;    /* first obs in sample */
   lastObs  = i*nPts;            /* last obs in sample */
   obs = firstObs:lastObs;       /* subset for sampleID = i */
   /* true negatives: area under density0 below the cutoff */
   x = marker0[obs];
   y = density0[obs];
   idx = loc(x < cutoff);
   if ncol(idx) > 1 then         /* need at least two points to integrate */
      TN[i] = TrapIntegral(x[idx], y[idx]);
   /* true positives: area under density1 at or above the cutoff */
   x = marker1[obs];
   y = density1[obs];
   idx = loc(x >= cutoff);
   if ncol(idx) > 1 then
      TP[i] = TrapIntegral(x[idx], y[idx]);
end;
print sampleID TN TP;
```
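Since the question asks for an output data set rather than printed output, here is a minimal sketch of how the results could be written out with the CREATE and APPEND statements. The data set name `SensSpec` is an assumption, and the three vectors hold placeholder values for illustration only; in the program above they would come from the loop, and the CREATE/APPEND/CLOSE statements would go right after the PRINT statement in the same PROC IML session.

```sas
proc iml;
/* placeholder values for illustration only; in practice these
   vectors are the sampleID, TN, and TP computed by the loop */
sampleID = {1, 2, 3};
TN = {0.91, 0.88, 0.93};
TP = {0.85, 0.90, 0.87};

/* SensSpec is a hypothetical data set name */
create SensSpec var {"sampleID" "TN" "TP"};
append;                /* write one row per sampleID */
close SensSpec;
quit;
```

With the densities integrated this way, TN estimates the specificity and TP estimates the sensitivity for each sample.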

Incidentally, you should add ODS GRAPHICS OFF to the top of your program. That will prevent unnecessary plots from being created when you run the program interactively. (In SAS 9.4, PROC KDE creates a histogram for each BY group.)
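As a rough sketch of where that statement goes, the following assumes a hypothetical input data set `Sim` that is sorted by `sampleID` and has an analysis variable `marker`; the output name `kde_out` is also made up. NGRID=1000 is chosen only so that each BY group produces the 1000 rows per sample that the program above assumes.

```sas
ods graphics off;                 /* suppress the per-BY-group plots */

/* Sim, marker, and kde_out are hypothetical names */
proc kde data=Sim;
   by sampleID;                   /* one density estimate per sample */
   univar marker / out=kde_out ngrid=1000;
run;
```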

You might want to read the article "The area under a density estimate curve" for a simpler way to compute the area. The simpler way is to estimate the empirical CDF directly from the data, rather than integrate a density estimate.
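A minimal sketch of that empirical idea follows. It is not taken from the article; it simply computes, for each sample, the proportion of raw observations on the correct side of the cutoff. The data sets `Sim` and `Cutoffs`, the variable names, and the simulated values are all assumptions made up for illustration.

```sas
/* hypothetical raw data: 3 samples, group 0 = negatives, group 1 = positives */
data Sim;
   call streaminit(1);
   do sampleID = 1 to 3;
      do k = 1 to 100;
         group = 0; marker = rand("Normal", 0, 1); output;
         group = 1; marker = rand("Normal", 2, 1); output;
      end;
   end;
   drop k;
run;

/* hypothetical cutoffs, one per sample */
data Cutoffs;
   input sampleID cutoff;
   datalines;
1 1.0
2 0.9
3 1.1
;

proc iml;
use Sim;
read all var {sampleID} into sid;      /* sample membership of each raw obs */
read all var {group marker};
close Sim;
use Cutoffs;
read all var {sampleID cutoff};
close Cutoffs;

spec = j(nrow(cutoff), 1, .);          /* missing if a group is empty */
sens = j(nrow(cutoff), 1, .);
do i = 1 to nrow(cutoff);
   neg = loc(sid = sampleID[i] & group = 0);
   pos = loc(sid = sampleID[i] & group = 1);
   /* proportion of negatives below, and positives at or above, the cutoff */
   if ncol(neg) > 0 then spec[i] = mean( marker[neg] <  cutoff[i] );
   if ncol(pos) > 0 then sens[i] = mean( marker[pos] >= cutoff[i] );
end;

create EmpSensSpec var {"sampleID" "sens" "spec"};
append;
close EmpSensSpec;
quit;
```

Because the subsets are found with LOC rather than by row position, this version does not require every sample to have the same number of observations.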

Also, I notice that you are using the difference between two density estimates to compute the cutoff value. I didn't study your code closely, but I wanted to point out that in general, two KDEs will intersect in multiple places. I'm not sure whether that is relevant to you. In general, finding the difference between density estimates involves thinking about some subtle statistical issues. You might want to read the article "The difference of density estimates: When does it make sense?"
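To check how many intersections there are, one option is to count the sign changes of the difference between the two densities. The sketch below uses two known normal densities in place of KDE output, and it assumes both densities are evaluated on the same grid of x values, which may not hold for the marker0 and marker1 columns in your data.

```sas
proc iml;
/* stand-ins for two density estimates on a common grid (assumption) */
x  = T( do(-5, 10, 0.01) );
f0 = pdf("Normal", x, 0, 1);
f1 = pdf("Normal", x, 2, 3);

d = f1 - f0;                          /* difference of the densities */
s = sign(d);
n = nrow(x);
/* a sign change between adjacent grid points brackets a crossing;
   an exact zero of d at a grid point is not detected by this test */
idx = loc( s[1:n-1] # s[2:n] < 0 );
numCrossings = ncol(idx);
if numCrossings > 0 then do;
   crossings = ( x[idx] + x[idx+1] ) / 2;   /* midpoint of each bracket */
   print numCrossings, crossings;
end;
else print numCrossings;
quit;
```

Two normal densities with unequal variances cross twice, so this example reports more than one candidate cutoff; with real KDEs you would then need a rule for choosing among the crossings.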
