Hi all,
The macro below performs 10-fold cross-validation. It computes the predicted probabilities for each test dataset it generates and appends the results to the dataset predicted.
I need to repeat the same process 500 times (currently it runs once). I was wondering if you could advise how to modify the macro so that the whole process is repeated.
Note: the data and the code are below:
data kyphosis;
set kyphosis;
theRandom = ranuni(86);
run;
*Then, divide the dataset into 10 groups based on the random number;
/*Open the dataset kRanked to verify that each observation is ranked 0 to 9 (10 even groups).*/;
proc rank data=kyphosis out = kRanked groups = 10;
var theRandom;
run;
/*STEP 2: Repeat the following 10 times:
i. Fit a logistic regression model on 9/10 of your data (the training dataset) and hold aside the other 1/10 as the test dataset.
ii. Use the fitted model to calculate the predicted probability of kyphosis=1 for each observation in the test dataset.
iii. Store these predicted probabilities in a new dataset, "predicted".*/
/*PS: Since we will later be appending observations onto the dataset ‘predicted’, we want to make
sure that there is not already a dataset called predicted.*/
proc datasets library = work nodetails nolist;
delete predicted;
run;
quit;
/*The MACRO*/
%macro runit;
%do x = 0 %to 9; * asks SAS to repeat the steps 10 times (PROC RANK numbers the groups 0 to 9);
proc logistic data = kRanked outmodel = model&x.; *Fit the logistic model on 9/10 of the data, and
output the model into the dataset model0 (when x=0), model1 (when x=1), etc.;
model kyphosis (event="1") = y1;
where theRandom ne &x; * Omit 1/10 of the data (eg, when x=0, omit the observations where theRandom=0).;
run;
data training&x.;
set kRanked;
where theRandom ne &x;
run;
data test&x.; * Put the omitted data into the test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;
proc logistic inmodel = model&x.; *Apply the logistic model to the test dataset and put
the predicted probabilities into a dataset predicted0, predicted1, etc.;
score data= test&x. out = predicted&x.;
run;
proc append base = predicted data = predicted&x.; *Keep adding the predicted values to a single dataset ‘predicted.’ ;
run;
%end;
%mend runit;
%runit;
My dataset called kyphosis can be found below
y1 | kyphosis |
28 | 0 |
15.5 | 0 |
8.2 | 0 |
3.4 | 0 |
17.3 | 0 |
15.2 | 0 |
32.9 | 0 |
11.1 | 0 |
87.5 | 0 |
16.2 | 0 |
107.9 | 0 |
5.7 | 0 |
25.6 | 0 |
31.2 | 0 |
21.6 | 0 |
55.6 | 0 |
8.8 | 0 |
6.5 | 0 |
22.1 | 0 |
14.4 | 0 |
44.2 | 0 |
3.7 | 0 |
7.8 | 0 |
8.9 | 0 |
18 | 0 |
6.5 | 0 |
4.9 | 0 |
10.4 | 0 |
5 | 0 |
5.3 | 0 |
6.5 | 0 |
6.9 | 0 |
8.2 | 0 |
21.8 | 0 |
6.6 | 0 |
7.6 | 0 |
15.4 | 0 |
59.2 | 0 |
5.1 | 0 |
10 | 0 |
5.3 | 0 |
32.6 | 0 |
4.6 | 0 |
6.9 | 0 |
4 | 0 |
3.65 | 0 |
7.8 | 0 |
32.5 | 0 |
11.5 | 0 |
4 | 0 |
10.2 | 0 |
2.4 | 1 |
719 | 1 |
2106.667 | 1 |
24000 | 1 |
1715 | 1 |
3.6 | 1 |
521.5 | 1 |
1600 | 1 |
454 | 1 |
109.7 | 1 |
23.7 | 1 |
464 | 1 |
9810 | 1 |
255 | 1 |
58.7 | 1 |
225 | 1 |
90.1 | 1 |
50 | 1 |
5.6 | 1 |
4070 | 1 |
592 | 1 |
28.6 | 1 |
6160 | 1 |
1090 | 1 |
10.4 | 1 |
27.3 | 1 |
162 | 1 |
3560 | 1 |
14.7 | 1 |
83.3 | 1 |
336 | 1 |
55.7 | 1 |
1520 | 1 |
3.9 | 1 |
5.8 | 1 |
8.45 | 1 |
361 | 1 |
369 | 1 |
8230 | 1 |
39.3 | 1 |
43.5 | 1 |
361 | 1 |
12.8 | 1 |
18 | 1 |
9590 | 1 |
555 | 1 |
60.2 | 1 |
21.8 | 1 |
900 | 1 |
6.6 | 1 |
239 | 1 |
3100 | 1 |
3275 | 1 |
682 | 1 |
85.4 | 1 |
10290 | 1 |
770 | 1 |
247.6 | 1 |
12320 | 1 |
113.1 | 1 |
1079 | 1 |
45.6 | 1 |
1630 | 1 |
79.4 | 1 |
508 | 1 |
3190 | 1 |
542 | 1 |
1021 | 1 |
235 | 1 |
251 | 1 |
3160 | 1 |
479 | 1 |
222 | 1 |
15.7 | 1 |
2540 | 1 |
11630 | 1 |
1810 | 1 |
6.9 | 1 |
4.1 | 1 |
15.6 | 1 |
9820 | 1 |
1490 | 1 |
15.7 | 1 |
45.8 | 1 |
7.8 | 1 |
12.8 | 1 |
100.5333 | 1 |
227 | 1 |
70.9 | 1 |
2500 | 1 |
The title of your question is totally misleading.
Which process do you want to repeat 500 times? Run the macro 500 times or do the entire process 500 times?
Since you initially select a random sequence based on a specific seed, do you want to do that same thing 500 times, or use a different seed each time?
I presume that you want to generate 500 different output files, but you haven't specified what you want.
Art, CEO, AnalystFinder.com
You could just add an outer loop within your current macro. However, if you use the same seed (like you did), all 50 replications will be identical. As such, I moved the random selection into the macro, and used a seed of 0.
Also, I presume you only want 50 files and that the ten files created each time can be overwritten.
Check to see if the following does what you want:
/* dlm='0920'x accepts tab or space as the delimiter (the original post used tabs, dlm='09'x) */
data kyphosis;
  infile cards dlm='0920'x;
  input y1 kyphosis;
  cards;
28 0
15.5 0
8.2 0
3.4 0
17.3 0
15.2 0
32.9 0
11.1 0
87.5 0
16.2 0
107.9 0
5.7 0
25.6 0
31.2 0
21.6 0
55.6 0
8.8 0
6.5 0
22.1 0
14.4 0
44.2 0
3.7 0
7.8 0
8.9 0
18 0
6.5 0
4.9 0
10.4 0
5 0
5.3 0
6.5 0
6.9 0
8.2 0
21.8 0
6.6 0
7.6 0
15.4 0
59.2 0
5.1 0
10 0
5.3 0
32.6 0
4.6 0
6.9 0
4 0
3.65 0
7.8 0
32.5 0
11.5 0
4 0
10.2 0
2.4 1
719 1
2106.667 1
24000 1
1715 1
3.6 1
521.5 1
1600 1
454 1
109.7 1
23.7 1
464 1
9810 1
255 1
58.7 1
225 1
90.1 1
50 1
5.6 1
4070 1
592 1
28.6 1
6160 1
1090 1
10.4 1
27.3 1
162 1
3560 1
14.7 1
83.3 1
336 1
55.7 1
1520 1
3.9 1
5.8 1
8.45 1
361 1
369 1
8230 1
39.3 1
43.5 1
361 1
12.8 1
18 1
9590 1
555 1
60.2 1
21.8 1
900 1
6.6 1
239 1
3100 1
3275 1
682 1
85.4 1
10290 1
770 1
247.6 1
12320 1
113.1 1
1079 1
45.6 1
1630 1
79.4 1
508 1
3190 1
542 1
1021 1
235 1
251 1
3160 1
479 1
222 1
15.7 1
2540 1
11630 1
1810 1
6.9 1
4.1 1
15.6 1
9820 1
1490 1
15.7 1
45.8 1
7.8 1
12.8 1
100.5333 1
227 1
70.9 1
2500 1
;

/*STEP 2: Repeat the following 50*10 times:
i. Fit a logistic regression model on 9/10 of your data (the training dataset) and hold aside the other 1/10 as the test dataset.
ii. Use the fitted model to calculate the predicted probability of kyphosis=1 for each observation in the test dataset.
iii. Store these predicted probabilities in a new dataset, "predicted".*/

/*The MACRO*/
%macro runit;
  %do i=1 %to 50;
    /* Draw a new random number on every replication (seed 0 = clock-based). */
    data kyphosis;
      set kyphosis;
      theRandom = ranuni(0);
    run;

    /*Then, divide the dataset into 10 groups based on the random number.*/
    /*Open the dataset kRanked to verify that each observation is ranked 0 to 9 (10 even groups).*/
    proc rank data=kyphosis out=kRanked groups=10;
      var theRandom;
    run;

    /*PS: Since we will later be appending observations onto the dataset predicted&i., we want to make
    sure that there is not already a dataset with that name.*/
    proc datasets library=work nodetails nolist;
      delete predicted&i.;
    run;
    quit;

    %do x = 0 %to 9; /* asks SAS to repeat the steps 10 times */
      /* Fit the logistic model on 9/10 of the data, and output the model into
         the dataset model0 (when x=0), model1 (when x=1), etc. */
      proc logistic data=kRanked outmodel=model&x.;
        model kyphosis (event="1") = y1;
        where theRandom ne &x; /* Omit 1/10 of the data (eg, when x=0, omit the observations where theRandom=0). */
      run;

      data training&x.;
        set kRanked;
        where theRandom ne &x;
      run;

      /* Put the omitted data into the test dataset, called test0 (when x=0), test1 (when x=1), etc. */
      data test&x.;
        set kRanked;
        where theRandom = &x;
      run;

      /* Apply the logistic model to the test dataset and put the predicted
         probabilities into a dataset _predicted0, _predicted1, etc. */
      proc logistic inmodel=model&x.;
        score data=test&x. out=_predicted&x.;
      run;

      /* Keep adding the predicted values to a single dataset predicted&i. for this replication. */
      proc append base=predicted&i. data=_predicted&x.;
      run;
    %end;
  %end;
%mend runit;
%runit;
Art, CEO, AnalystFinder.com
The following (not tested though) keeps all 500 files, as well as all 500 models and training datasets:
/* dlm='0920'x accepts tab or space as the delimiter (the original post used tabs, dlm='09'x) */
data kyphosis;
  infile cards dlm='0920'x;
  input y1 kyphosis;
  cards;
28 0
15.5 0
8.2 0
3.4 0
17.3 0
15.2 0
32.9 0
11.1 0
87.5 0
16.2 0
107.9 0
5.7 0
25.6 0
31.2 0
21.6 0
55.6 0
8.8 0
6.5 0
22.1 0
14.4 0
44.2 0
3.7 0
7.8 0
8.9 0
18 0
6.5 0
4.9 0
10.4 0
5 0
5.3 0
6.5 0
6.9 0
8.2 0
21.8 0
6.6 0
7.6 0
15.4 0
59.2 0
5.1 0
10 0
5.3 0
32.6 0
4.6 0
6.9 0
4 0
3.65 0
7.8 0
32.5 0
11.5 0
4 0
10.2 0
2.4 1
719 1
2106.667 1
24000 1
1715 1
3.6 1
521.5 1
1600 1
454 1
109.7 1
23.7 1
464 1
9810 1
255 1
58.7 1
225 1
90.1 1
50 1
5.6 1
4070 1
592 1
28.6 1
6160 1
1090 1
10.4 1
27.3 1
162 1
3560 1
14.7 1
83.3 1
336 1
55.7 1
1520 1
3.9 1
5.8 1
8.45 1
361 1
369 1
8230 1
39.3 1
43.5 1
361 1
12.8 1
18 1
9590 1
555 1
60.2 1
21.8 1
900 1
6.6 1
239 1
3100 1
3275 1
682 1
85.4 1
10290 1
770 1
247.6 1
12320 1
113.1 1
1079 1
45.6 1
1630 1
79.4 1
508 1
3190 1
542 1
1021 1
235 1
251 1
3160 1
479 1
222 1
15.7 1
2540 1
11630 1
1810 1
6.9 1
4.1 1
15.6 1
9820 1
1490 1
15.7 1
45.8 1
7.8 1
12.8 1
100.5333 1
227 1
70.9 1
2500 1
;

/*STEP 2: Repeat the following 50*10 times:
i. Fit a logistic regression model on 9/10 of your data (the training dataset) and hold aside the other 1/10 as the test dataset.
ii. Use the fitted model to calculate the predicted probability of kyphosis=1 for each observation in the test dataset.
iii. Store these predicted probabilities in a new dataset, "predicted".*/

/*The MACRO*/
%macro runit;
  %let counter=0; /* running index 1..500 across all replications and folds */
  %do i=1 %to 50;
    /* Draw a new random number on every replication (seed 0 = clock-based). */
    data kyphosis;
      set kyphosis;
      theRandom = ranuni(0);
    run;

    /*Then, divide the dataset into 10 groups based on the random number.*/
    /*Open the dataset kRanked to verify that each observation is ranked 0 to 9 (10 even groups).*/
    proc rank data=kyphosis out=kRanked groups=10;
      var theRandom;
    run;

    /*PS: Since we will later be appending observations onto the dataset predicted&i., we want to make
    sure that there is not already a dataset with that name.*/
    proc datasets library=work nodetails nolist;
      delete predicted&i.;
    run;
    quit;

    %do x = 0 %to 9; /* asks SAS to repeat the steps 10 times */
      %let counter=%eval(&counter+1);

      /* Fit the logistic model on 9/10 of the data, and output the model into
         the dataset model1, model2, ..., model500 (one per replication and fold). */
      proc logistic data=kRanked outmodel=model&counter.;
        model kyphosis (event="1") = y1;
        where theRandom ne &x; /* Omit 1/10 of the data (eg, when x=0, omit the observations where theRandom=0). */
      run;

      data training&counter.;
        set kRanked;
        where theRandom ne &x;
      run;

      /* Put the omitted data into the test dataset, called test1, test2, etc. */
      data test&counter.;
        set kRanked;
        where theRandom = &x;
      run;

      /* Apply the logistic model to the matching test dataset and put the predicted
         probabilities into a dataset _predicted1, _predicted2, etc.
         (the original post scored test&x. and wrote _predicted&x., which no longer
         matched the datasets named with the counter) */
      proc logistic inmodel=model&counter.;
        score data=test&counter. out=_predicted&counter.;
      run;

      /* Keep adding the predicted values to a single dataset predicted&i. for this replication. */
      proc append base=predicted&i. data=_predicted&counter.;
      run;
    %end;
  %end;
%mend runit;
%runit;
Art, CEO, AnalystFinder.com
Hi art297 and all,
Using the same dataset as above (kyphosis):
I need to calculate sensitivity and specificity for each value of y1 (2.4 to 24000) and append the results to a dataset, as shown below, repeating the process for each y1 in the dataset.
I was wondering if you could help me update my code below to produce the results I need.
Here is the code I have used:
DATA kyphosis;
set kyphosis;
** Create a binary variable;
if y1 <= 39 then y11=0;
else y11=1;
run;
* Calculate Sensitivity and Specificity with 39.2 as a cut;
proc freq data = kyphosis order = formatted;
tables kyphosis * y11 / nocol nopercent;
run;
HERE ARE THE RESULTS OF SENSITIVITY AND SPECIFICITY FOR Y1 = 39.2
The FREQ Procedure (output table not preserved)
APPEND THE RESULTS AS FOLLOWS:
Y1 | Prevalence | Sensitivity | Specificity | PPV (Positive Predictive Value) | NPV (Negative Predictive Value) adjusted |
39.2 | 0.07 | 75.56 | 90.20 | 9.80 | 24.44 |
2.4 | 0.07 | | | | |
I need to repeat this process for each value of y1 (2.4 to 24000) and append the results to a dataset.
I was wondering if you can help to update my code to do this.
Thanks,
I don't know about the rest of the program, but I would skip the proc rank and use the rand('table',...) function to randomly assign the 10 groups (rand('table') numbers them 1 to 10 rather than 0 to 9). Here's how:
data kyphosis (drop=_:);
set kyphosis nobs=nrecs;
array needed {10} _temporary_;
retain _nremain;
if _n_=1 then do;
_nremain=nrecs;
do _col=1 to 10; needed{_col}=ceil(nrecs/10); end;
end;
call streaminit(01982066);
array prb{10} _temporary_ ;
do _col=1 to 10; prb{_col}=needed{_col}/_nremain; end;
rnd=rand('table',of prb{*});
needed{rnd}=needed{rnd}-1;
_nremain=_nremain-1;
run;
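A quick check of the assignment above, as a minimal sketch: it assumes the DATA step has already run and added the variable rnd to kyphosis. PROC FREQ should then list groups 1 to 10 with roughly equal counts.

```sas
/* Tabulate the fold variable created by the DATA step above. */
proc freq data=kyphosis;
  tables rnd / nocum;
run;
```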
Moreover, you can do it for 500 variables at once:
data kyphosis (drop=_:);
set kyphosis nobs=nrecs;
array needed {500,10} _temporary_;
retain _nremain;
if _n_=1 then do;
_nremain=nrecs;
do _row=1 to 500;
do _col=1 to 10; needed{_row,_col}=ceil(nrecs/10); end;
end;
end;
call streaminit(01982066);
array _prb{10} _temporary_;
array rnd{500};
do _row=1 to 500;
do _col=1 to 10; _prb{_col}=min(1,needed{_row,_col}/_nremain); end;
rnd{_row}=rand('table',of _prb{*});
needed{_row,rnd{_row}}=needed{_row,rnd{_row}}-1;
end;
_nremain=_nremain-1;
run;
This will eliminate 499 data steps and 500 proc ranks at the beginning of your script.
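As a rough, untested sketch of how the 500 assignment columns could drive the cross-validation loop: it assumes the DATA step above has created rnd1-rnd500 in kyphosis, and the names cv_rep, reps, _model and _scored are hypothetical, chosen only for this illustration. The model statement is copied from the original macro.

```sas
%macro cv_rep(reps=500);
  %do i = 1 %to &reps;
    /* Start each replication with an empty predicted&i. dataset. */
    proc datasets library=work nodetails nolist;
      delete predicted&i.;
    run;
    quit;

    %do x = 1 %to 10; /* rand('table') numbers the folds 1 to 10 */
      /* Fit on the nine folds where rnd&i. differs from &x. */
      proc logistic data=kyphosis outmodel=_model noprint;
        model kyphosis (event="1") = y1;
        where rnd&i. ne &x;
      run;

      /* Score the held-out fold with the stored model. */
      proc logistic inmodel=_model;
        score data=kyphosis(where=(rnd&i. = &x)) out=_scored;
      run;

      /* Collect the held-out predictions for this replication. */
      proc append base=predicted&i. data=_scored;
      run;
    %end;
  %end;
%mend cv_rep;

/* Try a small number of replications first. */
%cv_rep(reps=2);
```

Unlike the earlier macro, this sketch overwrites the model and scored datasets on every pass and keeps only predicted1, predicted2, and so on.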
Many thanks mkeintz,
My query was about using the existing kyphosis dataset above to calculate sensitivity and specificity and append the results. The attached file can illustrate much better what I am trying to achieve.
Any help?
Thanks
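One possible way to get sensitivity and specificity at every observed y1 in a single pass, offered as a sketch rather than a tested solution: it assumes kyphosis is coded 0/1 and follows the rule from the code above, where y1 greater than the cut counts as a positive test. The dataset names cutoffs and sens_spec are hypothetical. Because the cuts are the observed y1 values, 39.2 itself will not appear as a row. Prevalence-adjusted PPV and NPV are not computed here.

```sas
proc sql;
  /* One row per distinct observed value of y1, used as the cut. */
  create table cutoffs as
  select distinct y1 as cutoff
  from kyphosis;

  /* Pair every cut with every observation; a SAS comparison evaluates
     to 1 (true) or 0 (false), so SUM() counts the matching rows. */
  create table sens_spec as
  select c.cutoff,
         100 * sum(k.kyphosis = 1 and k.y1 >  c.cutoff) / sum(k.kyphosis = 1) as sensitivity,
         100 * sum(k.kyphosis = 0 and k.y1 <= c.cutoff) / sum(k.kyphosis = 0) as specificity
  from cutoffs as c, kyphosis as k
  group by c.cutoff
  order by c.cutoff;
quit;
```

Each row of sens_spec then holds one cut with its sensitivity and specificity in percent, which matches the layout of the table requested above.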