jeka1212
Obsidian | Level 7

Hi all,

 

The macro below performs 10-fold cross-validation. It computes the predicted probabilities for each test dataset it generates and appends the results to the dataset predicted.

 

I need to repeat the same process 500 times (currently it runs once). I was wondering if you could advise how to modify the macro so that I can repeat this process.

 

Note: the data and the code are below:

 

 


data kyphosis;
set kyphosis;
theRandom = ranuni(86);
run;

*Then, divide the dataset into 10 groups based on the random number;
/*Open the dataset kRanked to verify that each observation is ranked 0 to 9 (10 even groups).*/;


proc rank data=kyphosis out = kRanked groups = 10;
var theRandom;
run;

/*STEP 2: Repeat the following 10 times:
i. Fit a logistic regression model on 9/10 of your data (the training dataset) and hold aside the other 1/10 as the test dataset.
ii. Use the fitted model to calculate the predicted probability of kyphosis=1 for each observation in the test dataset.
iii. Store these predicted probabilities in a new dataset, "predicted".*/

/*PS: Since we will later be appending observations onto the dataset ‘predicted’, we want to make
sure that there is not already a dataset called predicted.*/

proc datasets library = work nodetails nolist;
delete predicted;
run;
quit;

/*The MACRO*/

%macro runit;
%do x = 0 %to 9; * asks SAS to repeat the steps 10 times (PROC RANK numbers the groups 0 through 9);
proc logistic data = kRanked outmodel = model&x.; *Fit the logistic model on 9/10 of the data, and
output the model into the dataset model0 (when x=0), model1 (when x=1), etc.;
model kyphosis (event="1") = y1;
where theRandom ne &x; * Omit 1/10 of the data (eg, when x=0, omit the observations where theRandom=0).;
run;
data training&x.;
set kRanked;
where theRandom ne &x;
run;
data test&x.; * Put the omitted data into the test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;

proc logistic inmodel = model&x.; *Apply the logistic model to the test dataset and put
the predicted probabilities into a dataset predicted0, predicted1, etc.;
score data= test&x. out = predicted&x.;
run;
proc append base = predicted data = predicted&x.; *Keep adding the predicted values to a single dataset ‘predicted.’ ;
run;
%end;
%mend runit;
%runit;

 

My dataset, kyphosis, is listed below:

 

y1 kyphosis
28 0
15.5 0
8.2 0
3.4 0
17.3 0
15.2 0
32.9 0
11.1 0
87.5 0
16.2 0
107.9 0
5.7 0
25.6 0
31.2 0
21.6 0
55.6 0
8.8 0
6.5 0
22.1 0
14.4 0
44.2 0
3.7 0
7.8 0
8.9 0
18 0
6.5 0
4.9 0
10.4 0
5 0
5.3 0
6.5 0
6.9 0
8.2 0
21.8 0
6.6 0
7.6 0
15.4 0
59.2 0
5.1 0
10 0
5.3 0
32.6 0
4.6 0
6.9 0
4 0
3.65 0
7.8 0
32.5 0
11.5 0
4 0
10.2 0
2.4 1
719 1
2106.667 1
24000 1
1715 1
3.6 1
521.5 1
1600 1
454 1
109.7 1
23.7 1
464 1
9810 1
255 1
58.7 1
225 1
90.1 1
50 1
5.6 1
4070 1
592 1
28.6 1
6160 1
1090 1
10.4 1
27.3 1
162 1
3560 1
14.7 1
83.3 1
336 1
55.7 1
1520 1
3.9 1
5.8 1
8.45 1
361 1
369 1
8230 1
39.3 1
43.5 1
361 1
12.8 1
18 1
9590 1
555 1
60.2 1
21.8 1
900 1
6.6 1
239 1
3100 1
3275 1
682 1
85.4 1
10290 1
770 1
247.6 1
12320 1
113.1 1
1079 1
45.6 1
1630 1
79.4 1
508 1
3190 1
542 1
1021 1
235 1
251 1
3160 1
479 1
222 1
15.7 1
2540 1
11630 1
1810 1
6.9 1
4.1 1
15.6 1
9820 1
1490 1
15.7 1
45.8 1
7.8 1
12.8 1
100.5333 1
227 1
70.9 1
2500 1
10 REPLIES 10
art297
Opal | Level 21

The title of your question is totally misleading.

 

Which process do you want to repeat 500 times? Run the macro 500 times or do the entire process 500 times?

 

Since you initially select a random sequence based on a specific seed, do you want to do that same thing 500 times, or replace it with a different seed each time?

 

I presume that you want to generate 500 different output files, but you haven't specified what you want.

 

Art, CEO, AnalystFinder.com

 

jeka1212
Obsidian | Level 7
Sorry for not being clear.

Yes, I want to generate 500 different output files.

To do that I need to run the entire process 50 times, since one run of the process generates 10 different output files.
art297
Opal | Level 21

You could just add an outer loop within your current macro. However, if you use the same seed (like you did), all 50 replications will be identical. As such, I moved the random selection into the macro, and used a seed of 0.

 

Also, I presume you only want 50 files and that the ten files created each time can be overwritten.

 

Check to see if the following does what you want:

data kyphosis;
  infile cards dlm='09'x;
  input y1	kyphosis;
  cards;
28	0
15.5	0
8.2	0
3.4	0
17.3	0
15.2	0
32.9	0
11.1	0
87.5	0
16.2	0
107.9	0
5.7	0
25.6	0
31.2	0
21.6	0
55.6	0
8.8	0
6.5	0
22.1	0
14.4	0
44.2	0
3.7	0
7.8	0
8.9	0
18	0
6.5	0
4.9	0
10.4	0
5	0
5.3	0
6.5	0
6.9	0
8.2	0
21.8	0
6.6	0
7.6	0
15.4	0
59.2	0
5.1	0
10	0
5.3	0
32.6	0
4.6	0
6.9	0
4	0
3.65	0
7.8	0
32.5	0
11.5	0
4	0
10.2	0
2.4	1
719	1
2106.667	1
24000	1
1715	1
3.6	1
521.5	1
1600	1
454	1
109.7	1
23.7	1
464	1
9810	1
255	1
58.7	1
225	1
90.1	1
50	1
5.6	1
4070	1
592	1
28.6	1
6160	1
1090	1
10.4	1
27.3	1
162	1
3560	1
14.7	1
83.3	1
336	1
55.7	1
1520	1
3.9	1
5.8	1
8.45	1
361	1
369	1
8230	1
39.3	1
43.5	1
361	1
12.8	1
18	1
9590	1
555	1
60.2	1
21.8	1
900	1
6.6	1
239	1
3100	1
3275	1
682	1
85.4	1
10290	1
770	1
247.6	1
12320	1
113.1	1
1079	1
45.6	1
1630	1
79.4	1
508	1
3190	1
542	1
1021	1
235	1
251	1
3160	1
479	1
222	1
15.7	1
2540	1
11630	1
1810	1
6.9	1
4.1	1
15.6	1
9820	1
1490	1
15.7	1
45.8	1
7.8	1
12.8	1
100.5333	1
227	1
70.9	1
2500	1
;

/*STEP 2: Repeat the following 50*10 times:
i. Fit a logistic regression model on 9/10 of your data (the training dataset) and hold aside the other 1/10 as the test dataset.
ii. Use the fitted model to calculate the predicted probability of kyphosis=1 for each observation in the test dataset.
iii. Store these predicted probabilities in a new dataset, "predicted".*/


/*The MACRO*/
%macro runit;
  %do i=1 %to 50;
    data kyphosis;
      set kyphosis;
      theRandom = ranuni(0);
    run;

/* Then, divide the dataset into 10 groups based on the random number. */
/* Open the dataset kRanked to verify that each observation is ranked 0 to 9 (10 even groups). */

proc rank data=kyphosis out = kRanked groups = 10;
var theRandom;
run;
  /*PS: Since we will later be appending observations onto the dataset ‘predicted’, we want to make
  sure that there is not already a dataset called predicted.*/

  proc datasets library = work nodetails nolist;
    delete predicted&i.;
  run;
  quit;

%do x = 0 %to 9; * asks SAS to repeat the steps 10 times;

proc logistic data = kRanked outmodel = model&x.; *Fit the logistic model on 9/10 of the data, and
output the model into the dataset model0 (when x=0), model1 (when x=1), etc.;
model kyphosis (event="1") = y1;
where theRandom ne &x; * Omit 1/10 of the data (eg, when x=0, omit the observations where theRandom=0).;
run;

data training&x.;
set kRanked;
where theRandom ne &x;
run;

data test&x.; * Put the omitted data into the test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;


proc logistic inmodel = model&x.; *Apply the logistic model to the test dataset and put
the predicted probabilities into a dataset predicted0, predicted1, etc.;

score data= test&x. out = _predicted&x.;
run;
proc append base = predicted&i. data = _predicted&x.; *Keep adding the predicted values to a single dataset ‘predicted.’ ;
run;
%end;
%end;
%mend runit;

%runit;
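
One caveat with ranuni(0): the seed comes from the system clock, so the 50 fold assignments cannot be reproduced on a rerun. Below is a minimal sketch of a reproducible variant, assuming the kyphosis dataset created above. The names kRep, kRankedAll, rep, u and fold, and the seed value, are illustrative and not part of the original macro. It draws all 50 random numbers per observation in one DATA step and lets PROC RANK assign the folds within each replicate.

```sas
/* Sketch: 50 reproducible fold assignments in one pass.            */
/* Assumes WORK.KYPHOSIS exists; all new names are hypothetical.    */
data kRep;
  if _n_ = 1 then call streaminit(20170);  /* fixed seed: reruns give the same folds */
  set kyphosis;
  do rep = 1 to 50;                        /* one random number per replicate */
    u = rand('uniform');
    output;
  end;
run;

proc sort data=kRep;
  by rep;
run;

proc rank data=kRep out=kRankedAll groups=10;
  by rep;
  var u;
  ranks fold;                              /* fold = 0..9 within each replicate */
run;
```

The inner loop of the macro could then filter on this single dataset, for example with where rep = &i and fold ne &x for training and where rep = &i and fold = &x for testing.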

Art, CEO, AnalystFinder.com

 

 

 

jeka1212
Obsidian | Level 7
Many thanks. I really appreciated your help.

In my case I want to keep all 500 files: the ten files created in each run need to be kept for further analysis.

I was wondering if you could advise me how to keep those 500 files?

Many thanks
art297
Opal | Level 21

The following (not tested though) keeps all 500 files, as well as all 500 models and training datasets:

 

data kyphosis;
  infile cards dlm='09'x;
  input y1	kyphosis;
  cards;
28	0
15.5	0
8.2	0
3.4	0
17.3	0
15.2	0
32.9	0
11.1	0
87.5	0
16.2	0
107.9	0
5.7	0
25.6	0
31.2	0
21.6	0
55.6	0
8.8	0
6.5	0
22.1	0
14.4	0
44.2	0
3.7	0
7.8	0
8.9	0
18	0
6.5	0
4.9	0
10.4	0
5	0
5.3	0
6.5	0
6.9	0
8.2	0
21.8	0
6.6	0
7.6	0
15.4	0
59.2	0
5.1	0
10	0
5.3	0
32.6	0
4.6	0
6.9	0
4	0
3.65	0
7.8	0
32.5	0
11.5	0
4	0
10.2	0
2.4	1
719	1
2106.667	1
24000	1
1715	1
3.6	1
521.5	1
1600	1
454	1
109.7	1
23.7	1
464	1
9810	1
255	1
58.7	1
225	1
90.1	1
50	1
5.6	1
4070	1
592	1
28.6	1
6160	1
1090	1
10.4	1
27.3	1
162	1
3560	1
14.7	1
83.3	1
336	1
55.7	1
1520	1
3.9	1
5.8	1
8.45	1
361	1
369	1
8230	1
39.3	1
43.5	1
361	1
12.8	1
18	1
9590	1
555	1
60.2	1
21.8	1
900	1
6.6	1
239	1
3100	1
3275	1
682	1
85.4	1
10290	1
770	1
247.6	1
12320	1
113.1	1
1079	1
45.6	1
1630	1
79.4	1
508	1
3190	1
542	1
1021	1
235	1
251	1
3160	1
479	1
222	1
15.7	1
2540	1
11630	1
1810	1
6.9	1
4.1	1
15.6	1
9820	1
1490	1
15.7	1
45.8	1
7.8	1
12.8	1
100.5333	1
227	1
70.9	1
2500	1
;

/*STEP 2: Repeat the following 50*10 times:
i. Fit a logistic regression model on 9/10 of your data (the training dataset) and hold aside the other 1/10 as the test dataset.
ii. Use the fitted model to calculate the predicted probability of kyphosis=1 for each observation in the test dataset.
iii. Store these predicted probabilities in a new dataset, "predicted".*/


/*The MACRO*/
%macro runit;
  %let counter=0; /* incremented before first use, so the datasets are numbered 1 to 500 */
  %do i=1 %to 50;
    data kyphosis;
      set kyphosis;
      theRandom = ranuni(0);
    run;

/* Then, divide the dataset into 10 groups based on the random number. */
/* Open the dataset kRanked to verify that each observation is ranked 0 to 9 (10 even groups). */

proc rank data=kyphosis out = kRanked groups = 10;
var theRandom;
run;
  /*PS: Since we will later be appending observations onto the dataset ‘predicted’, we want to make
  sure that there is not already a dataset called predicted.*/

  proc datasets library = work nodetails nolist;
    delete predicted&i.;
  run;
  quit;

%do x = 0 %to 9; * asks SAS to repeat the steps 10 times;

%let counter=%eval(&counter+1);

proc logistic data = kRanked outmodel = model&counter.; *Fit the logistic model on 9/10 of the data, and
output the model into the dataset model0 (when x=0), model1 (when x=1), etc.;
model kyphosis (event="1") = y1;
where theRandom ne &x; * Omit 1/10 of the data (eg, when x=0, omit the observations where theRandom=0).;
run;


data training&counter.;
set kRanked;
where theRandom ne &x;
run;

data test&counter.; * Put the omitted data into the test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;


proc logistic inmodel = model&counter.; *Apply the logistic model to the test dataset and put
the predicted probabilities into a dataset numbered by the running counter;

score data= test&counter. out = _predicted&counter.;
run;
proc append base = predicted&i. data = _predicted&counter.; *Keep adding the predicted values to a single dataset ‘predicted.’ ;
run;
%end;
%end;
%mend runit;

%runit;
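
If 500 separately numbered datasets become unwieldy, a hedged alternative is to stack everything into one dataset and tag each row with its replicate and fold. This is a sketch rather than a tested replacement: the macro name cv_stack, its parameters, the working datasets _k, _kr, _m and _p, and the output dataset predicted_all are all made up for illustration, and it assumes the kyphosis dataset from the DATA step above.

```sas
/* Sketch: keep every scored fold in ONE dataset, identified by rep and fold. */
%macro cv_stack(reps=50, folds=10);
  /* Start from an empty result so PROC APPEND does not pile onto old runs */
  proc datasets library=work nolist nowarn;
    delete predicted_all;
  quit;

  %do i = 1 %to &reps;
    /* New random fold assignment for this replicate */
    data _k;
      set kyphosis;
      _u = ranuni(0);
    run;
    proc rank data=_k out=_kr groups=&folds;
      var _u;
      ranks fold;                       /* fold takes the values 0..folds-1 */
    run;

    %do x = 0 %to %eval(&folds - 1);
      /* Fit on every fold except &x */
      proc logistic data=_kr(where=(fold ne &x)) outmodel=_m noprint;
        model kyphosis(event="1") = y1;
      run;
      /* Score the held-out fold */
      proc logistic inmodel=_m;
        score data=_kr(where=(fold = &x)) out=_p;
      run;
      /* Tag the rows with the replicate number, then stack them */
      data _p;
        set _p;
        rep = &i;
      run;
      proc append base=predicted_all data=_p;
      run;
    %end;
  %end;
%mend cv_stack;

%cv_stack(reps=2)   /* small trial run; use reps=50 for the full job */
```

Any single test set can then be recovered with a WHERE clause such as where rep = 7 and fold = 3, and summaries across replicates become a BY-group problem instead of a loop over dataset names.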

Art, CEO, AnalystFinder.com

 

 

 

jeka1212
Obsidian | Level 7
Many thanks. It does indeed give me what I wanted.

I much appreciate your help.
jeka1212
Obsidian | Level 7

Hi 

 

I need to calculate the sensitivity and specificity for each value of y1 (2.4 to 24000) used as a cut-off, and append the results to a dataset in the layout shown below.

 

I was wondering if you could help me update my code below to produce the results I need.

 

 

data kyphosis;
  set kyphosis;
  * Create a binary variable: 1 when y1 is above the cut-off of 39.2;
  if y1 <= 39.2 then y11 = 0;
  else y11 = 1;
run;

* Calculate sensitivity and specificity with 39.2 as the cut-off;
proc freq data = kyphosis order = formatted;
  tables kyphosis * y11 / nocol nopercent;
run;

 

Here are the sensitivity and specificity results for y1 = 39.2:

The FREQ Procedure

Table of kyphosis by y11 (cell contents: Frequency, with Row Pct in parentheses)

kyphosis    y11 = 0        y11 = 1        Total
0           46 (90.20)     5 (9.80)       51
1           22 (24.44)     68 (75.56)     90
Total       68             73             141

 

Append the results as follows:

Y1      Prevalence   Sensitivity   Specificity   PPV (Positive Predictive Value)   NPV (Negative Predictive Value), adjusted
39.2    0.07         75.56         90.20         9.80                              24.44
2.4     0.07

 

 

 

 

 

I need to repeat this process for each value of y1 (2.4 to 24000) and append the results to a dataset.

I was wondering if you could help me update my code to do this.

 

Thanks,

jeka1212
Obsidian | Level 7
Hi art297,

Using the same above dataset (kyphosis).

I need to calculate the sensitivity and specificity for each value of y1 (2.4 to 24000) used as a cut-off, and append the results to a dataset in the layout shown below.

I was wondering if you could help me update my code below to produce the results I need.

Here is the code I have used:



data kyphosis;
  set kyphosis;
  * Create a binary variable: 1 when y1 is above the cut-off of 39.2;
  if y1 <= 39.2 then y11 = 0;
  else y11 = 1;
run;

* Calculate sensitivity and specificity with 39.2 as the cut-off;
proc freq data = kyphosis order = formatted;
  tables kyphosis * y11 / nocol nopercent;
run;



Here are the sensitivity and specificity results for y1 = 39.2:

The FREQ Procedure

Table of kyphosis by y11 (cell contents: Frequency, with Row Pct in parentheses)

kyphosis    y11 = 0        y11 = 1        Total
0           46 (90.20)     5 (9.80)       51
1           22 (24.44)     68 (75.56)     90
Total       68             73             141



Append the results as follows:

Y1 = 39.2
Sensitivity = 75.56
Specificity = 90.20
PPV (Positive Predictive Value) = 9.80
NPV = 24.44

I need to repeat this process for each value of y1 (2.4 to 24000) and append the results to a dataset.

I was wondering if you could help me update my code to do this.



Thanks,
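
One way to get a row per cut-off without rerunning PROC FREQ by hand is sketched below. It rests on several assumptions: a subject counts as test-positive when y1 > cut (the same rule as y11 above), every distinct y1 value is tried as a cut-off, and the names counts, cutstats, cut, tp, fp, fn, tn and the macro variable prev are made up for illustration. Reading "adjusted" as the Bayes-rule predictive values at the prevalence of 0.07 from the table above is also an assumption. Note that 9.80 and 24.44 in the PROC FREQ output are row percentages, that is 100 minus specificity and 100 minus sensitivity; the sample PPV and NPV come from the columns of that table (68/73 and 46/68).

```sas
%let prev = 0.07;   /* assumed population prevalence, taken from the table above */

/* Count the 2x2 cells for every distinct y1 used as a cut-off.   */
/* Test-positive means y1 > cut, mirroring the y11 definition.    */
proc sql;
  create table counts as
  select c.cut,
         sum(k.kyphosis = 1 and k.y1 >  c.cut) as tp,
         sum(k.kyphosis = 0 and k.y1 >  c.cut) as fp,
         sum(k.kyphosis = 1 and k.y1 <= c.cut) as fn,
         sum(k.kyphosis = 0 and k.y1 <= c.cut) as tn
  from (select distinct y1 as cut from kyphosis) as c,
       kyphosis as k
  group by c.cut
  order by c.cut;
quit;

data cutstats;
  set counts;
  sensitivity = tp / (tp + fn);
  specificity = tn / (tn + fp);

  /* Sample-based predictive values; left missing when the denominator is 0 */
  if tp + fp > 0 then ppv = tp / (tp + fp);
  if tn + fn > 0 then npv = tn / (tn + fn);

  /* Prevalence-adjusted predictive values via Bayes' rule */
  _pp = sensitivity * &prev + (1 - specificity) * (1 - &prev);
  _pn = specificity * (1 - &prev) + (1 - sensitivity) * &prev;
  if _pp > 0 then ppv_adj = sensitivity * &prev / _pp;
  if _pn > 0 then npv_adj = specificity * (1 - &prev) / _pn;
  drop _pp _pn;
run;
```

The values come out as proportions; multiply by 100 for percentages. Since no y1 lies between 32.9 and 39.2, the row with cut = 32.9 should reproduce the 2x2 table shown above.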
mkeintz
PROC Star

I don't know about the rest of the program, but I would skip the PROC RANK and use the rand('table',...) function to randomly assign the ten groups (it returns values 1 to 10, so subtract 1 if you need 0 to 9). Here's how:

 

data kyphosis (drop=_:);
  set kyphosis nobs=nrecs;
  array needed {10} _temporary_;
  retain _nremain;
  if _n_=1 then do;
    _nremain=nrecs;
    do _col=1 to 10; needed{_col}=ceil(nrecs/10); end;
  end;

  call streaminit(01982066);
  array prb{10} _temporary_ ;
  do _col=1 to 10;  prb{_col}=needed{_col}/_nremain; end;

  rnd=rand('table',of prb{*});
  needed{rnd}=needed{rnd}-1;
  _nremain=_nremain-1;
run;

 

Moreover, you can do it for 500 variables at once:

data kyphosis (drop=_:);
  set kyphosis nobs=nrecs;
  array needed {500,10} _temporary_;
  retain _nremain;
  if _n_=1 then do;
    _nremain=nrecs;
    do _row=1 to 500;
      do _col=1 to 10; needed{_row,_col}=ceil(nrecs/10); end;
    end;
  end;
  call streaminit(01982066);
  array _prb{10} _temporary_;
  array rnd{500};
  do _row=1 to 500;
    do _col=1 to 10;  _prb{_col}=min(1,needed{_row,_col}/_nremain); end;
    rnd{_row}=rand('table',of _prb{*});
    needed{_row,rnd{_row}}=needed{_row,rnd{_row}}-1;
  end;
  _nremain=_nremain-1;
run;

 

This will eliminate 499 data steps and 500 proc ranks at the beginning of your script.
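
A quick way to confirm that the assignment is balanced is sketched here. It assumes the second DATA step above has been run, so that kyphosis contains rnd1-rnd500; the output dataset name grpcount is made up for illustration.

```sas
/* Each of the 10 groups should hold roughly nrecs/10 observations */
proc freq data=kyphosis;
  tables rnd1 / nocum out=grpcount;   /* counts per group for the first replicate */
  tables rnd2 rnd500 / nocum;         /* spot-check two more replicates */
run;
```

Because the groups are numbered 1 to 10, a filter written for PROC RANK output such as where theRandom = &x would need to compare against %eval(&x + 1) instead, for example where rnd&i = %eval(&x + 1).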

jeka1212
Obsidian | Level 7

Many thanks  

 

 

 

 

 

