Hello experts,
I tried to repeat the code below 200 times, but when I add the DO loop, SAS reports "Array subscript out of range". Could you please show me how to achieve what I want? When I change the seed, sometimes it works and sometimes it does not.
Many thanks!
%let nobs=100;
%let nboot=200;
data weight_1;
   do i=1 to &nboot.;
      output;
   end;
data weight_2;
   set weight_1 end=lastobs;
   call streaminit (1293);
   do n=1 to &nboot.;
      array R [&Nobs] _temporary_;
      if _N_=1 then do;
         do I=1 to &NObs.;
            R[I]=rand('gamma',1,1);
         end;
         diff=&Nobs. - sum(of r[*]);
         do i=1 to diff;
            R[rand('gamma', &Nobs.,1)] + (diff>0);
         end;
         VET+sum(of R[*]);
      end;
      ran=r[_N_];
   end;
run;
First, thank you for posting the code in a text box - MUCH easier on my eyes - and it preserves the fixed pitch font in the SAS log - which is occasionally important.
I have checked on my previous comment:
"I presume the rand('gamma',&NOBs.,1) returns a continuous result in the (0,100] interval - i.e. greater than zero and up through 100."
I should have done this before, because the SAS Functions and CALL Routines documentation for the GAMMA distribution just says that the RAND('GAMMA'... function will produce a positive number. There is no upper limit, so you may very well be generating numbers greater than 100, which could cause the error message you report.
So if you have a strategy of a fixed number (100 in your case) of buckets selected based on the RAND('GAMMA' function, you will have to decide whether 100 buckets (i.e. macrovar NOBS=100) is enough, and what to do about results from RAND('GAMMA' that exceed &NOBS. You could replace the single statement
R[rand('gamma', &Nobs.,1)] + (diff>0);
with
J=ceil(rand('gamma', &Nobs.,1));
and then decide whether to add
R[J] + (diff>0);
depending on the value of J.
Of course, it's not at all clear what your goal is, so you may need to take another strategy.
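To make the "decide depending on the value of J" idea concrete, here is a minimal sketch. It assumes that draws falling outside 1..NOBS should simply be skipped; that rule and the counter name skipped are my assumptions, not something the original post specifies.

```sas
%let nobs=100;

data _null_;
   call streaminit(1293);
   array R[&nobs.] _temporary_;

   /* Initial continuous weights, as in the original step */
   do i = 1 to &nobs.;
      R[i] = rand('gamma', 1, 1);
   end;

   diff = &nobs. - sum(of R[*]);
   skipped = 0;

   /* This loop runs only when diff is at least 1 */
   do i = 1 to diff;
      J = ceil(rand('gamma', &nobs., 1));
      /* Assumption: a draw outside 1..NOBS is skipped, not re-drawn */
      if 1 <= J <= &nobs. then R[J] = R[J] + (diff > 0);
      else skipped + 1;
   end;

   total = sum(of R[*]);
   put diff= skipped= total=;
run;
```

With a guard like this the subscript can no longer go out of range, but every skipped draw leaves the total further from &nobs., so whether skipping is acceptable depends on what the weights are for.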
Whenever you get errors in the log, you need to show us the ENTIRE log for this data step. Please show us the ENTIRE log for this DATA step by copying it as text and then pasting it into the window that appears when you click on the </> icon.
1                    The SAS System          10:43 Sunday, May 1, 2022

1    ;*';*";*/;quit;run;
2    OPTIONS PAGENO=MIN;
3    %LET _CLIENTTASKLABEL='Program';
4    %LET _CLIENTPROCESSFLOWNAME='Standalone Not In Project';
5    %LET _CLIENTPROJECTPATH='';
6    %LET _CLIENTPROJECTPATHHOST='';
7    %LET _CLIENTPROJECTNAME='';
8    %LET _SASPROGRAMFILE='';
9    %LET _SASPROGRAMFILEHOST='';
10
11   ODS _ALL_ CLOSE;
12   OPTIONS DEV=SVG;
13   GOPTIONS XPIXELS=0 YPIXELS=0;
14   %macro HTML5AccessibleGraphSupported;
15   %if %_SAS_VERCOMP(9, 4, 4) >= 0 %then ACCESSIBLE_GRAPH;
16   %mend;
17   FILENAME EGHTML TEMP;
18   ODS HTML5(ID=EGHTML) FILE=EGHTML
19   OPTIONS(BITMAP_MODE='INLINE')
20   %HTML5AccessibleGraphSupported
21   ENCODING='utf-8'
22   STYLE=HtmlBlue
23   NOGTITLE
24   NOGFOOTNOTE
25   GPATH=&sasworklocation
26   ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28   %let nobs=100;
29
30   %let nboot=200;
31
32
33
34   data weight_1;
35
36   do i=1 to &nboot.;
37
38   output;
39
40   end;
41
42
43

NOTE: The data set WORK.WEIGHT_1 has 200 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

44   data weight_2;
45
46   set weight_1 end=lastobs;
47   call streaminit (1293);
48   do n=1 to &nboot.;
49
50   array R [&Nobs] _temporary_;

2                    The SAS System          10:43 Sunday, May 1, 2022

51
52   if _N_=1 then do;
53
54   do I=1 to &NObs.;
55
56   R[I}=rand('gamma',1,1);
57
58   end;
59
60
61
62   diff=&Nobs. - sum(of r[*]);
63
64   do i=1 to diff;
65
66   R[rand('gamma', &Nobs.,1)] + (diff>0);
67
68   end;
69
70   VET+sum(of R[*]);
71
72   end;
73
74   ran=r[_N_];
75
76   end;
77
78   run;

ERROR: Array subscript out of range at line 66 column 1.
lastobs=0 i=1 n=2 diff=4.7355896866 VET=105.65851235 ran=0.3109976073 _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.WEIGHT_1.
WARNING: The data set WORK.WEIGHT_2 may be incomplete. When this step was stopped there were 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time 0.10 seconds
      cpu time 0.03 seconds

79
80   %LET _CLIENTTASKLABEL=;
81   %LET _CLIENTPROCESSFLOWNAME=;
82   %LET _CLIENTPROJECTPATH=;
83   %LET _CLIENTPROJECTPATHHOST=;
84   %LET _CLIENTPROJECTNAME=;
85   %LET _SASPROGRAMFILE=;
86   %LET _SASPROGRAMFILEHOST=;
87
88   ;*';*";*/;quit;run;
89   ODS _ALL_ CLOSE;
90
91
92   QUIT; RUN;
93
Thanks for your suggestion. The log is attached. You can also just run the code to see it because it is simulated data.
Line 66
66 R[rand('gamma', &Nobs.,1)] + (diff>0);
You are creating a value from a gamma distribution, which is the index of array R. Values from a gamma distribution are continuous, they are not necessarily integers, and the index to array R must be an integer between 1 and the maximum size of the array. How to fix this? I have no idea, as I don't know what this line is trying to do. Please explain, not only this line, but what your program is trying to do.
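One way to see the problem directly is to draw many values from the same expression and count how many would be unusable as a subscript. This is only a diagnostic sketch; the counter names and the 10,000 draws are arbitrary choices of mine. A gamma distribution with shape 100 and scale 1 has mean 100 and standard deviation 10, so a large share of the draws should land above 100.

```sas
%let nobs=100;

data _null_;
   call streaminit(1293);
   do k = 1 to 10000;
      x = rand('gamma', &nobs., 1);
      n_below_1   + (x < 1);         /* would index below element 1    */
      n_above_max + (x > &nobs.);    /* would index past the last one  */
      n_nonint    + (x ne int(x));   /* not a whole number             */
   end;
   put n_below_1= n_above_max= n_nonint=;
run;
```

The three counts are written to the log, which shows how often each kind of invalid subscript occurs for this seed.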
Thank you so much for your quick response.
I am trying to do fractional bootstrapping (https://arxiv.org/pdf/1808.08199.pdf) for rare outcomes. The number of times each observation is resampled (the number of hits) is not an integer but a continuous value, which we call a weight. The weights have to sum to the sample size, and their variance should equal their mean.
For example, I want weights like these:
N | ID | Expected weight
1 | 1  | 0.2
1 | 2  | 2.2
1 | 3  | 0.6
2 | 1  | 0.5
2 | 2  | 0.8
2 | 3  | 1.7
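If the requirement is just "continuous positive weights that add up to the sample size", one simple construction is to draw independent gamma(1,1) values and rescale them by their total. The sketch below does this for a single sample of three IDs, matching the size of the example table. Rescaling is my assumption about what is wanted, and the names one_sample, g, id and weight are hypothetical. With this construction each weight has mean 1 and variance (n-1)/(n+1), which approaches 1 as n grows. Whether this matches the weighting scheme in the linked paper should be checked against the paper itself.

```sas
%let nobs=3;

data one_sample;
   call streaminit(1293);
   array g[&nobs.] _temporary_;

   /* Raw positive draws */
   do id = 1 to &nobs.;
      g[id] = rand('gamma', 1, 1);
   end;
   total = sum(of g[*]);

   /* Rescale so the weights sum to the sample size */
   do id = 1 to &nobs.;
      weight = &nobs. * g[id] / total;
      output;
   end;
   keep id weight;
run;
```

No array element is ever addressed by a random value here, so the subscript error cannot occur.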
What is line 66 supposed to be doing?
The most important response has already been provided to you by @PaigeMiller .
To get the best quality help, provide the best quality problem description - in this case a log of your program, which probably tells you exactly which line in your program has the array index out-of-range condition.
However, I will take a guess. You have defined the array R with 100 elements, with 100 as the upper bound of the array index and 1 as the lower bound. You have references to
R[I]
but I seems to always be an integer from 1 to 100 - i.e. within bounds.
So what about the reference (with NOBS=100)
R[rand('gamma', &Nobs.,1)] + (diff>0);
I presume the rand('gamma',&NOBs.,1) returns a continuous result in the (0,100] interval - i.e. greater than zero and up through 100.
This means it may occasionally return a value less than 1.0. Let's say it returns 0.8. SAS will interpret the array reference R[0.8] as R[0], which doesn't exist. Probably some seeds generate a RAND result less than 1.0 earlier than others, generating the error message you refer to.
So be aware that SAS rounds down non-integer values to integers when indexing arrays. Is that the behavior you want? For instance, I'm sure you'll never populate the R[100] element, since you can't generate a number greater than 100. Its probability is virtually zero.
Edited additional note: You probably could change
R[rand('gamma', &Nobs.,1)] + (diff>0);
to
R[1+rand('gamma', &Nobs.,1)] + (diff>0);
which would effectively round UP non-integer RAND results. This would likely eliminate the occasional out-of-range notes. But I have no idea whether it will do the task that you intend, or whether it is consistent with your other references to the array.
Thanks for your detailed explanation. I want value 0<"value"<100
@Dunne wrote:
Thanks for your detailed explanation. I want value 0<"value"<100
And isn't that exactly what you are already generating in your RAND('gamma',100,1) expression?
It appears that you want to use that function to randomly draw from 100 "buckets" indexed by the array. Right now you have no bucket for RAND results less than 1, but you do have 100 buckets for RAND results with 1<=RAND<=100. That last bucket (number 100) is mapped only to a RAND result of exactly 100, which I suspect has a "probability mass" of zero. Its neighbor, bucket number 99, is mapped to RAND results with 99<=RAND<100, which has a probability >0.
In effect the array reference is using the FLOOR function to convert a non-integer result to the integer below it to identify an array element. Instead you could apply the CEIL (for ceiling) function against the RAND function, as in
R[CEIL(rand('gamma', &Nobs.,1))] + (diff>0);
This would map the RAND function to the array index as follows:
RAND Results   | Element number for R[RAND....]
0<RAND<=1      | 1
1<RAND<=2      | 2
...            | ...
98<RAND<=99    | 99
99<RAND<=100   | 100
So you would now be randomly drawing from 100 ordered buckets of equal "size" using a gamma distribution.
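Note that CEIL fixes only the lower end: RAND('gamma',&Nobs.,1) has no upper bound, so CEIL of it can still exceed the array size. If the intent really is to pick one of the 100 buckets with equal probability, a uniform draw is one alternative. This is a sketch under that assumption, not a claim about what the original code was meant to do. The array name hits and the 10,000 draws are mine.

```sas
%let nobs=100;

data _null_;
   call streaminit(1293);
   array hits[&nobs.] _temporary_;
   do i = 1 to &nobs.;
      hits[i] = 0;
   end;

   do k = 1 to 10000;
      /* RAND('uniform') lies strictly between 0 and 1, so J is in 1..NOBS */
      J = ceil(&nobs. * rand('uniform'));
      hits[J] = hits[J] + 1;
   end;

   fewest = min(of hits[*]);
   most   = max(of hits[*]);
   put fewest= most=;
run;
```

The log then shows the smallest and largest bucket counts, which should be of similar size if the buckets are equally likely.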
Thank you so much!
It works now if I generate only 1 sample (by deleting line 22 , 42, 43 below). However, it failed to repeat the process &nboot. times (I want to generate &nboot. samples of the entire process). Could you please help me with this as well?
Instead of pasting an image of the log into your note, could you please capture the text and insert it as fixed width content (use the "</>" icon to open a text box for pasting). I find it difficult to read the image.
1                    The SAS System          18:27 Sunday, May 1, 2022

1    ;*';*";*/;quit;run;
2    OPTIONS PAGENO=MIN;
3    %LET _CLIENTTASKLABEL='Program';
4    %LET _CLIENTPROCESSFLOWNAME='Standalone Not In Project';
5    %LET _CLIENTPROJECTPATH='';
6    %LET _CLIENTPROJECTPATHHOST='';
7    %LET _CLIENTPROJECTNAME='';
8    %LET _SASPROGRAMFILE='';
9    %LET _SASPROGRAMFILEHOST='';
10
11   ODS _ALL_ CLOSE;
12   OPTIONS DEV=SVG;
13   GOPTIONS XPIXELS=0 YPIXELS=0;
14   %macro HTML5AccessibleGraphSupported;
15   %if %_SAS_VERCOMP(9, 4, 4) >= 0 %then ACCESSIBLE_GRAPH;
16   %mend;
17   FILENAME EGHTML TEMP;
18   ODS HTML5(ID=EGHTML) FILE=EGHTML
19   OPTIONS(BITMAP_MODE='INLINE')
20   %HTML5AccessibleGraphSupported
21   ENCODING='utf-8'
22   STYLE=HtmlBlue
23   NOGTITLE
24   NOGFOOTNOTE
25   GPATH=&sasworklocation
26   ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28   %let nobs=100;
29   %let nboot=200;
30
31   data weight_1;
32   do i=1 to &nobs.;
33   output;
34   end;
35   run;

NOTE: The data set WORK.WEIGHT_1 has 100 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds

36
37
38   data weight_2;
39   set weight_1 end=lastobs;
40   call streaminit (1293);
41   do n=1 to &nboot.;
42
43   array R [&nobs.] _temporary_;
44
45   if _N_=1 then do;
46
47   do I=1 to (&nobs.);
48
49   R(I)=rand('gamma',1,1);
50

2                    The SAS System          18:27 Sunday, May 1, 2022

51   end;
52
53
54   diff=&nobs. - sum(of R[*]);
55
56   do I=1 to diff;
57
58   R[ceil(rand('gamma', &nobs.,1))] + (diff>0);
59
60   end;
61
62   VET+sum(of R[*]);
63
64   end;
65
66   ran=r[_N_];
67   output;
68   end;
69
70   run;

ERROR: Array subscript out of range at line 58 column 1.
lastobs=0 i=1 n=2 diff=4.7355896866 VET=105.65851235 ran=0.3109976073 _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.WEIGHT_1.
WARNING: The data set WORK.WEIGHT_2 may be incomplete. When this step was stopped there were 1 observations and 5 variables.
WARNING: Data set WORK.WEIGHT_2 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds

71
72   %LET _CLIENTTASKLABEL=;
73   %LET _CLIENTPROCESSFLOWNAME=;
74   %LET _CLIENTPROJECTPATH=;
75   %LET _CLIENTPROJECTPATHHOST=;
76   %LET _CLIENTPROJECTNAME=;
77   %LET _SASPROGRAMFILE=;
78   %LET _SASPROGRAMFILEHOST=;
79
80   ;*';*";*/;quit;run;
81   ODS _ALL_ CLOSE;
82
83
84   QUIT; RUN;
85
If I remove line 41, 67, 68, the code will work. But I want to replicate the code &nboot. times. Should I delete "call streaminit(1293)" to have different seeds for each sample?
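A possible reading of the log, followed by a sketch of one way to restructure the step. The variable dump shows VET=105.65851235 at n=2. VET is updated only at the end of each pass through the n loop, so that value is sum(of R[*]) from the n=1 pass. If diff had been positive on that pass, adding whole units could not have pushed the sum above 100, so diff must have been negative. The DO I=1 TO diff loop therefore ran zero times and the subscript line never executed. In other words, the single-sample version may "work" only because seed 1293 happens to give a negative diff first. On the n=2 pass diff=4.7355896866, the loop runs, and the first CEIL(RAND('gamma',&nobs.,1)) draw evidently lands above 100.

On the seed question, removing CALL STREAMINIT should not be necessary: one call seeds the stream, and each later RAND call continues along it, so every replicate gets different values. The sketch below assumes it is acceptable to make the weights sum to &nobs. by rescaling instead of adding units to randomly chosen elements. The variable id and the PROC MEANS check (data set check, variable sum_ran) are my additions.

```sas
%let nobs=100;
%let nboot=200;

data weight_2;
   /* No SET statement: the step iterates once, so the seed is set once */
   call streaminit(1293);
   array R[&nobs.] _temporary_;

   do n = 1 to &nboot.;              /* one pass per bootstrap replicate */
      do id = 1 to &nobs.;
         R[id] = rand('gamma', 1, 1);
      end;
      total = sum(of R[*]);

      do id = 1 to &nobs.;
         ran = &nobs. * R[id] / total;   /* weights sum to NOBS */
         output;
      end;
   end;
   keep n id ran;
run;

/* Check: the weights in every replicate should sum to NOBS */
proc means data=weight_2 noprint nway;
   class n;
   var ran;
   output out=check sum=sum_ran;
run;
```

This writes &nboot. x &nobs. rows, one weight per ID per replicate, and never uses a random value as an array subscript.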
Sorry for my slow response; I wanted to check my code carefully.
If I want to generate continuous numbers > 0 rather than integers, how should I modify the statement below? I am not familiar with simulation and tried to reuse someone's code from online, so I don't really understand all of it. Many thanks!
R[I]