@koyelghosh wrote:
@PaigeMiller Thank you
You definitely have a valid argument against the suggested approach. However, I was thinking that as the size of the set of numbers to choose from increases, the probability that the excluded number appears decreases. Thus, only in very few cases will it actually enter the loop and spend time there.
When you use RAND('INTEGER',0,7), then the probability of getting the un-desired number 3 is 1/8, regardless of the size of the data set.
@koyelghosh: I think this technique (known as acceptance-rejection method) can be particularly useful in situations where the condition for rejection (here: y=3) is more complex so that it can't be replaced by a simple definition like y=r+(r>=3); (with a suitable random number r) as in your example.
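To make the comparison concrete, here is a minimal sketch that runs both approaches side by side. The data set name demo, the variable names y_reject, y_shift and r, and the seed are assumptions chosen for illustration, not taken from the posts above.

```sas
data demo;
    call streaminit(2024);               /* arbitrary seed, for reproducibility */
    do i = 1 to 1000;
        /* Approach 1: acceptance-rejection, redraw while the value is 3 */
        y_reject = rand('integer', 0, 7);
        do while (y_reject = 3);
            y_reject = rand('integer', 0, 7);
        end;

        /* Approach 2: draw from 0 to 6 and shift the values 3..6 up by one */
        r = rand('integer', 0, 6);
        y_shift = r + (r >= 3);

        output;
    end;
    drop i r;
run;

/* Neither variable should show the value 3 in the frequency tables */
proc freq data=demo;
    tables y_reject y_shift;
run;
```

With a rejection condition as simple as y=3, the shift formula avoids the loop entirely; the loop becomes the more natural choice when the condition cannot be written as a one-line mapping.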
@Anita_n wrote:
I am trying to assign numbers 0, 1, 2, 4, 5, 6, 7 (please note here values are without 3) to a variable y (...)
y=rand("integer" 0, 7);
Hello @Anita_n,
The missing comma after "integer" is a syntax error. Since y is a character variable, you may want to calculate the numeric value (implementing @PaigeMiller's idea) using a temporary numeric variable, say _n_, and then assign the result via PUT function as a character value to y:
_n_=rand('integer', 0, 6);
y=put(_n_+(_n_>2),1.);
Edit: Also, I recommend using the CALL STREAMINIT routine to define a seed value. Otherwise you can't replicate your results (from the RAND function).
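As a rough sketch of how the seed and the PUT conversion could fit together in one step: the seed value 12345, the loop of 10 draws and the helper variable _r are assumptions for illustration (the post above uses _n_ as the temporary variable instead).

```sas
data want;
    call streaminit(12345);              /* arbitrary seed; call it before the first RAND call */
    length y $1;                         /* y is a character variable */
    do i = 1 to 10;
        _r = rand('integer', 0, 6);      /* _r is a hypothetical helper variable */
        y  = put(_r + (_r > 2), 1.);     /* 0,1,2 stay; 3..6 become 4..7 */
        output;
    end;
    drop i _r;
run;
```

Running this step twice with the same seed should produce the same sequence of y values.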
Using character values for Y:
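data want;
    set have;
    /* the seven allowed character values; '3' is deliberately left out */
    array yvals {7} _temporary_ ('0' '1' '2' '4' '5' '6' '7');
    if x = "8000" and a = "0" and y = " " then
        y = yvals{ceil(ranuni(12345) * 7)};   /* random index from 1 to 7 */
run;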
Hello all,
Thanks a lot for your contributions. I tested all your suggestions and found that all of them
give me the desired output. It depends on which choice one prefers. I'm very grateful for the help.
Now my question is: is it possible to accept all these possibilities as a solution, since they all work?
Hello @Anita_n,
Glad to hear that the solutions worked for you. You could accept that suggestion as the solution which you eventually adopted, i.e. used in your code, and give likes to the other posts you found helpful.
okay, thanks
I still have a question relating to this topic: why does the dataset increase after executing the do loop? Is there anything to add to my code to stop that?
For example, I had 1000 datasets in my file; after executing the do loop it increases to 1060. Why?
I take it that the number of observations in your dataset increased. Please show the code you ran; I guess you have output statements somewhere.
yes there is an output statement in the code, or else the values wouldn't show
Quote from my previous post:
"Please show the code you ran".
Keep in mind that you only need an output statement in a data step if you want to change the default behaviour of the data step, which is to do an implicit output at the end of each data step iteration. But we can only tell you what's the exact reason for your problem if we see the code.
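To illustrate the point, here is a small made-up example (the data sets have, dup and nodup and the variables id, flag and marked are all hypothetical). It shows one way the number of observations can grow: an observation that reaches two explicit OUTPUT statements is written twice.

```sas
data have;
    input id flag;
    datalines;
1 1
2 0
3 1
;
run;

/* Two explicit OUTPUT statements: an observation that satisfies
   both conditions is written twice. */
data dup;
    set have;
    if id > 0 then output;       /* true for every observation */
    if flag = 1 then output;     /* true again for id 1 and id 3 */
run;
/* dup contains 5 observations: 3 from the first OUTPUT, 2 from the second */

/* No OUTPUT statement: the implicit output writes each observation once. */
data nodup;
    set have;
    if flag = 1 then marked = 1;
run;
/* nodup contains 3 observations */
```

If something similar happens in the real code, removing the explicit OUTPUT statements and relying on the implicit output would be one way to keep one observation per input row.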
okay here is a sample code:
data want;
set have;
a=.; b=.; c=.; d=;
if (typ_ill = :"cerv" or typ_ill= :"oval") then sex=2 ;
else if( typ_ill= :"pen" or typ_ill= :"test") then sex =1;
else if (typ_ill ^=:"cerv" or typ_ill ^= :"oval" or typ_ill ^= :"pen" or typ_ill ^= :"test")
then do ;
sex= rand("integer", 1, 2);
put sex;
end;
if ( a=.) and (b=.) then do;
a=rand("integer", 1, 12);
b=rand("integer", 1980, 2000);
c= rand("integer", 1, 2);
d= rand("integer", 1999, 2019);
output;
end;
if x= "8000" and a="0" and y= " " then
do;
y=rand("integer" 0, 7);
do while( y=3);
y=rand("integer" 0, 7);
end;
output;
end;
run;
See my annotations:
data want;
set have;
a=.; b=.; c=.; d=;
if (typ_ill = :"cerv" or typ_ill= :"oval") then sex=2 ;
else if( typ_ill= :"pen" or typ_ill= :"test") then sex =1;
else if (typ_ill ^=:"cerv" or typ_ill ^= :"oval" or typ_ill ^= :"pen" or typ_ill ^= :"test")
/* I guess that the condition immediately above will always be true and is not necessary */
then do ;
sex= rand("integer", 1, 2);
put sex;
end;
if ( a=.) and (b=.) then do;
/* Since you did not change any of the variables a to d, they're still missing,
so this condition will always be TRUE */
a=rand("integer", 1, 12); /* a can only get values between 1 and 12, but will never be zero */
b=rand("integer", 1980, 2000);
c= rand("integer", 1, 2);
d= rand("integer", 1999, 2019);
output;
end;
if x= "8000" and a="0" and y= " " then
/* because of the above, 'a="0"' will never be true, but will cause a NOTE
about the conversion character <> numeric */
/* y will therefore never be set */
do;
y=rand("integer" 0, 7);
do while( y=3);
y=rand("integer" 0, 7);
end;
output;
end;
run;
Because of that, I seriously doubt that this is the complete code you ran, as each incoming observation will enter the first branch that contains an output statement, but never the second.
And you should REALLY start to visually format your code, with consistent indentation to easily identify functional blocks.
So I ask you to provide example data and real code that causes the "increasing observations effect"; just enough observations, in a data step with datalines.
Sorry, I can see there was a mistake in the code I sent. It's just that I can't send the real code or data
due to its sensitivity. My only concern is why the output statement is producing duplicates.