Solved: Re: Assigning random numbers in a particular range

danwarags · Posted 09-01-2016 11:22 AM

For a given set of 2000 IDs I want to assign random numbers in the range 100000- 102000. How should I code this. For example:

Obs	ID
1	1456
2	3765
3	35632
4	89564
.	.
.	.
.	.
2000	580058

My output should look something like:

Obs	ID	Random
1	1456	100000
2	3765	101997
3	35632	101456
4	89564	100002
.	.
.	.
.	.
2000	580058	102000

In the above example, the number of observations in the dataset are 2000. The Random numbers are assigned such that they are in particular range i.e 100000-102000.

Thanks for your help

Rick_SAS · Posted 09-01-2016 11:51 AM

It sounds like you want to assign a sequence of 2000 numbers to a set of 2000 observations in a random fashion. This is different than "assigning random numbers." If you assign a sequence randomly, there are no duplicate values, whereas you will almost surely get a dupicate value if you assign random numbers.

If there are 2000 observations, I assume you want the RANDOM variable to be values in [10000, 101999] or

[10001, 102000]. I'll assume the latter. Try this:

data Have;   /* test data set */
set Sashelp.class;
run;

/* generate sequence of numbers */
data ID;
if _N_=0 then set Have nobs=N;
do i = 1 to N;
   sortOrder = rand("Uniform");
   ID = 10000 + i;   /* sequence of numbers */
   output;
end;
keep sortOrder ID;
run;

/* randomly permute the ID values */
proc sort data=ID; 
by sortOrder;
run;

/* merge with original data */
data Want;
merge Have ID(drop=sortOrder);
run;

View solution in original post

Rick_SAS · Posted 09-01-2016 11:51 AM

It sounds like you want to assign a sequence of 2000 numbers to a set of 2000 observations in a random fashion. This is different than "assigning random numbers." If you assign a sequence randomly, there are no duplicate values, whereas you will almost surely get a dupicate value if you assign random numbers.

If there are 2000 observations, I assume you want the RANDOM variable to be values in [10000, 101999] or

[10001, 102000]. I'll assume the latter. Try this:

data Have;   /* test data set */
set Sashelp.class;
run;

/* generate sequence of numbers */
data ID;
if _N_=0 then set Have nobs=N;
do i = 1 to N;
   sortOrder = rand("Uniform");
   ID = 10000 + i;   /* sequence of numbers */
   output;
end;
keep sortOrder ID;
run;

/* randomly permute the ID values */
proc sort data=ID; 
by sortOrder;
run;

/* merge with original data */
data Want;
merge Have ID(drop=sortOrder);
run;

danwarags · Posted 09-02-2016 12:13 PM

Hello Rick,

Thank you so much for the code. This really helped me. The only thing I am finding hard is to run the last step (/*merge with original data*/). The error says that I need to specify BY statement. I need the ID variable and the random assigned variable in the same dataset.

This is what I used to run the code. The last step gives me error saying I need to specify BY statement. How do I modify the last step.

data have;
input ID;
datalines;
101
103
107
506
845
;

data ID;
if _N_=0 then set have  nobs=N;
do i = 1 to N;
   sortOrder = rand("Uniform");
   RANDOM_ID = 900000 + i;   /* sequence of numbers */
   output;
end;
keep RANDOM_ID sortOrder ;
run;

proc sort data=ID; 
by sortOrder;
run;

data Want;
merge Have ID(drop=sortOrder);
run;

Thanks!

Rick_SAS · Posted 09-02-2016 12:54 PM

Please post your SAS log. The code you posted runs without error on my version of SAS. The BY statement is optional for a one-to-one merge, so It doesn't make sense that you are seeing an error message.

danwarags · Posted 09-02-2016 02:00 PM

Thanks for quick reply Rick. Please find my attached log:

284 dm 'clear log'; dm 'clear output';
285
286 data have;
287 input ID;
288 datalines;

NOTE: The data set WORK.HAVE has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

294 ;
295
296 data ID;
297 if _N_=0 then set have nobs=N;
298 do i = 1 to N;
299 sortOrder = rand("Uniform");
300 RANDOM_ID = 900000 + i; /* sequence of numbers */
301 output;
302 end;
303 keep RANDOM_ID sortOrder ;
304 run;

NOTE: DATA STEP stopped due to looping.
NOTE: The data set WORK.ID has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

305
306 proc sort data=ID;
307 by sortOrder;
308 run;

NOTE: There were 5 observations read from the data set WORK.ID.
NOTE: SAS sort was used.
NOTE: The data set WORK.ID has 5 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

309
310 data Want;
311 merge Have ID(drop=sortOrder);
312 run;

ERROR: No BY statement was specified for a MERGE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 2 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

Thanks!

Rick_SAS · Posted 09-02-2016 03:50 PM

OK. Your site must be configured to run with the MERGENOBY option set to ERROR. Run the following:

proc options option=mergenoby value; run;

On my system the SAS log reports:

Option Value Information For SAS Option MERGENOBY
Value: NOWARN
Scope: Default
How option value set: Shipped Default

I'm guessing that your value is ERROR.

You have two options, The easiest is to change the option:

options MERGENOBY=NOWARN;

If you can't do that, then use

obs = _N_;

to create a common variable for both data sets and then include BY OBS; when you merge.

danwarags · Posted 09-02-2016 04:09 PM

Woww. It worked when I set options MERGENOBY= NOWARN. Thank you Rick. Appreciate your help.

Astounding · Posted 09-01-2016 11:53 AM

Here's one way (there's usually more than one way in SAS):

random = 99999 + ceil( 2001 * ranuni(12345) );

It is possible to assign the same random number on two separate observations this way.

ballardw · Posted 09-01-2016 11:54 AM

If your unstated requirement is not to repeat any of the values then one approach:

1) assign a random number using Ranuni or Rand('uniform') functions

2) sort by the random number;

3) Replace the random number with 100000 + _n_;

4) You may want to resort by the Id.

Question: Do you want to exhaust all of the numbers from 100000 to 102000? That is 2001 numbers. So you may be looking for 100000 to 101999 (If so then 100000+_n_ -1) or 100001 to 102000 (which the above does)

KachiM · Posted 09-01-2016 01:56 PM

A simpler way with the use of Sashelp.class:

There are 19 observations just as your 2000. Use an array to hold the random number selected and fill the cell with say 1. Next time if the same random number is generated(check the array), take a new random number,

data want;
   array k[19] _temporary_;
   set sashelp.class;
THERE:
   id = ceil(ranuni(123) * 19);
  if k[id] then goto THERE;
  k[id] = 1;
run;

Rick_SAS · Posted 09-01-2016 02:28 PM

Unfortunately @KachiM's idea will require many iterations to completely obtain the 2000 ID values. This is because the "Birthday Problem" says that duplicates appear often in random sequences.

KachiM · Posted 09-02-2016 05:18 PM

Rick:

I know there will be duplicates but never realized that there would be so many duplicates.

Thank you for knowing it.

SAS Innovate 2025: Call for Content

Classroom Training Available!