For a given set of 2000 IDs I want to assign random numbers in the range 100000- 102000. How should I code this. For example:
Obs | ID |
1 | 1456 |
2 | 3765 |
3 | 35632 |
4 | 89564 |
. | . |
. | . |
. | . |
2000 | 580058 |
My output should look something like:
Obs | ID | Random |
1 | 1456 | 100000 |
2 | 3765 | 101997 |
3 | 35632 | 101456 |
4 | 89564 | 100002 |
. | . | |
. | . | |
. | . | |
2000 | 580058 | 102000 |
In the above example, the number of observations in the dataset are 2000. The Random numbers are assigned such that they are in particular range i.e 100000-102000.
Thanks for your help
It sounds like you want to assign a sequence of 2000 numbers to a set of 2000 observations in a random fashion. This is different than "assigning random numbers." If you assign a sequence randomly, there are no duplicate values, whereas you will almost surely get a dupicate value if you assign random numbers.
If there are 2000 observations, I assume you want the RANDOM variable to be values in [10000, 101999] or
[10001, 102000]. I'll assume the latter. Try this:
data Have; /* test data set */
set Sashelp.class;
run;
/* generate sequence of numbers */
data ID;
if _N_=0 then set Have nobs=N;
do i = 1 to N;
sortOrder = rand("Uniform");
ID = 10000 + i; /* sequence of numbers */
output;
end;
keep sortOrder ID;
run;
/* randomly permute the ID values */
proc sort data=ID;
by sortOrder;
run;
/* merge with original data */
data Want;
merge Have ID(drop=sortOrder);
run;
It sounds like you want to assign a sequence of 2000 numbers to a set of 2000 observations in a random fashion. This is different than "assigning random numbers." If you assign a sequence randomly, there are no duplicate values, whereas you will almost surely get a dupicate value if you assign random numbers.
If there are 2000 observations, I assume you want the RANDOM variable to be values in [10000, 101999] or
[10001, 102000]. I'll assume the latter. Try this:
data Have; /* test data set */
set Sashelp.class;
run;
/* generate sequence of numbers */
data ID;
if _N_=0 then set Have nobs=N;
do i = 1 to N;
sortOrder = rand("Uniform");
ID = 10000 + i; /* sequence of numbers */
output;
end;
keep sortOrder ID;
run;
/* randomly permute the ID values */
proc sort data=ID;
by sortOrder;
run;
/* merge with original data */
data Want;
merge Have ID(drop=sortOrder);
run;
Hello Rick,
Thank you so much for the code. This really helped me. The only thing I am finding hard is to run the last step (/*merge with original data*/). The error says that I need to specify BY statement. I need the ID variable and the random assigned variable in the same dataset.
This is what I used to run the code. The last step gives me error saying I need to specify BY statement. How do I modify the last step.
data have;
input ID;
datalines;
101
103
107
506
845
;
data ID;
if _N_=0 then set have nobs=N;
do i = 1 to N;
sortOrder = rand("Uniform");
RANDOM_ID = 900000 + i; /* sequence of numbers */
output;
end;
keep RANDOM_ID sortOrder ;
run;
proc sort data=ID;
by sortOrder;
run;
data Want;
merge Have ID(drop=sortOrder);
run;
Thanks!
Please post your SAS log. The code you posted runs without error on my version of SAS. The BY statement is optional for a one-to-one merge, so It doesn't make sense that you are seeing an error message.
Thanks for quick reply Rick. Please find my attached log:
284 dm 'clear log'; dm 'clear output';
285
286 data have;
287 input ID;
288 datalines;
NOTE: The data set WORK.HAVE has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
294 ;
295
296 data ID;
297 if _N_=0 then set have nobs=N;
298 do i = 1 to N;
299 sortOrder = rand("Uniform");
300 RANDOM_ID = 900000 + i; /* sequence of numbers */
301 output;
302 end;
303 keep RANDOM_ID sortOrder ;
304 run;
NOTE: DATA STEP stopped due to looping.
NOTE: The data set WORK.ID has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
305
306 proc sort data=ID;
307 by sortOrder;
308 run;
NOTE: There were 5 observations read from the data set WORK.ID.
NOTE: SAS sort was used.
NOTE: The data set WORK.ID has 5 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
309
310 data Want;
311 merge Have ID(drop=sortOrder);
312 run;
ERROR: No BY statement was specified for a MERGE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 2 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
Thanks!
OK. Your site must be configured to run with the MERGENOBY option set to ERROR. Run the following:
proc options option=mergenoby value; run;
On my system the SAS log reports:
Option Value Information For SAS Option MERGENOBY
Value: NOWARN
Scope: Default
How option value set: Shipped Default
I'm guessing that your value is ERROR.
You have two options, The easiest is to change the option:
options MERGENOBY=NOWARN;
If you can't do that, then use
obs = _N_;
to create a common variable for both data sets and then include BY OBS; when you merge.
Here's one way (there's usually more than one way in SAS):
random = 99999 + ceil( 2001 * ranuni(12345) );
It is possible to assign the same random number on two separate observations this way.
If your unstated requirement is not to repeat any of the values then one approach:
1) assign a random number using Ranuni or Rand('uniform') functions
2) sort by the random number;
3) Replace the random number with 100000 + _n_;
4) You may want to resort by the Id.
Question: Do you want to exhaust all of the numbers from 100000 to 102000? That is 2001 numbers. So you may be looking for 100000 to 101999 (If so then 100000+_n_ -1) or 100001 to 102000 (which the above does)
A simpler way with the use of Sashelp.class:
There are 19 observations just as your 2000. Use an array to hold the random number selected and fill the cell with say 1. Next time if the same random number is generated(check the array), take a new random number,
data want;
array k[19] _temporary_;
set sashelp.class;
THERE:
id = ceil(ranuni(123) * 19);
if k[id] then goto THERE;
k[id] = 1;
run;
Unfortunately @KachiM's idea will require many iterations to completely obtain the 2000 ID values. This is because the "Birthday Problem" says that duplicates appear often in random sequences.
Rick:
I know there will be duplicates but never realized that there would be so many duplicates.
Thank you for knowing it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.