Hi Team,
I wanted to create a random number using a column in the data to use min and max (for boundaries) and also not to duplicate what's in that column already(ideally an unique number). if possible the newly created integer should be close enough to the number already assigned before or after in relation to the column type.(please see want output for reference)
my main goal is to create an random number which i can use as a primary key for this table, my original data has 2000 row i used a sample data.
i have included both have and want data steps below:
data have ;
input(type baseline pinchvalue
voltage Threshold dimmer) ($)
mpc_no
;
cards;
a <25 * * * * 1242
a 25-50 <=500 * * * *
a 25-50 <=500 * * >2 3424
b >50 >850 * * * 989
b >50 <=657 * * * 345
c <25 * * * * 644
c 25-50 <=500 * <=8 * *
r 25-50 >500 * >8 * * 867
r >50 <=657 <=850 * * *
r >50 <=657 >850 * * 1098
;
data want ;
input(type baseline pinchvalue
voltage Threshold dimmer) ($)
mpc_no
;
cards;
a <25 * * * * 1242
a 25-50 <=500 * * * 1265
a 25-50 <=500 * * >2 3424
b >50 >850 * * * 989
b >50 <=657 * * * 345
c <25 * * * * 644
c 25-50 <=500 * <=8 * 690
r 25-50 >500 * >8 * 867
r >50 <=657 <=850 * * 1095
r >50 <=657 >850 * * 1098
;
Generate the set of values that are not already used. If you need them randomized the make another variable with a random number and sort by that. Then when you see a missing value take the next from the list of possible values.
Example:
data have;
input id @@ ;
cards;
3 . . 8 . . 12 .
;
data possible;
set have;
where not missing(id);
by id;
retain newid ;
if _n_>1 then do newid=newid to id-1;
rand = rand('uniform');
output;
end;
newid=id+1;
drop id;
rename newid=id;
run;
proc sort;
by rand;
run;
proc print;
run;
data want;
set have;
if missing(id) then set possible(keep=id);
run;
proc print;
run;
Result:
OBS id 1 3 2 4 3 6 4 8 5 9 6 5 7 12 8 7
Do you want a random number or a unique number? The two are kind of contradictory.
If you want to use it as a KEY then you just need it to be UNIQUE.
Sorry I meant, I want a random number generated but not a duplicate number for that column.
If you want to generate a unique identifier for the rows in a table just use a sequential number. Randomness does add any value (and adds a lot of complications).
data want;
row+1;
set have;
run;
If that does not work for you then you need to explain in more detail what you are trying to do.
Yeah now I think of it is complicated, I will try my best to articulate it this time.
In my have data I have some random numbers with few values missing under column mpc_no, I want to fill those empty spaces with a random number with some limitations such as - the range of that generated number should be the min and max values of that type AND that number shouldn't be in the column (mpc_no), like no duplicates.
see below image if this makes sense.
here i want second value in column mpc_no to be between 1242 & 3424 but not equal to 1298(which is last value in that row) and do that for all the null values in column mpc_no.
hope this is clear.
Generate the set of values that are not already used. If you need them randomized the make another variable with a random number and sort by that. Then when you see a missing value take the next from the list of possible values.
Example:
data have;
input id @@ ;
cards;
3 . . 8 . . 12 .
;
data possible;
set have;
where not missing(id);
by id;
retain newid ;
if _n_>1 then do newid=newid to id-1;
rand = rand('uniform');
output;
end;
newid=id+1;
drop id;
rename newid=id;
run;
proc sort;
by rand;
run;
proc print;
run;
data want;
set have;
if missing(id) then set possible(keep=id);
run;
proc print;
run;
Result:
OBS id 1 3 2 4 3 6 4 8 5 9 6 5 7 12 8 7
data steps that throw invalid data messages probably need to be corrected before even asking questions:
602 data have ; 603 input(type baseline pinchvalue 604 voltage Threshold dimmer) ($) 605 mpc_no 606 ; 607 cards; NOTE: Invalid data for mpc_no in line 609 21-21. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-- 609 a 25-50 <=500 * * * * type=a baseline=25-50 pinchvalue=<=500 voltage=* Threshold=* dimmer=* mpc_no=. _ERROR_=1 _N_=2 NOTE: Invalid data for mpc_no in line 614 24-24. 614 c 25-50 <=500 * <=8 * * type=c baseline=25-50 pinchvalue=<=500 voltage=* Threshold=<=8 dimmer=* mpc_no=. _ERROR_=1 _N_=7 NOTE: Invalid data for mpc_no in line 615 22-22. 615 r 25-50 >500 * >8 * * 867 type=r baseline=25-50 pinchvalue=>500 voltage=* Threshold=>8 dimmer=* mpc_no=. _ERROR_=1 _N_=8 NOTE: Invalid data for mpc_no in line 616 23-23. 616 r >50 <=657 <=850 * * * type=r baseline=>50 pinchvalue=<=657 voltage=<=850 Threshold=* dimmer=* mpc_no=. _ERROR_=1 _N_=9 NOTE: The data set WORK.HAVE has 10 observations and 7 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
If you want to indicate a missing value in datalines use a . not an * .
Second you ask about " using a column in the data to use min and max (for boundaries) ". Which column? It is hard to use a single column for both a min and max boundary on a single record and you don't even mention which column to consider or how it is used. Note: for anything to be consistent you really want to provide min and max as numeric values. You didn't provide any. Character values as the min or max of what should be numeric, apparently your Mpc_no variable don't make much sense and in general are poor boundaries because rules comparing character values are quite different from numeric.
Why is randomness desirable? Uniqueness?
If you actually need random and unique then perhaps working with the data to create all possible valid values and then using Proc Surveyselect to select without replacement is a solution. But that requires more information than you have currently provided.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.