BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
PrudhviB
Obsidian | Level 7

Hi Team,

 

I wanted to create a random number using a column in the data to use min and max (for boundaries) and also not to duplicate what's in that column already(ideally an unique number). if possible the newly created integer should be close enough to the number already assigned before or after in relation to the column type.(please see want output for reference)

 

my main goal is to create an random number which i can use as a primary key for this table, my original data has 2000 row i used a sample data. 

 

i have included both have and want data steps below:

data have ;
  input(type baseline pinchvalue 
voltage Threshold dimmer) ($)
        mpc_no
  ;
cards;
a <25 * * * * 1242
a 25-50 <=500 * * * *
a 25-50 <=500 * *  >2 3424
b >50 >850 * *   * 989
b >50 <=657 *  * *  345
c <25 * * * *   644 
c 25-50 <=500  * <=8 * *
r 25-50 >500  * >8 * * 867
r >50 <=657 <=850 * * *
r >50 <=657 >850 * * 1098
;
data want ;
  input(type baseline pinchvalue 
voltage Threshold dimmer) ($)
        mpc_no
  ;
cards;
a <25 * * * * 1242
a 25-50 <=500 * * *  1265
a 25-50 <=500 * *  >2 3424
b >50 >850 * *   * 989
b >50 <=657 *  * *  345
c <25 * * * *   644 
c 25-50 <=500  * <=8 *  690
r 25-50 >500  * >8 *  867
r >50 <=657 <=850 * * 1095
r >50 <=657 >850 * * 1098
;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Generate the set of values that are not already used.  If you need them randomized the make another variable with a random number and sort by that.  Then when you see a missing value take the next from the list of possible values.

 

Example:

data have;
  input id @@ ;
cards;
3 . .  8 . . 12 . 
;

data possible;
  set have;
  where not missing(id);
  by id;
  retain newid ;
  if _n_>1 then do newid=newid to id-1;
     rand = rand('uniform');
     output;
  end;
  newid=id+1;
  drop id;
  rename newid=id;
run;

proc sort;
  by rand;
run;

proc print;
run;

data want;
  set have;
  if missing(id) then set possible(keep=id);
run;

proc print;
run;

Result:

OBS    id

 1      3
 2      4
 3      6
 4      8
 5      9
 6      5
 7     12
 8      7

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

Do you want a random number or a unique number?  The two are kind of contradictory.

If you want to use it as a KEY then you just need it to be UNIQUE.

PrudhviB
Obsidian | Level 7

Sorry I meant, I want a random number generated but not a duplicate number for that column. 

Tom
Super User Tom
Super User

If you want to generate a unique identifier for the rows in a table just use a sequential number.  Randomness does add any value (and adds a lot of complications).

data want; 
  row+1;
  set have;
run;

If that does not work for you then you need to explain in more detail what you are trying to do. 

PrudhviB
Obsidian | Level 7

Yeah now I think of it is complicated, I will try my best to articulate it this time.

In my have data I have some random numbers with few values missing under column mpc_no, I want to fill those empty spaces with a random number with some limitations such as  - the range of that generated number should be the min and max values of that type AND that number shouldn't be in the column (mpc_no), like no duplicates. 

see below image if this makes sense. 

PrudhviB_0-1670258998236.png

 

here i want second value in column mpc_no to be between 1242 & 3424 but not equal to 1298(which is last value in that row) and do that for all the null values in column mpc_no. 
hope this is clear. 

 

Tom
Super User Tom
Super User

Generate the set of values that are not already used.  If you need them randomized the make another variable with a random number and sort by that.  Then when you see a missing value take the next from the list of possible values.

 

Example:

data have;
  input id @@ ;
cards;
3 . .  8 . . 12 . 
;

data possible;
  set have;
  where not missing(id);
  by id;
  retain newid ;
  if _n_>1 then do newid=newid to id-1;
     rand = rand('uniform');
     output;
  end;
  newid=id+1;
  drop id;
  rename newid=id;
run;

proc sort;
  by rand;
run;

proc print;
run;

data want;
  set have;
  if missing(id) then set possible(keep=id);
run;

proc print;
run;

Result:

OBS    id

 1      3
 2      4
 3      6
 4      8
 5      9
 6      5
 7     12
 8      7
ballardw
Super User

data steps that throw invalid data messages probably need to be corrected before even asking questions:

602  data have ;
603    input(type baseline pinchvalue
604  voltage Threshold dimmer) ($)
605          mpc_no
606    ;
607  cards;

NOTE: Invalid data for mpc_no in line 609 21-21.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
609        a 25-50 <=500 * * * *
type=a baseline=25-50 pinchvalue=<=500 voltage=* Threshold=* dimmer=* mpc_no=. _ERROR_=1 _N_=2
NOTE: Invalid data for mpc_no in line 614 24-24.
614        c 25-50 <=500  * <=8 * *
type=c baseline=25-50 pinchvalue=<=500 voltage=* Threshold=<=8 dimmer=* mpc_no=. _ERROR_=1 _N_=7
NOTE: Invalid data for mpc_no in line 615 22-22.
615        r 25-50 >500  * >8 * * 867
type=r baseline=25-50 pinchvalue=>500 voltage=* Threshold=>8 dimmer=* mpc_no=. _ERROR_=1 _N_=8
NOTE: Invalid data for mpc_no in line 616 23-23.
616        r >50 <=657 <=850 * * *
type=r baseline=>50 pinchvalue=<=657 voltage=<=850 Threshold=* dimmer=* mpc_no=. _ERROR_=1 _N_=9
NOTE: The data set WORK.HAVE has 10 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

If you want to indicate a missing value in datalines use a .  not an * .

 

Second you ask about " using a column in the data to use min and max (for boundaries) ". Which column? It is hard to use a single column for both a min and max boundary on a single record and you don't even mention which column to consider or how it is used. Note: for anything to be consistent you really want to provide min and max as numeric values. You didn't provide any. Character values as the min or max of what should be numeric, apparently your Mpc_no variable don't make much sense and in general are poor boundaries because rules comparing character values are quite different from numeric.

 

Why is randomness desirable? Uniqueness?

If you actually need random and unique then perhaps working with the data to create all possible valid values and then using Proc Surveyselect to select without replacement is a solution. But that requires more information than you have currently provided.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 742 views
  • 1 like
  • 3 in conversation