BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DarrylLawrence
Obsidian | Level 7

I need to create a random number in a file that is exported from SAS Customer Intelligence Studio.

 

I used this function in the Calculated Items section: rand("integer",100000,999999) to generate random numbers between 100000 and 999999.

 

The random number must be converted to character and concatenated with other text values to produce this output.: r9_C1000_130922_512202

SAS Code: NEWVAR='r9_C1000_'||strip(put(today(),ddmmyy7.))||'_'||put(rand("integer",100000,999999),$6.);

 

However, the expression fails as soon as I change it to put(rand("integer",100000,999999),$6.) in the Calculated field in SAS CI Studio, creating a new Character Calculated Field.

SAS Eguide CodeSAS Eguide CodeCalculated Item in SAS CI StudioCalculated Item in SAS CI Studio

 

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

You numeric format for the numeric value returned RAND.  put(-numeric-,F6.);  No $sign.

View solution in original post

9 REPLIES 9
data_null__
Jade | Level 19

You numeric format for the numeric value returned RAND.  put(-numeric-,F6.);  No $sign.

DarrylLawrence
Obsidian | Level 7

Hi 

Thanks for the tip - it worked, but we discovered that the RAND function assigns the same random number to more than one record in a dataset.

I have a file of 15000 customer records - see screenshot below - there are many duplicate random numbers.

 

My output file cannot have duplicate random numbers as the generated string is used in a key in the target system that the file is loaded into.

 

How do I prevent duplicate random numbers?

 

DarrylLawrence_0-1663165150584.png

 

ballardw
Super User

@DarrylLawrence wrote:

Hi 

Thanks for the tip - it worked, but we discovered that the RAND function assigns the same random number to more than one record in a dataset.

I have a file of 15000 customer records - see screenshot below - there are many duplicate random numbers.

 

My output file cannot have duplicate random numbers as the generated string is used in a key in the target system that the file is loaded into.

 

 

 


Numbers are either random or not duplicated. Pick one. You had a 15000/899999 chance of a duplication on each record. Which is on the order of 1.7% of records that should be duplicates.

 

So some details that will now need to be addressed:

Will you be adding other "numbers" to this data set? If so you have a moderate data management issue because you need to keep track of all "used" numbers before adding any.

If this data set is static, meaning that there will never be more values added you can do some thing like:

data rawnums;
   do num=100000 to 999999;
      output;
   end;
run;

proc surveyselect data=rawnums out=selected
   sampsize=15000; /*<- that number must be exactly equal to the number of records in your other set*/
run;

data want;
   /* this is a one-to-one merge and
    why the sampsize above must match our set
   */
   merge yourdataset selected;
   /* use NUM variable from selected instead
      of the Rand function
   */
   segment=catx('_','r9_1000',put(today(),ddmmyy7.),put(num,f6.));
run;

If you will be adding values then you want to keep a permanent data set where you remove the "selected" so that the next time you do this then no duplicates. But if you do not segregate the "new" records you will have to do a little work so the "merge" is only for the new ones. And then update the rawnums set again.

DarrylLawrence
Obsidian | Level 7

Hi

 

This is how it will work.

 

We use SAS Custiomer Intelligence Studio to generate files that are loaded into a call centre application.

We cannot determine beforehand how many records will be in every file, but in every file, there must be a unique, random number assigned to every row in the file.

 

The ideal solution is to add the function in a Calculated variable in SAS Customer Intelligence Studio, which is added into the file at runtime. 

 

I cannot insert the logic that you sent to check if the number has been used. I can only insert the function.

 

I thought "random" meant that the random number would not be repeated in a file/table. 

 

ballardw
Super User

@DarrylLawrence wrote:

Hi

 

This is how it will work.

 

We use SAS Custiomer Intelligence Studio to generate files that are loaded into a call centre application.

We cannot determine beforehand how many records will be in every file, but in every file, there must be a unique, random number assigned to every row in the file.

 

The ideal solution is to add the function in a Calculated variable in SAS Customer Intelligence Studio, which is added into the file at runtime. 

 

I cannot insert the logic that you sent to check if the number has been used. I can only insert the function.

 

I thought "random" meant that the random number would not be repeated in a file/table. 

 


If each file is treated separately then you do not need to track the random numbers. If you mean that the random number can't duplicate in different files then you would. If your file is large enough, or you have enough records, as soon as you hit 900,000 records you must duplicate a number. Your restriction for selection of range 100000 to 999999 only allows the option of 899,999 unique numbers.

 

You reduce the likelihood of duplicates if you go to more digits but it never goes away.

If you research the literature on computer simulations and random numbers (prepare to lose a few weeks if serious) you will find that people complain if there are no duplicates.

 

Another approach might be to just sequentially number the records and use that for the variable. Include a random number as a second variable. Sort by the random number. The result of that constructed variable will appear random within the file but will not duplicate.

Tom
Super User Tom
Super User

No idea about SAS CI Studio. 

Does it have a way for you to run actual SAS code? 

 

If so use that to generate the distinct numbers.  The easiest way to generate distinct numbers is sequentially.

data want;
   set have;
   num + 1;
run;

If you need to remember what numbers you used before create a place to store the largest number used so far and then start from there.  For example you could make a separate table called say LAST_NUMBER.

data want last_number(keep=num);
   if _n_=1 then set last_number;
   set have end=eof;
   num + 1;
   output want;
   if eof then output last_number;
run;
FreelanceReinh
Jade | Level 19

Hi @DarrylLawrence,

 

Here's a method similar to @Tom's suggestion, but creating numbers that look much more random than 1, 2, 3, 4, ..., namely 486956, 853476, 775530, 758721, ...:

Use the recursive formula

n=mod(386957*(n-99999),900001)+99999;

with the initial value n=100000 in the very first application and with the most recently created n in subsequent applications (if you need to add more of these pseudo-random numbers to an existing list after an interruption).

 

You can't get around saving and using the most recently created number (or an equivalent information) if you need to interrupt and later continue the number generating process. This is because without any information about the numbers created in the past you are always at risk of repeating one of those numbers. And the risk of getting duplicates would be extremely high, as you have seen with the RAND function: In your case the probability of getting no duplicates is approx. 2.6E-55 (if the RAND function works as it should).

 

With the formula above, however, you are guaranteed to get no duplicates until the 900,001st number (which is necessarily one of the 900,000 previous numbers). All created numbers are between 100,000 and 999,999, as desired.

 

Look at the result of this DATA step:

data want;
n=100000;
do i=1 to 15000; /* upper limit must be <=900000 to avoid duplicates */
  n=mod(386957*(n-99999),900001)+99999;
  output;
end;
run;

Maybe graphically:

proc sgplot data=want(obs=1000);
scatter x=i y=n;
run;

 

ballardw
Super User

That would fail because the result of the RAND function is numeric. So using a $6. Character field fails. Likely you want a 6. or F6.

 

If you paste code as TEXT we can easily make suggestions. Instead of all those Strip(put()) || bits you can use the CATX function to place an _ character between strings such as

 

segment=catx('_','r9_1000',put(today(),ddmmyy7.),put(rand('integer',100000,999999),f6.));

 

The catx function will strip leasing and trailing spaces between the character expressions and then place the first parameter between each.

Note that first parameter may be multiple characters if desired.

Tom
Super User Tom
Super User

The format used in the PUT() function has to match the value passed to it.  Since the RAND() function generates a NUMBER you need to use a NUMERIC format.

put(rand("integer",100000,999999),Z6.)

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 833 views
  • 3 likes
  • 5 in conversation