BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Chugvenk
Calcite | Level 5

Team,

 

I have a dataset of 5000 rows. I want to create a new column (say x) that generates random integers so that the sum of those integers (in the 5000 rows) comes to 25000. How do I achieve this in SAS?

 

Regards,

Suhrit

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

I don't see integers in the Excel link you provide.

This works:

data HAVE;
  do VAL=1 to 5000;
    output; 
  end;  
run;

data WANT;
  set HAVE end=LASTOBS;
  array R [5000] _temporary_;
  if _N_=1 then do; 
    do I=1 to 5000;
      R[I]=int(rand('UNIFORM',10,0));
    end;
    DIFF=25000-sum(of R[*]);
    do I=1 to DIFF;
      R[int(rand('UNIFORM',5000,1))]+(DIFF>0);
    end;
    VET+sum(of R[*]);
  end;
  RAN=R[_N_];
run;          

proc print data=WANT(firstobs=4990 keep=VAL RAN VET) noobs;
run;

VAL VET RAN
4990 25000 5
4991 25000 10
4992 25000 8
4993 25000 6
4994 25000 9
4995 25000 4
4996 25000 2
4997 25000 8
4998 25000 9
4999 25000 1
5000 25000 1

 

View solution in original post

15 REPLIES 15
ballardw
Super User

@Chugvenk wrote:

Team,

 

I have a dataset of 5000 rows. I want to create a new column (say x) that generates random integers so that the sum of those integers (in the 5000 rows) comes to 25000. How do I achieve this in SAS?

 

Regards,

Suhrit


Do they have to be positive integers?

What test for randomness do they have to pass?

 

And the purpose of this exercise is what?

Chugvenk
Calcite | Level 5

Yes, they need to be positive integers. 

Purpose: For creating a mockup for reporting, until we actually get the data from the warehouse.

Test for Randomness: Honestly, I'm not sure what this means. It should just be 5000 random numbers summing to 25000. I can do this in excel, I'm new to SAS, so I am hoping there are functions for this.

Patrick
Opal | Level 21

@Chugvenk 

To create random numbers or unique id's (i.e. a sequence number) is no problem. It's the "summing up to 25000" requirement where one would need to come up with some custom logic and then implement the code for it..

Not sure how you would implement such a "summing up to 25000" easily in Excel. Can you explain?

Chugvenk
Calcite | Level 5

Patrick,

 

It can be done in excel, I have done it before. I need to do this in SAS.

 

https://stackoverflow.com/questions/21782329/generate-n-random-numbers-whose-sum-is-a-constant-k-exc...

 

And yes, we would need a custom logic, I was asking help with that since I am new to SAS. 

 

Regards,

Suhrit

 

 

ChrisNZ
Tourmaline | Level 20

I don't see integers in the Excel link you provide.

This works:

data HAVE;
  do VAL=1 to 5000;
    output; 
  end;  
run;

data WANT;
  set HAVE end=LASTOBS;
  array R [5000] _temporary_;
  if _N_=1 then do; 
    do I=1 to 5000;
      R[I]=int(rand('UNIFORM',10,0));
    end;
    DIFF=25000-sum(of R[*]);
    do I=1 to DIFF;
      R[int(rand('UNIFORM',5000,1))]+(DIFF>0);
    end;
    VET+sum(of R[*]);
  end;
  RAN=R[_N_];
run;          

proc print data=WANT(firstobs=4990 keep=VAL RAN VET) noobs;
run;

VAL VET RAN
4990 25000 5
4991 25000 10
4992 25000 8
4993 25000 6
4994 25000 9
4995 25000 4
4996 25000 2
4997 25000 8
4998 25000 9
4999 25000 1
5000 25000 1

 

Chugvenk
Calcite | Level 5

@ChrisNZ  Thank you so much. This absolutely works!

 

I had no idea this code would be so complex. You are awesome! Thanks again!

ballardw
Super User

@Chugvenk wrote:

Yes, they need to be positive integers. 

Purpose: For creating a mockup for reporting, until we actually get the data from the warehouse.

Test for Randomness: Honestly, I'm not sure what this means. It should just be 5000 random numbers summing to 25000. I can do this in excel, I'm new to SAS, so I am hoping there are functions for this.


There are statistical tests that provide information as to how likely your data is to be "random" just like a Ttest can be used to see if the means of two sets are the same (or not).

 

 

PGStats
Opal | Level 21

50 random integers summing up to 250:

 

%let sum=250;
%let nb=50;

data want;
call streaminit(976785676);
array r {0:&nb} _temporary_;
r{0} = 0;
do i = 1 to &nb;
    r{i} = r{i-1} + rand("uniform");
    end;
do i = 1 to &nb;
    r{i} = round(r{i}*&sum/r{&nb});
    number = r{i} - r{i-1};
    output;
    end;
keep number;
run;
PG
Chugvenk
Calcite | Level 5

@PGStats  Unfortunately, this didn't work. The solution provided by @ChrisNZ  is the correct one I was looking for.

 

That said, thanks for getting back to me. I appreciate it!

 

PGStats
Opal | Level 21

The solution I proposed above DID work, I tested it. If it didn't work for you, I'd like to know how.

PG
Chugvenk
Calcite | Level 5

 the dataset you created was a bunch of numbers that went uptil 250 (250 was the last number), which was not quite what I was looking for. I was looking for a column of random numbers and the total of that should have been 250. Again, maybe I didn't explain it correctly, so my bad. Thanks for getting back to me! Have a good day!

PGStats
Opal | Level 21

My own testing goes like this:

 

NOTE: AUTOEXEC processing completed.

1    %let sum=250;
2    %let nb=50;
3
4    data want;
5    call streaminit(976785676);
6    array r {0:&nb} _temporary_;
7    r{0} = 0;
8    do i = 1 to &nb;
9        r{i} = r{i-1} + rand("uniform");
10       end;
11   do i = 1 to &nb;
12       r{i} = round(r{i}*&sum/r{&nb});
13       number = r{i} - r{i-1};
14       output;
15       end;
16   keep number;
17   run;

NOTE: The data set WORK.WANT has 50 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.90 seconds
      cpu time            0.15 seconds


18
19   data _null_;
20   do j = 1 to 5;
21       do i = 1 to 10;
22           set want;
23           put number 2. @;
24           end;
25       put;
26       end;
27   run;

 7 4 5 5 3 7 5 7 2 5
 6 2 3 4 8 5 4 0 7 4
 8 2 5 8 7 3 7 6 8 5
 4 6 2 8 1 4 7 2 4 3
 4 5 9 9 4 2 7 6 3 8
NOTE: There were 50 observations read from the data set WORK.WANT.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

with the sum of these 50 numbers being 250.

PG
FreelanceReinh
Jade | Level 19

Hi @Chugvenk,

 

You asked for random numbers, but your specifications (5000 positive integers with sum 25000) do not uniquely determine a distribution. That's why different solutions have been presented. Below is an approach that allows you to specify a suitable distribution. This is likely overkill for your purpose, but later readers of this thread may have other ideas what to do with those random integers.

 

Obviously, a "suitable" distribution must have an expected value (close to) 25000/5000=5. So, the uniform distribution on {1, 2, ..., 9} is a good candidate. If you want this (and you have SAS 9.4M5 or later), specify

%let dist=rand('integer', 9);

But maybe you would like to see at least a few two-digit integers. Then, how about a binomial distribution with n=16, p=1/4 and adding 1 to the random variates to avoid zeros and to adjust the expected value? Check out the results of the code below.

 

%let sum=25000;
%let nb=5000;
%let dist=rand('binom', 0.25, 16)+1; /* suitable distribution */
%let max=17; /* an upper bound for the above distribution */
%let seed=271828182;
%let maxiter=10000;

/* Create one-row dataset with frequencies n1, ..., n&max for integers 1, ..., &max */

data frq(keep=n:);
call streaminit(&seed);
array tn[&max] _temporary_;
array  n[&max];
d=1e15;
do _n_=1 to &maxiter until(s=&sum);
  call missing(of tn[*]);
  do j=1 to &nb;
    tn[&dist]+1;
  end;
  ts=0;
  do k=1 to &max;
    tn[k]+0;
    ts+k*tn[k];
  end;
  td=abs(&sum-ts);
  if td<d then do;
    d=td;
    s=ts;
    do k=1 to &max;
      n[k]=tn[k];
    end;    
  end;    
end;
put "The optimum sum (found in <=&maxiter iterations) is " s +(-1) '.';
if s ne &sum then put 'Sorry!';
run;

/* Create sorted dataset (view) with &nb integers X */

data rndv(keep=x) / view=rndv;
set frq;
array n[&max];
do x=1 to &max;
  do i=1 to n[x];
    output;
  end;
end;
run;

/* Sort integers randomly */

proc surveyselect data=rndv rate=1 seed=&seed out=rnd outrandom noprint;
run;

/* Final checks -- optional */

proc freq data=rnd;
tables x;
run;

proc means data=rnd sum;
var x;
run;

/* Combine HAVE dataset with random integers */

data want;
merge have rnd;
run;
Chugvenk
Calcite | Level 5

@FreelanceReinh  Thanks so much. I tested this and it works perfectly! Awesome!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 4279 views
  • 3 likes
  • 7 in conversation