buddha_d
Pyrite | Level 9

SAS Gurus,

Could you please help me populate an empty dataset with random data? For example:

data class;
  set sashelp.class (obs=0);
run;

Now that I have the empty dataset class, how do I populate it with 1000 random observations? Please don't recommend PROC IML (it doesn't work for me, and I don't want to download the University Edition of SAS).

 

Thanks a lot in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@buddha_d wrote:

2,541,626)?  Below is the error I am getting.


 

The array dimensions cover:

    2 sexes
   83 ages    (18:100)
   61 heights (24:84), in inches
  251 weights (50:300)

Although the layout is treated as a 4-dimensional array (read "matrix"), the total number of elements is the product 2 * 83 * 61 * 251 = 2,541,626.  So the ARRAY statement declared the lower and upper bounds for each of the 4 dimensions, and then told SAS to initialize all 2,541,626 cells to zero.
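As a quick sanity check on that count, a small sketch along these lines asks SAS for the size of each dimension and multiplies them. The variable name n_cells is only illustrative, and the array is declared without initial values because only its shape matters here. With these bounds the product should come out to 2,541,626.

```sas
data _null_;
  /* same bounds as the SMPL array; no initial values are needed for a size check */
  array smpl {2,18:100,24:84,50:300} _temporary_;

  /* DIM(array, n) returns the number of elements in dimension n */
  n_cells = dim(smpl,1) * dim(smpl,2) * dim(smpl,3) * dim(smpl,4);

  put n_cells= comma12.;
run;
```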

 

As to the error messages, I suppose your version of SAS (mine is 9.4 TS1M5) does not yet support the "integer" distribution in the RAND function.  But you do have the "uniform" distribution.  So instead of:

      _sx=rand('integer',1,2);
      age=rand('integer',18,100);
      height=rand('integer',24,84);
      weight=rand('integer',50,300);

use

      _sx=ceil(rand('uniform',0,2));
      age=ceil(rand('uniform',17,100));
      height=ceil(rand('uniform',23,84));
      weight=ceil(rand('uniform',49,300));

CEIL rounds a uniform draw on (17,100) up to one of the integers 18 through 100, so each statement covers the same range as its "integer" counterpart.
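If a release also rejects the two bounds passed to the "uniform" distribution (an assumption on my part, nothing in this thread confirms it), the same integers can be built from the plain rand('uniform') call, which returns a value strictly between 0 and 1. A minimal sketch, where lo, hi, i and x are hypothetical names used only for illustration:

```sas
data _null_;
  call streaminit(102985688);

  lo = 18;    /* smallest integer wanted */
  hi = 100;   /* largest integer wanted  */

  do i = 1 to 5;
    /* (hi-lo+1)*u lies in (0, hi-lo+1), so FLOOR gives 0..(hi-lo) */
    x = lo + floor((hi - lo + 1) * rand('uniform'));
    put x=;
  end;
run;
```

Swapping in 1 and 2, 24 and 84, or 50 and 300 for lo and hi gives the other three variables.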

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


9 REPLIES 9
Reeza
Super User

How does the source data come into play? Do you want 1000 records that are repeat entries of CLASS, or something else? Can you show a small example of what your input data looks like and what you expect as output?

 




 

buddha_d
Pyrite | Level 9

Ideally I would like a mix (some repeated observations and some not), but if that isn't possible, then all of them could be different.

Thanks Reeza

Reeza
Super User
I have no idea what that means. Please provide a small example.
mkeintz
PROC Star

 

  1. You identified a data set with 5 variables (name, age, sex, height, weight).
  2. You apparently want 1,000 "random" observations with those variables.

What do you mean by "random data"?  Do you mean to randomly sample 1,000 draws from sashelp.class (which has only 19 observations)?  If so, then you will certainly get repeats.

Or, I guess there are 58,140 possible combinations of the values present in sashelp.class (19 NAMEs * 2 SEXes * 6 AGEs * 17 HEIGHTs * 15 WEIGHTs).  Do you want to sample without replacement from those 58,140?
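For the first reading (1,000 draws with replacement from the 19 rows), a minimal DATA step sketch could look like the following. The seed, the output name resampled, and the helper variables i and p are illustrative choices, not something specified in the thread.

```sas
data resampled;
  call streaminit(12345);

  do i = 1 to 1000;
    /* pick a row number between 1 and nobs, each equally likely */
    p = ceil(nobs * rand('uniform'));

    /* POINT= reads that row directly; NOBS= supplies the row count */
    set sashelp.class point=p nobs=nobs;
    output;
  end;

  stop;      /* required with POINT=, otherwise the step never ends */
  drop i;
run;
```

With only 19 source rows, every row is expected to appear many times in the 1,000 draws.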

 

r_behata
Barite | Level 11

 

Alphabetic List of Variables and Attributes
#   Variable   Type   Len
3   Age        Num    8
4   Height     Num    8
1   Name       Char   8
2   Sex        Char   1
5   Weight     Num    8

The sashelp.class data you used as an example has a combination of numeric and character fields. Please define what you mean by populating random values for each type.

A few examples:

Do you have a range for the numeric data (e.g. values between 100 and 220 for weight)?

Do you have a rule for generating the character data (e.g. can Sex be any value between A and Z, or only M or F)?

 

buddha_d
Pyrite | Level 9

Sorry, guys, for the hazy explanation. Yes, I want weight between 50 and 300 lbs, sex either M or F, age between 18 and 100, and height between 2 and 7 feet. Name can be any name (no specification).

 

Thanks

mkeintz
PROC Star

You want a sample with no duplicates, taken from an array with values ranging as you specified.  Think of the array as 4-dimensional.

1st dimension (sex)      1:2
2nd dimension (age)     18:100
3rd dimension (height)  24:84   (in inches)
4th dimension (weight)  50:300

 

That's a total of 2,541,626 elements.

 

So, after initializing the array to all zeros, repeat the following 100 times:

  1. Generate a set of values within the constraints above.
  2. Check whether the corresponding array element is still zero (i.e. unsampled).
    1. If it isn't, go back to step 1 and repeat until you find a zero.
    2. If it is:
      1. mark the corresponding array element by assigning it a value of 1
      2. output the record:
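data want (drop=_:);
  /* never executed; only copies the variable attributes of sashelp.class */
  if 0 then set sashelp.class;

  call streaminit(102985688);  /* fixed seed, so the sample is reproducible */

  /* one cell per (sex, age, height, weight) combination; 0 = not yet sampled */
  array smpl {2,18:100,24:84,50:300} _temporary_ (2541626*0);

  do _n_=1 to 100;
    name='NAME_' || put(_n_,z3.);
    /* redraw until the combination has not been used before */
    do until (smpl{_sx,age,height,weight}=0);
      _sx=rand('integer',1,2);
      age=rand('integer',18,100);
      height=rand('integer',24,84);
      weight=rand('integer',50,300);
    end;
    smpl{_sx,age,height,weight}=1;  /* mark the combination as sampled */
    sex=char('MF',_sx);             /* 1 -> M, 2 -> F */
    output;
  end;
  stop;
run;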
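One way to confirm that the WANT data set really has no repeated combinations is to compare the row count with the number of distinct (sex, age, height, weight) combinations. This is only a sketch: it assumes WANT was created by the step above, and the column aliases n_rows and n_distinct are illustrative.

```sas
proc sql;
  /* CATX builds one key per row; the two counts should match if no combination repeats */
  select count(*) as n_rows,
         count(distinct catx('|', sex, age, height, weight)) as n_distinct
  from want;
quit;
```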

 

 

Note that the definition of the SMPL array has 4 dimensions.  But instead of just specifying the SIZE of each dimension, it specifies the range.  That makes it easy to generate a random number for age between 18 and 100 and use it directly as a subscript of the SMPL array.
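A one-dimensional sketch of the same idea, using a hypothetical array called seen with bounds 18:100, shows the value being used as the subscript with no offset arithmetic. The variable names lo_idx, hi_idx and flag are illustrative only.

```sas
data _null_;
  /* bounds 18:100 give 83 cells, all initialized to 0 */
  array seen {18:100} _temporary_ (83*0);

  age = 42;
  seen{age} = 1;            /* the age value is the subscript itself */

  lo_idx = lbound(seen);    /* lowest valid subscript  */
  hi_idx = hbound(seen);    /* highest valid subscript */
  flag   = seen{age};       /* read the cell back      */

  put lo_idx= hi_idx= flag=;
run;
```

Had the array been declared as {83} instead, every reference would need seen{age-17}, which is easy to get wrong across four dimensions.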

buddha_d
Pyrite | Level 9

2,541,626)?  Below is the error I am getting.

Here is the log:

[Screenshot: sas log.PNG]

