BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
brophymj
Quartz | Level 8

I want to generate numbers randomly from a csv table of 40,000 number. I've attached an sample of the table of numbers "sample.csv".

 

I want to then use the numbers generated to produce 100 iterations of the formula:

 

iteration(i)  = sum(random number (j)) from j = 1 to x

 

So, the first cell in the table below will generate 15 numbers from the table sample.csv and then sum them together and output to the cell, the second iteration will do the same but it will be a new set of 15 random numbers from the table sample.csv, and so on. 

 

In the second row, x is 144 so each iteration will sum 144 random numbers from the sample.csv table.

 

Here is the code to import the csv:

 

proc import datafile="C:\Users\mbrophy\Downloads\sample.csv"
out=mydata dbms=csv replace;
getnames=no;
run;

 

  pic.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @brophymj 

 

Thank you for sharing your documents.

I have gone through the code and I have corrected the below elements:

- keep no_of_spin in the macro %null_table

- put out_&i._&j. in the macro %append

- update the SET statement in the last step to be sure that table will be append in the right order

It should work now.

 

Let me know Smiley Happy

 

Best,

/*Create macro-variables*/
data _null_;
	set analysis2_ind;
	call symputx ("nb_obs"||left(_n_),no_of_spins); /* store each value of x in a macrovariable*/
	call symputx ("nb_row",_n_); /* store the total number of rows in the dataset "have"*/
run;

/*Generate random numbers from sample data and create a table per draw with the sum:
name of the dataset = out_<sample size>_<iteration number> */



%macro random;

	%do i=1 %to &nb_row;
		%do j=1 %to 5;

			proc surveyselect data=sample method=srs n=&&nb_obs&i out=out_&i._&j.;
			run;
			
			proc means data=out_&i._&j. sum;
				output out=out_&i._&j. sum=it&j.;
			run;

		%end;
	%end;

%mend;

%random;

/*Initialize empty tables */
		
%macro null_table;

	%do i=1 %to &nb_row;
	
		data analysis2_ind_&i.;
			set analysis2_ind (keep=no_of_spins);
		run;
	%end;
	
%mend;

%null_table

/*Creation of one table per draw by retrieving all iterations for one sample size */

%macro append;

	%do i=1 %to &nb_row;
		%do j=1 %to 5;
			data analysis2_ind_&i. ;
				set analysis2_ind_&i. ;
				set out_&i._&j. (drop=_type_ _freq_); /*<=== put out_&i._&j. and not out_&&nb_obs&i.._&j.*/
			run;
		%end;
	%end;
	
%mend;

%append;

/*Append all draws */

data analysis2_compil (drop=no_of_spins);
	set analysis2_ind_1-analysis2_ind_&nb_row;
run;

data want;
	set analysis2_ind;
	set analysis2_compil;
run;

 

View solution in original post

8 REPLIES 8
brophymj
Quartz | Level 8

HI Kurt, I've done that now. 

ed_sas_member
Meteorite | Level 14

HI @brophymj 

 

You can try the below code. Let me know if this works. Hope this help!

 

data have;
	input x @@;
	cards;
15 144 3 4 66 2 4 5 5
;
run;

data sample;/* <== put the csv file */
	input r @@;
	cards;
-2 -2 -2 -2 -2 -6 -2 -2 -2 -2 
;
run;

/*Create macro-variables*/
data _null_;
	set have;
	call symputx ("nb_obs"||left(_n_),x); /* store each value of x in a macrovariable*/
	call symputx ("nb_row",_n_); /* store the total number of rows in the dataset "have"*/
run;

/*Generate random numbers from sample data and create a table per draw with the sum:
name of the dataset = out_<sample size>_<iteration number> */

%macro random;

	%do i=1 %to &nb_row;
		%do j=1 %to 100;

			proc surveyselect data=sample method=srs n=&&nb_obs&i out=out_&&nb_obs&i.._&j.;
			run;
			
			proc means data=out_&&nb_obs&i.._&j. sum;
				output out=out_&&nb_obs&i.._&j. sum=it&j.;
			run;

		%end;
	%end;

%mend;

%random;

/*Initialize empty tables */
		
%macro null_table;

	%do i=1 %to &nb_row;
	
		data have_&i.;
			set have;
		run;
	%end;
	
%mend;

%null_table

/*Creation of one table per draw by retrieving all iterations for one sample size */

%macro append;

	%do i=1 %to &nb_row;
		%do j=1 %to 100;
			data have_&i. ;
				set have_&i. ;
				set out_&&nb_obs&i.._&j. (drop=_type_ _freq_);
			run;
		%end;
	%end;
	
%mend;

%append;

/*Append all draws */

data have_compil (drop=x);
	set have_:;
run;

data want;
	set have;
	set have_compil;
run;
brophymj
Quartz | Level 8

Thanks 

 

That works really well. The only small thing is that the random numbers generated are the same rows which have the same x. So, in the dataset "have", I'm getting the same number for the last two rows (where x = 5). Any ideas what's causing this? Thanks

ed_sas_member
Meteorite | Level 14

Hi @brophymj 

 

Thank you for your feedback!

I think you can replace out_&&nb_obs&i.._&j. by out_&i._&j. in the code as follows. In fact, in the previous code, the name of the datasets created for each draw was dependent on the x variable. So when you had multiple similar values for x, the datasets where erased.

Best,

 

data have;
	input x @@;
	cards;
15 144 3 4 66 2 4 5 5
;
run;

data sample;/* <== put the csv file */
	input r @@;
	cards;
-2 -2 -2 -2 -2 -6 -2 -2 -2 -2 
;
run;

/*Create macro-variables*/
data _null_;
	set have;
	call symputx ("nb_obs"||left(_n_),x); /* store each value of x in a macrovariable*/
	call symputx ("nb_row",_n_); /* store the total number of rows in the dataset "have"*/
run;

/*Generate random numbers from sample data and create a table per draw with the sum:
name of the dataset = out_<sample size>_<iteration number> */

%macro random;

	%do i=1 %to &nb_row;
		%do j=1 %to 100;

			proc surveyselect data=sample method=srs n=&&nb_obs&i out=out_&i._&j.;
			run;
			
			proc means data=out_&i._&j. sum;
				output out=out_&i._&j. sum=it&j.;
			run;

		%end;
	%end;

%mend;

%random;

/*Initialize empty tables */
		
%macro null_table;

	%do i=1 %to &nb_row;
	
		data have_&i.;
			set have;
		run;
	%end;
	
%mend;

%null_table

/*Creation of one table per draw by retrieving all iterations for one sample size */

%macro append;

	%do i=1 %to &nb_row;
		%do j=1 %to 100;
			data have_&i. ;
				set have_&i. ;
				set out_&i._&j. (drop=_type_ _freq_);
			run;
		%end;
	%end;
	
%mend;

%append;

/*Append all draws */

data have_compil (drop=x);
	set have_:;
run;

data want;
	set have;
	set have_compil;
run;
brophymj
Quartz | Level 8

Thanks Ed_sas_member

 

It's not working for me and it's probably because i didn't show you the files i'm using. I've attached the sas code and two input files. When i run it, the first row is repeated for all 18 observations. It seems it's almost there but I'm missing something. 

Patrick
Opal | Level 21

@brophymj 

Using the files you've attached here an approach which doesn't require macro coding. I've stored the data from windist.csv in a variable called number. If you call it the same and you call the source tables work.analysis2_ind and work.windist then below code should run for you without any changes required.


%let n_iter=100;
data want(drop=_:);

  if _n_=1 then
    do;
      /* define hash for number list */
      dcl hash numbers();
      numbers.defineKey('_n');
      numbers.defineData('_number');
      numbers.defineDone();

      /* load hash with number list */
      do while(not last);
        set windist(keep=number rename=(number=_number)) end=last;
        _n+1;
        numbers.add();
        /* keep total sum in retained variable
           used later for exception handling
        */
        _tot_sum_num=sum(_tot_sum_num,_number);
        retain _tot_sum_num;
      end;

      /* define hash to hold _n of already selected items */
      dcl hash selected();
      selected.defineKey('_n');
      selected.defineDone();
    end;

    /* read master table */
    set analysis2_ind;

    array sum_num_ {&n_iter} 8;

    /* &n_iter iterations */
    do _iterInd=1 to dim(sum_num_);
      selected.clear();

      /*  pick no_of_spins items from hash randomly without replacement */ 

      if no_of_spins>numbers.num_items then
        do;
          /* exception handling */
          put 'More items requested than available in number list ';
          put 'All items will get selected for summarization';
          sum_num_[_iterInd]=_tot_sum_num;
          continue;
        end;

      do _spinInd=1 to no_of_spins;
        /* generate random item number not yet used */
        do while(1);
          _n=RAND('INTEGER',numbers.num_items);
          if selected.check() ne 0 then
            do;
              selected.add();
              leave;
            end;
        end;

        /* read new item from hash and sum */
        numbers.find();
        sum_num_[_iterInd]=sum(sum_num_[_iterInd], _number);
      end;
    end;
run;

proc print data=want;
  var no_of_spins sum_num_:;
run;

Capture.JPG

 

I wasn't really sure where without replacement applies for randomly picking a number from your list. In the code above it's per cell summation but the same number could get picked for the sum in the next cell. Is that what you're after?

ed_sas_member
Meteorite | Level 14

Hi @brophymj 

 

Thank you for sharing your documents.

I have gone through the code and I have corrected the below elements:

- keep no_of_spin in the macro %null_table

- put out_&i._&j. in the macro %append

- update the SET statement in the last step to be sure that table will be append in the right order

It should work now.

 

Let me know Smiley Happy

 

Best,

/*Create macro-variables*/
data _null_;
	set analysis2_ind;
	call symputx ("nb_obs"||left(_n_),no_of_spins); /* store each value of x in a macrovariable*/
	call symputx ("nb_row",_n_); /* store the total number of rows in the dataset "have"*/
run;

/*Generate random numbers from sample data and create a table per draw with the sum:
name of the dataset = out_<sample size>_<iteration number> */



%macro random;

	%do i=1 %to &nb_row;
		%do j=1 %to 5;

			proc surveyselect data=sample method=srs n=&&nb_obs&i out=out_&i._&j.;
			run;
			
			proc means data=out_&i._&j. sum;
				output out=out_&i._&j. sum=it&j.;
			run;

		%end;
	%end;

%mend;

%random;

/*Initialize empty tables */
		
%macro null_table;

	%do i=1 %to &nb_row;
	
		data analysis2_ind_&i.;
			set analysis2_ind (keep=no_of_spins);
		run;
	%end;
	
%mend;

%null_table

/*Creation of one table per draw by retrieving all iterations for one sample size */

%macro append;

	%do i=1 %to &nb_row;
		%do j=1 %to 5;
			data analysis2_ind_&i. ;
				set analysis2_ind_&i. ;
				set out_&i._&j. (drop=_type_ _freq_); /*<=== put out_&i._&j. and not out_&&nb_obs&i.._&j.*/
			run;
		%end;
	%end;
	
%mend;

%append;

/*Append all draws */

data analysis2_compil (drop=no_of_spins);
	set analysis2_ind_1-analysis2_ind_&nb_row;
run;

data want;
	set analysis2_ind;
	set analysis2_compil;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1726 views
  • 1 like
  • 4 in conversation