Anna_NZ
Calcite | Level 5

Hi, 

 

I have two data sets a and b with 200 entries each.  While we have the capacity to test 350 samples each month, I have to use one of the data sets each month (in alternating mode so a b a b etc), and top the remaining places up with samples from the other dataset. 

 

So say, for January, I use dataset a and top it up with 150 samples of dataset b. 

In this case I could simply use Proc Sql... obs = 150

 

However, the sample set sizes change every month.

What do I need to do to use all of dataset a and top it up with samples from dataset b until the total sample size is 350?

Any ideas are highly welcome. 

Many thanks

 

Accepted Solutions
Patrick
Opal | Level 21

@Anna_NZ

Here is one way to go:

data have_a;
  ds='A';
  do i=1 to 200;
    output;
  end;
run;

data have_b;
  ds='B';
  do i=1 to 200;
    output;
  end;
run;


%let n_sample=350;
%let ds_all=have_a;
%let ds_sample=have_b;

data want(drop=_:);

  /* ds with all records */
  do while(not last_A);
    set &ds_all nobs=nobs_A end=last_A;
    output;
  end;


  /*** based on: http://support.sas.com/kb/24/722.html ***/
  /*  Method 3: Using SAS DATA Step with no sort required  */

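  /* Initialize _K to the number of sample obs needed and _N to the  */
  /* total number of obs in the data set being sampled (ds_sample).  */
  /* NOBS= is assigned at compile time, so nobs_B is already set     */
  /* before the SET statement below executes.                        */
  _k= &n_sample - nobs_A;
  _n=nobs_B;

  do while(1);

    set &ds_sample nobs=nobs_B;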

    /* To randomly select the first observation for the sample, use the */
    /* fact that each obs in the data set has an equal chance of being  */
    /* selected: k/n. If a random number between 0 and 1 is less than   */
    /* or equal to k/n, we select that the first obs for our sample     */
    /* and also adjust k and the number of obs needed to complete the   */
    /* sample.                                                          */

     if ranuni(0) <= _k/_n then
      do;
        output;
        _k=_k-1;
      end;

    /* At every iteration, adjust N, the number of obs left to */
    /* sample from.                                            */
    _n=_n-1;

    /* Once the desired number of sample points are taken, stop iterating */
    if _k=0 then leave;

  end;

  stop;

run;
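
A quick way to check the result, as a minimal sketch that assumes the step above was run against the have_a and have_b sample data: count the rows per source with PROC FREQ on the ds variable.

```sas
/* Count how many rows in WANT came from each source data set */
proc freq data=want;
  tables ds / nocum nopercent;
run;
```

With the sample data this should show all 200 rows from A plus the 150 rows drawn from B. For a month where b is the data set used in full, swapping the values of ds_all and ds_sample in the %let statements should be enough.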

Reeza
Super User
Grab the number of records in each table. Subtract them and create a macro variable with that value, then use that in your SQL code.

Here are some ways to get those numbers:
http://www.sascommunity.org/wiki/Determining_the_number_of_observations_in_a_SAS_data_set_efficientl...
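
The suggestion above could look roughly like the following sketch. The names n_a, n_topup, topup and want_sql are made up for illustration, and it assumes have_a has fewer than 350 rows so that the top-up count is positive. Note that OUTOBS= simply keeps the first rows the query returns, so this is not a random selection from have_b.

```sas
%let n_sample=350;

/* Store the row count of the data set that is used in full */
proc sql noprint;
  select count(*) into :n_a trimmed from have_a;
quit;

/* Rows still needed to reach the monthly capacity (assumed > 0) */
%let n_topup=%eval(&n_sample - &n_a);

/* Take that many rows from the other data set */
proc sql outobs=&n_topup;
  create table topup as
  select * from have_b;
quit;

/* Stack the full data set and the top-up rows */
data want_sql;
  set have_a topup;
run;
```

Because the count is read from the data each time, the same code keeps working when the data set sizes change from month to month.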
