Anna_NZ
Calcite | Level 5

Hi,

I have two data sets, a and b, with 200 entries each. We have the capacity to test 350 samples each month. Each month I have to use one of the data sets in full (alternating: a, b, a, b, etc.) and fill the remaining places with samples from the other data set.

So, for January, I use all of data set a and top it up with 150 samples from data set b.

In this case I could simply use PROC SQL with obs=150.

However, the data set sizes change every month.

What do I need to do to take all of a and fill up with samples from b until the sample size is 350?

Any ideas are highly welcome.

Many thanks

1 ACCEPTED SOLUTION

Patrick
Opal | Level 21

@Anna_NZ

Here's one way to go:

data have_a;
  ds='A';
  do i=1 to 200;
    output;
  end;
run;

data have_b;
  ds='B';
  do i=1 to 200;
    output;
  end;
run;


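%let n_sample=350;     /* monthly test capacity                */
%let ds_all=have_a;    /* data set used in full this month     */
%let ds_sample=have_b; /* data set the top-up is sampled from  */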

data want(drop=_:);

  /* Output every record of &ds_all, the data set used in full */
  do while(not last_A);
    set &ds_all nobs=nobs_A end=last_A;
    output;
  end;


  /*** based on: http://support.sas.com/kb/24/722.html ***/
  /*  Method 3: Using SAS DATA Step with no sort required  */

  /* Initialize _K to the number of sample obs needed and _N to the */
  /* total number of obs in &ds_sample, the data set sampled from.  */
  /* NOBS= is set at compile time, so nobs_B is available here.     */
  _k= &n_sample - nobs_A;
  _n=nobs_B;

  do while(1);

    set &ds_sample nobs=nobs_B;

    /* Each obs is selected with probability k/n, where k is the number */
    /* of obs still needed and n is the number of obs not yet read.     */
    /* If a random number between 0 and 1 is less than or equal to k/n, */
    /* select the current obs for the sample and decrement k.           */

    if ranuni(0) <= _k/_n then
      do;
        output;
        _k=_k-1;
      end;

    /* At every iteration, adjust N, the number of obs left to */
    /* sample from.                                            */
    _n=_n-1;

    /* Once the desired number of sample points is taken, stop iterating */
    if _k=0 then leave;

  end;

  stop;

run;
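The question also asks to alternate the roles of a and b month by month. A minimal sketch, assuming odd calendar months use have_a in full and even months use have_b (consistent with the January example in the question): set the macro variables ds_all and ds_sample from the current month instead of hard-coding them, then run the step above unchanged.

```sas
/* Assumption: odd months (Jan, Mar, ...) use have_a in full and  */
/* even months use have_b, matching the January example.          */
data _null_;
  if mod(month(today()), 2) = 1 then do;
    call symputx('ds_all', 'have_a');
    call symputx('ds_sample', 'have_b');
  end;
  else do;
    call symputx('ds_all', 'have_b');
    call symputx('ds_sample', 'have_a');
  end;
run;

%put NOTE: this month &ds_all is used in full and topped up from &ds_sample;
```

If the alternation starts in a different month, swap the two branches.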


2 REPLIES
Reeza
Super User
Grab the number of records in each table, subtract the size of the data set you use in full from 350, store the result in a macro variable, then use that in your SQL code.

Here are some ways to get those numbers:
http://www.sascommunity.org/wiki/Determining_the_number_of_observations_in_a_SAS_data_set_efficientl...
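A minimal sketch of this approach, reusing the have_a/have_b names from the accepted solution. The record counts (230 and 180) are made up to show a month where the sizes differ, and the macro variable names n_all and n_fill are hypothetical. This version takes the first &n_fill records of the top-up data set; for a random top-up, the DATA step in the accepted solution fits better.

```sas
/* Hypothetical month: sizes differ from the 200/200 example on purpose */
data have_a;
  ds='A';
  do i=1 to 230;
    output;
  end;
run;

data have_b;
  ds='B';
  do i=1 to 180;
    output;
  end;
run;

%let n_sample=350;
%let ds_all=have_a;     /* data set used in full this month */
%let ds_sample=have_b;  /* data set used to top up          */

/* 1. Count the records in the data set used in full */
proc sql noprint;
  select count(*) into :n_all trimmed
    from &ds_all;
quit;

/* 2. Records still needed, floored at 0 so obs= never goes negative */
%let n_fill=%sysfunc(max(0, %eval(&n_sample - &n_all)));
%put NOTE: &n_all records from &ds_all, topping up with &n_fill from &ds_sample;

/* 3. Stack all of &ds_all with the first &n_fill records of &ds_sample */
proc sql;
  create table want as
    select * from &ds_all
    outer union corr
    select * from &ds_sample(obs=&n_fill);
quit;
```

Flooring n_fill at 0 keeps the obs= option valid in a month where the data set used in full already has 350 or more records.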

