pinkyc
Calcite | Level 5

I have data like this:

ID   Admit        Discharge    Paid_Amount
A    01/22/2012   2/22/2012    0
A                              23.00
A                              5.00
A                              -23.00
...

Each ID may have several hundred rows, with different values for Admit, Discharge and Paid_Amount.  I need to randomly select 10 IDs from the dataset.  If I use PROC SURVEYSELECT, it will return 10 IDs but not all the observations associated with each ID.  Is there a way to select 10 IDs together with all of their observations?

3 REPLIES 3
Astounding
PROC Star

Does it have to select exactly 10 IDs, or could it be approximately 10?

Do you know how many IDs are in the data?

Does the selection have to be 100% statistically random, or would a somewhat random selection be acceptable?

There are many ways to skin this cat, but your answers determine whether a quick, easy way is possible.

pinkyc
Calcite | Level 5

It could be approximately 10; there are 1751 distinct IDs.  I just need to select around 10 to check some details, so the exact number isn't important.  It does not have to be statistically random either: any 10 that are mostly random is fine, so yes, a somewhat random selection would definitely be acceptable as long as all of the observations for each ID are included.
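
Given those constraints (roughly 10 of 1751 IDs, only somewhat random), one quick possibility is to keep each ID with probability 10/1751 and carry that decision across all of the ID's rows. This is only a minimal sketch: the dataset name `have`, the output name `want`, the flag `keep_id`, and the seed are assumptions made for illustration, and the data are assumed to be sorted by ID already.

```sas
/* Sketch only: HAVE, WANT, KEEP_ID and the seed are hypothetical names/values. */
/* Assumes HAVE is sorted by ID so that BY-group processing works.              */
data want(drop=keep_id);
   set have;
   by id;
   retain keep_id;
   /* Decide once per ID: keep it with probability 10/1751 */
   if first.id then keep_id = (ranuni(12345) < 10/1751);
   /* Every row of a kept ID passes this subsetting IF */
   if keep_id;
run;
```

The number of IDs kept varies from run to run and will not always be exactly 10, which seems to be acceptable here.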

billfish
Quartz | Level 8

One can use a solution with a single DATA step.

The code below first creates some simulated data and then shows the proposed single DATA step solution. Here, one wants to choose zSize = min(10, number of unique ids) = min(10, zId) ids. One can use some number other than 10.

/*****************************/
/**** some simulated data ****/
/*****************************/
data t_a;
  do id = 100 to 120;
     do store_id = 'A','B','C','D';
        Sales = ceil(500*ranuni(3));
        output;
     end;
  end;
run;


/*****************************/
/**** a proposed solution ****/
/*****************************/
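data t_b(keep=id store_id Sales);
   /* Pass 1: count the distinct ids (t_a must be sorted by id) */
   do until(zDone);
      set t_a(keep=id) end=zDone;
      by id;
      if first.id then zId+1;
   end;

   /* zSize = ids still to be picked; zId = ids not yet examined */
   zSize = min(10,zId);

   /* Pass 2: re-read the data and decide once per id whether to keep it */
   do until(xDone);
      set t_a end=xDone;
      by id;

      if first.id then do;
            /* r1 is uniform on 1..zId, so the id is kept with probability zSize/zId */
            r1= ceil(zId*ranuni(3));
            if (r1 <= zSize) then do; zOutput=1; zId+(-1); zSize+(-1); end;
                             else do; zOutput=0; zId+(-1); end;
      end;

      /* zOutput keeps its value for every row of the current id */
      if (zOutput=1) then output;
   end;
run;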
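
For comparison, since the original question mentioned SURVEYSELECT, another route might be to sample from a list of distinct IDs and then join back to the full data to pick up every observation for the chosen IDs. This is a rough sketch rather than a tested solution: the dataset names `have`, `ids`, `picked` and `want` and the seed value are assumptions, and it presumes there are at least 10 distinct IDs.

```sas
/* Sketch only: HAVE, IDS, PICKED, WANT and the seed are hypothetical. */

/* 1. One row per distinct ID */
proc sql;
   create table ids as
   select distinct id
   from have;
quit;

/* 2. Simple random sample of 10 IDs */
proc surveyselect data=ids out=picked method=srs sampsize=10
                  seed=12345 noprint;
run;

/* 3. Bring back every observation belonging to the sampled IDs */
proc sql;
   create table want as
   select h.*
   from have as h
        inner join picked as p
        on h.id = p.id
   order by h.id;
quit;
```

Unlike the single DATA step above, this takes three steps, but it does not require the input to be sorted by id.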
