Need Logic for picking random observations ?

Reply
Contributor
Posts: 46

Need Logic for picking random observations ?

Hi SAS Community,

I am in need of smart logic for my problem. I have an input table with the following columns and its values

Customer_ID     Bill_Date     Bill_Amount    

AA001             10/112013          250

AA050              08/10/2013        450    

AA001              23/11/2013        150

AA003               13/11/2013       890

AA078              16/09/2013        234

Ak123               02/10/2013       678

AK147               10/12/2013      198

PM001             09/10/2013         245

BH104              31/10/2013        675

The table has 5 yrs historical data and so it is pretty huge. I would like to pick random number of Customer_Id's and pass those list Customer_Id's as a filter when i need them for further analysis on them. Until now, I am doing plain guess work by listing certain Customer_Id's just by typing in a where condition like for example Where Customer_ID in ('AA040', 'AH345', 'PL251'......etc). However I want this list to be generated randomly using a logic first and then I can apply the random generated list as filter in the query. Random generation would help my analysis by a long shot.

It would be great to have anywhere between 12-20 Customer_id's generated. Can anybody help?

Super User
Posts: 17,963

Re: Need Logic for picking random observations ?

Have you looked at proc surveyselect?

SAS/STAT(R) 9.2 User's Guide, Second Edition

Contributor
Posts: 46

Re: Need Logic for picking random observations ?

Thanks. Certainly a very helpful procedure. However, I am trying to work on a Datastep based solution and eventually parameterizing it using a macro. Also, the reason is that I want greater control over the process and be in a position to dynamically change and manipulate as and when the requirements change. Nevertheless, Thanks for the very helpful response.

Respected Advisor
Posts: 3,777

Re: Need Logic for picking random observations ?

is right surveyselect looks like the best choice.  You can tell it how many customer ids you want and it will "generate the list" and take the sample.

data huge;
   input Customer_ID $ Bill_Date:ddmmyy.  Bill_Amount;
   format bill_date ddmmyy10.;
  
do x = 1 to 5;
     
output;
     
end;
  
cards;
AA001             10/11/2013          250
AA050              08/10/2013        450    
AA001              23/11/2013        150
AA003               13/11/2013       890
AA078              16/09/2013        234
Ak123               02/10/2013       678
AK147               10/12/2013      198
PM001             09/10/2013         245
BH104              31/10/2013        675
AA003               13/11/2013       890
;;;;
   run;

proc surveyselect sampsize=3 out=sample seed=354624589;
  
samplingunit customer_id;
   run;
proc print;
  
run;
Contributor
Posts: 46

Re: Need Logic for picking random observations ?

Thanks DN and Reeza. Appreciate it!

Ask a Question
Discussion stats
  • 4 replies
  • 230 views
  • 2 likes
  • 3 in conversation