07-13-2011 08:19 PM
I would like to ask two questions about random sampling in EG4.2. Any responses would be greatly appreciated.
First, I have a data set of 1000 rows with two columns, say “date” and “temperature”. When I extract a random sample of 100 rows from it (in EG4.2: task>data>random sample), the sampled rows come out in the same order as in the original data set (i.e., ascending by date). What I am looking for is not just 100 randomly selected rows but also to have them in random order. How can this be done?
Second, from this 1000-row data set I want to extract 50 different random samples of 100 rows each (50 × 100 = 5,000 rows altogether) and put them all in the same table. The new data set would then have 5,000 rows and three columns: “set no.”, “date”, and “temperature”. How can this be done?
07-14-2011 10:37 PM
An easy way to put the resulting sample in random order is to run a query that adds a new column containing a random value (e.g., from ranuni()) and then sort on that column.
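A minimal sketch of that random-key idea, written in Python rather than SAS purely to illustrate the logic: each sampled row gets a uniform random value (playing the role of the ranuni() column), and the rows are then sorted on that value alone. The toy data set and the names `data`, `ordered_sample` and `shuffle_by_random_key` are hypothetical stand-ins.

```python
import random
from datetime import date, timedelta


def shuffle_by_random_key(rows, rng):
    """Return rows in random order by pairing each row with a uniform
    random key (like a ranuni() column) and sorting on that key alone."""
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])  # sort on the random column only
    return [row for _, row in keyed]


# Hypothetical stand-in for the 1000-row (date, temperature) data set.
start = date(2010, 1, 1)
data = [(start + timedelta(days=i), 20.0 + (i % 15)) for i in range(1000)]

rng = random.Random(2011)  # fixed seed so the run is repeatable
# Mimic the task's output: 100 rows drawn without replacement, in date order.
picked = sorted(rng.sample(range(len(data)), 100))
ordered_sample = [data[i] for i in picked]

shuffled = shuffle_by_random_key(ordered_sample, rng)
print(shuffled[:3])
```

In Python itself `random.shuffle` would do the same job; the sort-on-a-random-key form is shown because it mirrors the add-a-column-then-sort steps described above.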
I can't think of how to do the second without running the task thread 50 times. I think you could do it with a macro variable (parameter) that you assign as the set number, and then end the thread with an append task to tack each run onto the previous output. You would just key in the sequence number for each run, and it shouldn't take too long. It doesn't scale very well, though: I frequently want to get a thousand samples, and typing 1 to 1000 is a lot more work than typing 1 to 50!
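The run-the-thread-and-append approach can be sketched in Python as below. This only illustrates the logic and is not EG's append task; the function `stacked_samples` and the toy data set are hypothetical. Each pass draws 100 rows without replacement, tags them with the set number (the role the macro variable plays above) and appends them to the growing table. A row can appear in several sets but never twice within one set, and because `random.sample` returns rows in selection order, each set also comes out in random order.

```python
import random
from datetime import date, timedelta


def stacked_samples(rows, n_sets, sample_size, seed=None):
    """Draw n_sets independent simple random samples (without replacement
    within each set) and append them into one table as (set_no, date, temp)."""
    rng = random.Random(seed)
    table = []
    for set_no in range(1, n_sets + 1):         # one pass per set, like rerunning the task
        sample = rng.sample(rows, sample_size)  # rows come back in random order
        table.extend((set_no, d, t) for d, t in sample)  # the "append" step
    return table


# Hypothetical 1000-row data set of (date, temperature) pairs.
start = date(2010, 1, 1)
data = [(start + timedelta(days=i), 20.0 + (i % 15)) for i in range(1000)]

result = stacked_samples(data, n_sets=50, sample_size=100, seed=42)
print(len(result))  # 5000
```

In code the loop counter supplies the set number, so going from 50 sets to 1000 only means changing `n_sets`.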
If you want to do it using a program task, I'd consider using the %bootsamp macro (just search support.sas.com for it).
07-27-2011 01:16 PM
I think you might need to look up PROC SURVEYSELECT; I'm guessing you'll need to write code for this rather than use the GUI.
Look up a paper called "Don't be Loopy" on lexjansen.com for sample code.
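One way to avoid the loop entirely is to handle all sets in a single pass: replicate the data once per set, give every (set, row) pair a random key, sort by set and then key, and keep the first 100 rows of each set, much like BY-group processing. The Python sketch below assumes that approach; `replicate_samples` and the toy data are hypothetical. In SAS, PROC SURVEYSELECT's REPS= option is, as far as I know, the built-in way to get repeated samples with a Replicate column in one step, so its documentation is worth checking before hand-rolling this.

```python
import random
from datetime import date, timedelta
from itertools import groupby, islice
from operator import itemgetter


def replicate_samples(rows, n_sets, sample_size, seed=None):
    """Draw n_sets samples in one pass: key every (set, row) pair randomly,
    sort by set then key, and keep the first sample_size rows per set."""
    rng = random.Random(seed)
    keyed = [(set_no, rng.random(), row)
             for set_no in range(1, n_sets + 1)
             for row in rows]
    keyed.sort(key=itemgetter(0, 1))  # by set number, then random key
    table = []
    for set_no, group in groupby(keyed, key=itemgetter(0)):
        # Taking the first rows after a random sort is a simple random
        # sample without replacement, already in random order.
        for _, _, (d, t) in islice(group, sample_size):
            table.append((set_no, d, t))
    return table


# Hypothetical 1000-row data set of (date, temperature) pairs.
start = date(2010, 1, 1)
data = [(start + timedelta(days=i), 20.0 + (i % 15)) for i in range(1000)]

result = replicate_samples(data, n_sets=50, sample_size=100, seed=7)
print(len(result))  # 5000
```

The trade-off is memory: this builds n_sets × N keyed rows (50,000 here) before trimming, whereas the loop only holds one sample at a time.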