03-25-2018 08:14 PM - edited 03-25-2018 08:16 PM
I have data where people (cases) are clustered in groups. I want to randomly select 1 case from each cluster and use only the selected cases in further analysis, and I want to repeat this process many times. Is there a simple way to do that? Any suggestions would be very much appreciated.
Thank you in advance.
P.S. So far I have worked out how to select the first or last case in a cluster, but in my situation I need to select a case at random.
03-25-2018 08:55 PM
Your question requires more details before experts can help. Can you revise your question to include more information?
SAS experts are eager to help -- help them by providing as much detail as you can.
This prewritten response was triggered for you by fellow SAS Support Communities member @mkeintz.
03-25-2018 08:56 PM
Please provide some sample data in a DATA step and the output you're looking for.
You can use macros to do repetitive work. You can use CALL SYMPUT conditionally to store values in macro variables and reference them later. Also check the RAND function.
03-25-2018 09:27 PM - edited 03-25-2018 09:32 PM
Thank you for your reply. My data look something like this:
ID Cluster Var1 Var2
1 1 2 3
2 1 1 5
3 2 4 4
4 2 6 2
5 2 1 3
6 3 4 1
7 4 4 6
8 4 7 3
9 4 5 5
10 4 2 1
Each case (as indicated by ID) belongs to a cluster. For example, Cases 1 and 2 are in Cluster 1; Cases 3, 4, and 5 are in Cluster 2; Case 6 is the only case in Cluster 3; and Cases 7-10 are in Cluster 4. I also have values for each case on Var1 and Var2.
What I want to do is randomly select 1 case per cluster. After selection, I will have a data set of 4 cases, each belonging to a different cluster. I will then use this newly formed data set to work with Var1 and Var2, e.g., to test for mean differences between Var1 and Var2.
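One way to approach this is sketched below, using PROC SURVEYSELECT with the cluster as a stratum and a sample size of 1 per stratum. This is an illustration under stated assumptions, not a confirmed answer from the thread: the data set names `have`, `picked`, and `picked_sorted`, the seed, and the choice of 100 repetitions are all hypothetical.

```sas
/* Sample data from the post; the data set name HAVE is hypothetical. */
data have;
   input ID Cluster Var1 Var2;
   datalines;
1 1 2 3
2 1 1 5
3 2 4 4
4 2 6 2
5 2 1 3
6 3 4 1
7 4 4 6
8 4 7 3
9 4 5 5
10 4 2 1
;
run;

/* The STRATA statement expects the input sorted by the stratum variable. */
proc sort data=have;
   by Cluster;
run;

/* Simple random sampling of 1 case per cluster, repeated 100 times.
   The seed and the number of repetitions are arbitrary choices. */
proc surveyselect data=have out=picked noprint
                  method=srs sampsize=1 reps=100 seed=20180325;
   strata Cluster;
run;

/* Order the result so that each replicate forms one BY group. */
proc sort data=picked out=picked_sorted;
   by Replicate Cluster;
run;
```

The output data set carries a `Replicate` variable identifying each draw, so later procedures can process every 4-case sample separately with `BY Replicate` rather than through a macro loop.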
I understand how to do repetitive work; I can just use a loop. Also, I believe the RAND function will not help me, as I am not generating data: I already have data from which I just need to randomly select 1 case per cluster.
Thank you in advance.
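On the RAND point above: RAND can also be applied to existing data by attaching a random sort key to each case, which turns the familiar "first case in a cluster" selection into a random one. The following is a minimal sketch for a single draw; the data set names `have`, `keyed`, and `picked`, the variable `u`, and the seed are assumptions made for illustration.

```sas
/* Sample data from the post; the data set name HAVE is hypothetical. */
data have;
   input ID Cluster Var1 Var2;
   datalines;
1 1 2 3
2 1 1 5
3 2 4 4
4 2 6 2
5 2 1 3
6 3 4 1
7 4 4 6
8 4 7 3
9 4 5 5
10 4 2 1
;
run;

/* Attach a uniform random number to every existing case. */
data keyed;
   call streaminit(20180325);   /* arbitrary seed, for reproducibility */
   set have;
   u = rand('uniform');
run;

/* Sorting by the random key shuffles the cases within each cluster. */
proc sort data=keyed;
   by Cluster u;
run;

/* Keep the first case per cluster, which is now a random one. */
data picked;
   set keyed;
   by Cluster;
   if first.Cluster;
   drop u;
run;
```

Rerunning the three steps with a different seed gives a new draw, so this version fits naturally inside a macro loop if many repetitions are needed.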