Hi everyone. I'm working on some figures for my client and this is part of the instructions in the spec:
"If there are ≥150 patients with data for a given figure, please randomly and equally distribute patients into multiple graphs so that there are <150 patients in each graph. For example, if there are 160 patients, create 2 graphs with 80 patients each. If there are 300 patients with data, create 3 graphs with 100 patients each."
I'm not even sure where to start with this. Any ideas?
There are some questions you have to answer first.
How do we know how many "patients" are intended for any one graph (before splitting)? Do you have a variable that indicates that? Or is this really "I have X number of observations in general and need to split them for graphing based on the number X?"
The content and structure of your data set may be quite important if this involves pre-identified "graphs" to which certain groups of patients are already assigned.
When it comes to random selection then the procedure is almost certain to be Proc SurveySelect. But as I say, the content of your current data and how it is to be set up for selection is important.
A basic approach, when you know the number of groups that you want, is to use the GROUPS= option.
This is a brief example that you can run using a data set that should be included in your installation:
proc surveyselect data=sashelp.class out=work.grouped groups=3;
run;
You can look at the output data set, Work.Grouped, and see that a variable GroupID has been added. It will have nearly equal numbers of observations assigned to each group.
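If you want to check the group sizes, and make the random assignment repeatable from run to run, something like this might help. It is only a sketch: the SEED= option and its value are my addition (any positive integer would do), and PROC FREQ is just one way to count the rows per group.

```sas
/* SEED= makes the random assignment reproducible; 12345 is an arbitrary value */
proc surveyselect data=sashelp.class out=work.grouped groups=3 seed=12345;
run;

/* Count the observations in each group.                        */
/* sashelp.class has 19 rows, so an even split is 6, 6 and 7.   */
proc freq data=work.grouped;
  tables GroupID / nocum nopercent;
run;
```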
When graphing this data you would sort it by the GroupID variable and use a BY GroupID statement in Proc Sgplot (or whichever procedure you intend to use) to create a separate plot for each group. Or use GroupID as the PANELBY variable in Proc Sgpanel.
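As a rough illustration of the sort-then-BY approach, here is a sketch that assumes Work.Grouped was created from sashelp.class with the GROUPS= option. The scatter plot of Height against Weight is only a placeholder for whatever figure your spec actually calls for.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=work.grouped;
  by GroupID;
run;

/* One separate graph per random group */
proc sgplot data=work.grouped;
  by GroupID;
  scatter x=height y=weight;   /* placeholder plot statement */
run;

/* Alternative: one panel per group in a single graph */
proc sgpanel data=work.grouped;
  panelby GroupID / columns=1;
  scatter x=height y=weight;   /* placeholder plot statement */
run;
```

The BY version produces one graph per group, which seems closer to what the spec describes. The panel version puts all groups into a single image.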
What happens if N cannot be split into exactly equal groups? If N is a prime number (and sometimes even if it is not prime), you cannot get equal numbers in each group. N=173 (a prime number) can be split into groups of 86 and 87, or 90 and 83, or ...
What to do then?
What if N=400? Is that 4 groups of 100, or 5 groups of 80, or 3 groups of 133, 133 and 134?
@ballardw also raises some good points, and all of these are things you need to think about -- and discuss with your client to get his/her agreement, long before you start writing SAS code.
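One possible reading of the spec, consistent with both of its examples (160 patients gives 2 graphs, 300 gives 3), is to use the smallest number of groups that keeps every graph under 150 patients. That is an assumption to confirm with the client, not something the spec states outright. The sketch below just works through the arithmetic for a few values of N. The data set and variable names (group_plan, max_per_graph, n_groups and so on) are made up for illustration.

```sas
/* Assumed rule: fewest groups such that no group exceeds 149 patients */
data work.group_plan;
  max_per_graph = 149;                               /* "<150 patients in each graph" */
  do n_patients = 160, 173, 300, 400;
    n_groups = ceil(n_patients / max_per_graph);     /* number of graphs needed       */
    min_size = floor(n_patients / n_groups);         /* size of the smaller groups    */
    max_size = ceil(n_patients / n_groups);          /* size of the larger groups     */
    output;
  end;
run;

proc print data=work.group_plan noobs;
run;
```

Under that reading, N=160 gives 2 groups of 80, N=173 gives 2 groups (86 and 87), N=300 gives 3 groups of 100, and N=400 gives 3 groups (133, 133 and 134). Once the client agrees on the rule, the resulting group count could be supplied to the GROUPS= option of Proc SurveySelect.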