Hey,
Have:
Patient membership records spanning 2015 to 2018; not every patient would have all years of membership enrollment.
Want:
I want to get a random sample of patients coming from each year (2015 to 2018) at a 38, 22,19,21% respectively without repeating the same patient ID.
Is it possible to do all of these in one proc?
Thanks
>Is it possible to do all of these in one proc?
I don't think so.
>not every patient would have all years of membership enrollment.
No preference in terms of percentage of various patient tenures?
You have not provided a sample data set, so my suggestion is totally untested. I presume you have a data set with ID and YEAR variables (or date variable from which YEAR can be extracted). Each ID may have any number of records (including zero records) in each year.
You want a random sample (at different sampling rates) for each of 4 years. And if an ID is drawn for one year, it is not eligible to be drawn from another year.
It is conceivable that this is not possible. Consider exactly 100 patients, each with one record in each year. Then your sampling rates of 38%, 22%, 19% and 21% mean you would draw exactly one record from each of the patients. Now imagine that the 38% year (call it year X) is missing for the "last" ID (i.e. that ID is present only in the other 3 years). The random sample size of 38% of 99 is still presumably 38 obs. If your randomization scheme, over the course of the first 99 IDs, selects a complete complement for the other years and 37 for year X, then the 100th ID cannot be sampled: it is not available for year X and it is not needed for the other years.
I.e. it is possible your data may be pathological enough to make it impossible to get the sample you want, even if all the sampling rates were an identical 25%. This is because the same ID may be present in multiple years, yet is not allowed in more than one stratum (i.e. one year).
This task will probably require some data step coding.
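A minimal, untested sketch of what that coding could look like. It assumes a data set HAVE with variables ID and YEAR, that each rate applies to the IDs still eligible in that year, and that the years are processed in order from 2015 to 2018. None of these assumptions has been confirmed by the original poster, and the macro name, parameters, and work data set names are made up for illustration.

```sas
/* Sketch only: sequential sampling, one year at a time.          */
/* Assumes HAVE(ID, YEAR); rates are percentages of the IDs that  */
/* are present in the year and have not been drawn already.       */
%macro sample_by_year(years=2015 2016 2017 2018, rates=38 22 19 21, seed=12345);
  %local i yr rate;

  /* Empty table that will collect the selected ID/YEAR pairs */
  data picked;
    set have(obs=0 keep=id year);
  run;

  %do i = 1 %to %sysfunc(countw(&years, %str( )));
    %let yr   = %scan(&years, &i, %str( ));
    %let rate = %scan(&rates, &i, %str( ));

    /* IDs present in this year and not yet selected for another year */
    proc sql;
      create table eligible as
      select distinct id, year
      from have
      where year = &yr
        and id not in (select id from picked);
    quit;

    /* Simple random sample without replacement; a SAMPRATE= value */
    /* greater than 1 is read as a percentage.                     */
    proc surveyselect data=eligible out=draw method=srs
                      samprate=&rate seed=%eval(&seed + &i) noprint;
    run;

    /* Add this year's draw to the running selection */
    proc append base=picked data=draw force;
    run;
  %end;
%mend sample_by_year;

%sample_by_year()
```

Because IDs drawn for an earlier year are removed before the next year is sampled, the processing order changes who is eligible later, and the pathological case described above can still leave a year short. The sketch also does not handle a year with no eligible IDs.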
"I want to get a random sample of patients coming from each year (2015 to 2018) at a 38, 22,19,21% respectively without repeating the same patient ID."
Does that mean you only consider a specific patient ID for your sample in the first year it appears in your source data? Or does it mean that as long as you haven't selected a specific patient ID in another year, it's still up for grabs for your sample, and you just don't want repeated IDs in your sample?
And depending on your answer:
What does 21% for your last year mean? 21% of the total rows in your source, or 21% of the source rows for this specific year (and are "excluded" IDs counted or not?), or 21% of the rows in the sample coming from the last year?
How did you come up with these percentages per year in the first place? Are they based on your current source data and you just want to end up with the same number of patients per year in your sample?
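To see how far apart these possible denominators are, a quick count along these lines may help. It is only a sketch and assumes a data set HAVE with variables ID and YEAR; the column aliases are made up.

```sas
proc sql;
  /* Distinct patients and rows per year: the per-year denominator */
  select year,
         count(distinct id) as n_ids,
         count(*)           as n_rows
  from have
  group by year;

  /* Distinct patients and rows overall: the whole-source denominator */
  select count(distinct id) as n_ids_total,
         count(*)           as n_rows_total
  from have;
quit;
```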
Here is an attempt to create sample HAVE data for your case. Can you please verify whether this data is suitable?
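/* create sample Have data */
data _null_;
  length year id 8;
  /* hash keyed on year+id; duplicate keys are rejected, so an ID appears at most once per year */
  dcl hash h1(multidata:'n');
  h1.defineKey('year','id');
  h1.defineData('year','id');
  h1.defineDone();
  call streaminit(2); /* fixed seed so the data can be reproduced */
  do year=2015 to 2018;
    /* between 1000 and 3000 draws for this year */
    _stop=rand('integer',1000,3000);
    do _j=1 to _stop;
      id=rand('integer',1,10000);
      /* ref() adds the year/id pair only if it is not already in the hash */
      _rc=h1.ref();
    end;
  end;
  /* write the hash content to data set HAVE */
  h1.output(dataset:'have');
  stop;
run;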
The code you shared threw an error.
Here is how the data is structured:
ID Year
A 2015
A 2016
B 2015
B 2017
B 2018
C 2016
D 2018
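For anyone who wants to try code against these rows, they can be read into a data set like this. The data set name HAVE is an assumption carried over from earlier in the thread, and ID is read as a character variable.

```sas
/* Recreate the example rows shown above as data set HAVE */
data have;
  input id $ year;
  datalines;
A 2015
A 2016
B 2015
B 2017
B 2018
C 2016
D 2018
;
run;
```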
In the sample, a patient ID should not repeat within the same year or across years.
The percentages come from a case group whose disease index years are distributed at the rates mentioned.
Does this help? @Patrick
The code I've shared works for me as posted. It looks like your SAS version is too old for something in the code.
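One possible culprit, and this is only a guess because the log was not posted, is rand('integer', ...), which as far as I know needs SAS 9.4M5 or later. On an older release the same kind of draw can be built from rand('uniform'). A small sketch of the replacement expressions:

```sas
data _null_;
  call streaminit(2);
  /* integer in 1000..3000: lower bound + floor(range width * U), with U in (0,1) */
  _stop = 1000 + floor((3000 - 1000 + 1) * rand('uniform'));
  /* integer in 1..10000 */
  id = 1 + floor(10000 * rand('uniform'));
  put _stop= id=;
run;
```

The two expressions would replace the corresponding rand('integer', ...) calls in the sample-data step. The generated values will differ from those of the original code even with the same seed.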
I still don't understand where the percentages would need to be applied and you haven't explained this further/answered my questions.