Hi, everyone!
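I want to adjust my PROC SURVEYSELECT step so that the sample depends on filters applied to the source table.
I've attached a sample of my data. The rules are:
- The sample size for each idAction must be 10% of the number of rows with control = 'N'.
- The randomly selected rows must come only from rows with control = 'S'.
Is there a way to do this within the procedure, or should I use a different one?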
Thanks!
So people have an idea what the "data" looks like:
idAction;control;idClient
28004;N;40045
28004;N;40311
28004;N;40404
28004;N;35386
28004;N;162426
28004;N;163213
28004;S;149327
28004;S;163481
28004;N;163645
28004;N;149653
28004;N;157771
28004;N;303829
28004;N;304119
28004;S;290727
28004;S;286589
28004;N;304922
28004;S;286922
28004;N;292085
28004;S;450891
28004;S;506107
Now, what does this mean?
"The 10% of each idAction have to be defined by the number of rows with control = 'N'. The random output have to be only rows with control = 'S'."
If the output consists only of records where control = 'S', then I do not understand how "10% of each idAction have to be defined by the number of rows with control = 'N'".
Please describe in much more detail how the control = 'N' records are actually used. LOTS more detail.
Hi, @ballardw! Thanks for replying!
So I think you've got the idea, but to clarify more, here are some more details:
The clients marked with control = N are the targeted ones, and the sample size must be based on their total count.
The clients with control = S, on the other hand, are my control group, which needs to be 10% of the size of the targeted group.
That's why I need to "cross" these proportions.
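To make the arithmetic concrete, here is a small sketch that counts both groups per idAction and derives the wanted control size. It assumes the sample is loaded into a dataset named have, and it assumes rounding up with CEIL, which is only one possible rounding rule. The column names nTargeted, nControlAvailable and nControlWanted are made up for illustration.

```sas
/* Sketch only: per idAction, count targeted rows (control = 'N'),   */
/* available control rows (control = 'S'), and 10% of the targeted   */
/* count. Rounding up with CEIL is an assumption.                    */
proc sql;
  select idAction,
         sum(control = 'N')               as nTargeted,
         sum(control = 'S')               as nControlAvailable,
         ceil(0.1 * calculated nTargeted) as nControlWanted
  from have
  group by idAction;
quit;
```

For the sample rows, idAction 28004 has 13 rows with control = N and 7 with control = S, so 10% of the targeted group is 1.3, which rounds up to 2 control rows.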
I don't know if I've made myself clear (English isn't my native language), but I'm happy to provide any more information.
Thanks again!
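/* Sample data: one row per client, flagged as targeted (N) or control (S) */
data have;
  infile datalines dlm=',';
  input idAction $ control $ idClient $;
  datalines;
28004,N,40045
28004,N,40311
28004,N,40404
28004,N,35386
28004,N,162426
28004,N,163213
28004,S,149327
28004,S,163481
28004,N,163645
28004,N,149653
28004,N,157771
28004,N,303829
28004,N,304119
28004,S,290727
28004,S,286589
28004,N,304922
28004,S,286922
28004,N,292085
28004,S,450891
28004,S,506107
;
run;

/* Count the targeted rows (control = 'N') per idAction */
proc freq data=have;
  where control = 'N';
  table idAction / out=counts;
run;

/* _nsize_ is the per-stratum sample size: 10% of the targeted count, rounded up */
data sampSizeSpecifications;
  set counts;
  _nsize_ = ceil(0.1*Count);
run;

/* Draw the sample from the control rows (control = 'S') only.            */
/* The input must be sorted by idAction for the STRATA statement.         */
/* If an idAction has fewer 'S' rows than _nsize_, the step stops with an */
/* error unless the SELECTALL option is added. Add seed=<number> to the   */
/* PROC statement for a reproducible draw.                                */
proc surveyselect data=have method=srs sampsize=sampSizeSpecifications out=selected;
  where control = 'S';
  strata idAction;
run;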
Are you sure you don't want PROC PSMATCH and case control matching?
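If you want to check the result, one possible sketch is to compare the number of rows actually selected with the requested size per idAction. It assumes the datasets selected and sampSizeSpecifications from the code above; the names selectedCounts, check, nTargeted, nSelected and ok are made up for illustration.

```sas
/* Sketch only: count the selected control rows per idAction */
proc freq data=selected noprint;
  table idAction / out=selectedCounts(keep=idAction count rename=(count=nSelected));
run;

/* Put targeted count, requested size and selected count side by side. */
/* Both inputs come out of PROC FREQ already sorted by idAction.       */
data check;
  merge sampSizeSpecifications(keep=idAction count _nsize_ rename=(count=nTargeted))
        selectedCounts;
  by idAction;
  ok = (nSelected = _nsize_);  /* 1 when the draw matches the request, else 0 */
run;

proc print data=check noobs;
run;
```

With the sample data there are 13 rows with control = 'N', so _nsize_ is ceil(1.3) = 2, and the check should show 2 selected rows for idAction 28004.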
Thank you very much, @Reeza!
Regarding your suggestion to use PROC PSMATCH: I'll pair the groups with another method afterwards, so I don't need it right now.
Nonetheless, thanks for the advice.
Regards, Renan.