Hello everyone. I'm a beginner with SAS and need help with a data processing problem. I have 900,000 observations with no duplicates; the variables are id, id_sex, visit date, and disease (which I have recoded as yes/no, 1/0). From this data I want to make a comparison at certain percentages, using simple random sampling without replacement, with the following comparison and desired results:
F --- 20% ---> F
F --- 80% ---> M
M --- 20% ---> M
M --- 80% ---> F
and the expected results table is like this:
Gender | Yes | No |
F | ……. % | ……. % |
M | ……. % | ……. % |
Could anyone help me, please? Thank you for your kindness.
Regards,
Use PROC SURVEYSELECT
First build a small table with the size/proportions needed to pass to the proc.
proc surveyselect data=have samprate=sample_proportions method=srs
                  seed=123 out=want;
    strata sex;
run;
I’m not sure I understand your sample rate table so if you could explain that it may help. This should help you get started though.
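The "small table" of proportions mentioned above could look something like the sketch below. It assumes the strata variable is called sex (as in the reply's code, not id_sex as in the question) and uses sashelp.class as stand-in data; the names have, sample_proportions, and want are placeholders. PROC SURVEYSELECT reads the per-stratum rates from a variable named _RATE_ when SAMPRATE= points to a data set, and the strata variable in that data set should match the input data in type, length, and sort order.

```sas
* Stand-in input data, sorted by the strata variable;
proc sort data=sashelp.class out=have;
    by sex;
run;

* One row per stratum; _RATE_ holds the sampling rate for that stratum;
data sample_proportions;
    length sex $1;          /* same type and length as sex in HAVE */
    input sex $ _rate_;
    datalines;
F 0.2
M 0.8
;
run;

* Simple random sampling without replacement within each stratum;
proc surveyselect data=have samprate=sample_proportions method=srs
                  seed=123 out=want;
    strata sex;
run;
```

With this table, roughly 20% of the F rows and 80% of the M rows are selected; swapping the two rates gives the opposite split.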
Documentation:
I am sorry, Reeza, could you please give me an example? I want to see how many% of patients if I use my data randomly for simple random sampling so that it will produce a comparison table F to F a percentage of 20 for the results of yes and no, F to M with a percentage of 80; and vice versa with M, as I have written above. Do you mean I have to generate data to build a sample table? Forgive me if I do not understand. Please explain. Thank you.
I don't know what this means:
I want to see how many% of patients if I use my data randomly for simple random sampling so that it will produce a comparison table F to F a percentage of 20 for the results of yes and no, F to M with a percentage of 80; and vice versa with M, as I have written above.
You seem to have skipped a step in the middle of explaining what you want.
Do you want to calculate a statistic on the full 900K dataset?
Do you want to take a sample from the 900K dataset and then calculate a statistic?
What statistic do you want to calculate? It seems like you want to calculate the percent that are male (and hence also the percent that are female).
What do you want to compare? Do you want to compare the statistic calculated on the sample to the statistics calculated on the full dataset?
Or are you just saying you want to sample the males and females separately so that you can maintain the same relative frequency of males and females in the sample as existed in the full dataset?
Or do you want to sample in such a way that the relative frequency of males and females in the sample is different from that in the full dataset? Perhaps matching some preset ratio, like 4 to 1 (80%/20%)?
But what does your 2x2 table represent? Are there two gender variables in your dataset? Do some people change sex?
Thank you, Tom, for guiding me on how to say what I mean.
I have one variable (call it Diss) in my dataset that identifies patients with disease A; it contains Yes or No.
I want to draw a sample from the complete dataset in which the relative frequencies of males and females are different.
The expected matching ratio is F:M (20:80), and then M:F (20:80).
The 2x2 table represents the comparison after sampling: it will contain the percentages of gender F with Diss equal to Yes and No, and likewise for gender M.
Have I now explained what I want clearly? Thank you.
@Monhieq wrote:
Do I now explain well what I want? Thank you.
Yes, much clearer!
Here's an example of how this can be done.
The PROC FREQ is only a check; you can delete it or comment it out.
Since you need multiple samples I wrote a short macro where you just put the percentages and it provides you with an output data set.
If you're not familiar with macros review these links:
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
* The data set has to be sorted by the STRATA variable;
proc sort data=sashelp.class out=class;
    by sex;
run;

%macro randomSelect(dsn=, rate_F=, rate_M=, out_dsn=);
    %* dsn     = input data set name, sorted by sex;
    %* rate_F  = sampling rate for females, as a proportion such as 0.2;
    %* rate_M  = sampling rate for males, as a proportion such as 0.8;
    %* NOTE: these are per-stratum sampling rates, listed in sorted order (F, then M);
    %* they do not have to add up to 1;
    %* out_dsn = output data set name;

    proc surveyselect data=&dsn. samprate=(&rate_F. &rate_M.) method=srs
                      seed=123 out=&out_dsn.;
        strata sex;
    run;

    * Check the number of rows selected per sex;
    proc freq data=&out_dsn.;
        table sex;
    run;
%mend randomSelect;

%randomSelect(dsn=class, rate_F=0.2, rate_M=0.8, out_dsn=want1);
%randomSelect(dsn=class, rate_F=0.8, rate_M=0.2, out_dsn=want2);
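For the 2x2 table of gender by disease status requested at the start of the thread, one possible follow-up is a PROC FREQ crosstab on the sampled output. The sketch below is only an illustration: sashelp.class has no disease variable, so it simulates a hypothetical data set have with sex and a 0/1 variable diss (the name suggested in the thread), and the 0.5 and 0.3 probabilities are made up.

```sas
* Hypothetical stand-in data: sex (F/M) and diss (1 = yes, 0 = no);
data have;
    call streaminit(123);
    length sex $1;
    do id = 1 to 1000;
        sex  = ifc(rand('uniform') < 0.5, 'F', 'M');
        diss = rand('bernoulli', 0.3);
        output;
    end;
run;

* SURVEYSELECT expects the input sorted by the STRATA variable;
proc sort data=have;
    by sex;
run;

* Sample 20% of the F rows and 80% of the M rows, without replacement;
proc surveyselect data=have samprate=(0.2 0.8) method=srs
                  seed=123 out=want1;
    strata sex;
run;

* Rows = sex, columns = diss; keep counts and row percentages only;
proc freq data=want1;
    tables sex*diss / nocol nopercent;
run;
```

The row percentages give, within each gender, the share of sampled patients with diss equal to 1 and 0. Running the same PROC FREQ on want2 (rates swapped) would give the second comparison.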
Thanks so much, Reeza ... I understand. I will study that and try it.
Thank you 🙂