Monhieq
Fluorite | Level 6

Hello everyone. I'm a beginner at SAS and need help with a data-processing problem. I have 900,000 observations with no duplicates. The variables are id, id_sex, visit date, and disease (which I have recoded so that it contains yes or no, 1/0). From this data I want to draw simple random samples without replacement, using the percentages below, and then compare the results:

F --- 20% -------> F

F --- 80% -------> M

M --- 20% -------> M

M --- 80% -------> F

 

and the expected results table is like this:

Gender    Yes       No
F         ……. %     ……. %
M         ……. %     ……. %

 

Could anyone help me, please?  Thank you for your kindness.

 

Regards,

 

 

8 REPLIES
Reeza
Super User

Use PROC SURVEYSELECT

 

First build a small table with the size/proportions needed to pass to the proc. 

/* have must be sorted by the strata variable (sex) */
proc surveyselect data=have samprate=sample_proportions method=srs
                  seed=123 out=want;
    strata sex;
run;
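As a rough sketch of what that small table could look like: when SAMPRATE= points to a data set, PROC SURVEYSELECT reads the stratum rates from a variable named _RATE_, with one row per stratum in the same order as the sorted input. The data set names have and sample_proportions come from the code above. The character variable sex of length 1 and the rates 0.2 and 0.8 are assumptions for illustration (the original post calls the variable id_sex).

```sas
/* Assumed rate table: one row per stratum, same order as the sorted input. */
/* sex must have the same type and length as in the input data set.         */
data sample_proportions;
    length sex $1;
    input sex $ _rate_;
    datalines;
F 0.2
M 0.8
;
run;

/* The input must be sorted by the strata variable. */
proc sort data=have;
    by sex;
run;

proc surveyselect data=have samprate=sample_proportions method=srs
                  seed=123 out=want;
    strata sex;
run;
```

With these rates the procedure selects about 20% of the F records and about 80% of the M records.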


I’m not sure I understand your sample rate table so if you could explain that it may help. This should help you get started though. 

Documentation:

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_surveyselect_syntax01.htm&docsetV...

 



Monhieq
Fluorite | Level 6

I'm sorry, Reeza, could you please give me an example? I want to see how many% of patients if I use my data randomly for simple random sampling so that it will produce a comparison table F to F a percentage of 20 for the results of yes and no, F to M with a percentage of 80; and vice versa with M, as I have written above. Do you mean I have to build a sample table from my data? Forgive me if I have misunderstood. Please explain. Thank you.

Reeza
Super User

I don't know what this means:

 I want to see how many% of patients if I use my data randomly for simple random sampling so that it will produce a comparison table F to F a percentage of 20 for the results of yes and no, F to M with a percentage of 80; and vice versa with M, as I have written above. 

Tom
Super User

You seem to have skipped a step in the middle of explaining what you want.

Do you want to calculate a statistic on the full 900K dataset?

Do you want to take a sample from the 900K dataset and then calculate a statistic?

 

What statistic do you want to calculate? It seems like you want to calculate the percent that are male (and hence also the percent that are female).

 

What do you want to compare? Do you want to compare the statistic calculated on the sample to the statistics calculated on the full dataset?  

 

Or are you just saying you want to sample the males and females separately so that you can maintain the  same relative frequency of males and females in the sample as existed in the full dataset?

 

Or do you want to sample in a way that makes the relative frequency of males and females in the sample different from that in the full dataset? Perhaps matching some preset ratio, like 4 to 1 (80%/20%)?

 

But what does your 2x2 table represent?  Are there two gender variables in your dataset? Do some people change sex?

Monhieq
Fluorite | Level 6

Thank you, Tom, you guided me on how to say what I mean

 

I have one variable (call it Diss) in my dataset that defines patients with disease A, and is filled with Yes or No.

 

I want a sample, drawn from the complete dataset, in which the relative frequencies of males and females differ from those in the full data.

The target ratios are F:M = 20:80 for one sample and M:F = 20:80 for the other.

 

The 2x2 table represents the comparison after sampling: it will contain the percentages of Diss = Yes and Diss = No for gender F, and likewise for gender M.

 

Do I now explain well what I want?  Thank you.

Reeza
Super User

@Monhieq wrote:

 

Do I now explain well what I want?  Thank you.


Yes, much clearer!

 

Here's an example of how this can be done.

The PROC FREQ step is only a check; you can delete it or comment it out.

Since you need multiple samples I wrote a short macro where you just put the percentages and it provides you with an output data set. 

 

If you're not familiar with macros review these links:

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

 

 

*dataset has to be sorted by the strata variable;
proc sort data=sashelp.class out=class;
    by sex;
run;


%macro randomSelect(dsn= , rate_F= , rate_M= , out_dsn=);

    %*dsn = input data set name;
    %*rate_F = rate of females selected;
    %*rate_M = rate of males selected;
    %*NOTE rate_F + rate_M should add up to 1, or to 100 if given as percentages;
    %*out_dsn = output data set name;

    proc surveyselect data=&dsn. samprate=(&rate_F. &rate_M.) method=srs
                      seed=123 out=&out_dsn.;
        strata sex;
    run;

    %*check rates;
    proc freq data=&out_dsn.;
        table sex;
    run;

%mend randomSelect;

%randomSelect(dsn=class, rate_F=0.2, rate_M=0.8, out_dsn=want1);
%randomSelect(dsn=class, rate_F=0.8, rate_M=0.2, out_dsn=want2);
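To get the 2x2 table of percentages described earlier in the thread, one possible follow-up is a cross-tabulation of sex by the disease flag on each sample. This is only a sketch: it assumes the sample was drawn from the poster's own data, so that want1 contains both sex and the Yes/No variable Diss (sashelp.class has no such variable).

```sas
/* Row percentages: share of Diss = Yes / No within each sex.             */
/* Assumes want1 was sampled from data that contains sex and Diss.        */
/* NOFREQ, NOCOL and NOPERCENT suppress everything except the row percent. */
proc freq data=want1;
    tables sex*Diss / nofreq nocol nopercent;
run;
```

Running the same step on want2 gives the table for the second set of rates.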
Monhieq
Fluorite | Level 6

Thanks so much, Reeza. I understand. I will study that and try it.

 

Thank you 🙂 
