
Posted 11-01-2019 01:24 AM
(1483 views)

Hello everyone. I'm a beginner at using SAS and need help with a problem in the data processing I'm doing. I have 900,000 observations with no duplicate records. The variables are id, id_sex, visit date, and disease (which I have processed so that it contains yes or no, coded 1/0). From this data I want to make a comparison at certain percentages, using simple random sampling without replacement, with the following comparison and desired results:

F --- 20% -------> F

F --- 80% ------->M

M --- 20% ------->M

M --- 80% ------->F

The expected results table looks like this:

| Gender | Yes | No |
|--------|-----|----|
| F | … % | … % |
| M | … % | … % |

Could anyone help me, please? Thank you for your kindness.

Regards,

8 replies


Use PROC SURVEYSELECT

First, build a small table with the sample sizes or proportions to pass to the procedure.

```
/* HAVE must be sorted by SEX; SAMPLE_PROPORTIONS holds one rate per sex */
Proc surveyselect data=have samprate=sample_proportions method=srs
Seed=123 out = want;
Strata sex;
Run;
```

I'm not sure I understand your sample rate table, so it would help if you could explain it. This should get you started, though.
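A minimal sketch of that small table, assuming the secondary-data-set form of SAMPRATE=: one row per stratum, holding the stratification variable and the rate in a variable named `_RATE_`. Here `sashelp.class` is a hypothetical stand-in for the real 900K-row data, `sex` stands in for the question's `id_sex`, and the 20%/80% rates are simply the numbers from the question.

```sas
/* Stand-in data: sashelp.class (hypothetical substitute for the real   */
/* patient table). SURVEYSELECT expects it sorted by the STRATA variable. */
proc sort data=sashelp.class out=have;
  by sex;
run;

/* Secondary rate table: one row per stratum, rate stored in _RATE_. */
/* Rows are kept in the same sorted order as the strata (F, then M). */
data sample_proportions;
  length sex $1;
  input sex $ _rate_;
  datalines;
F 0.2
M 0.8
;
run;

/* Simple random sampling without replacement within each sex. */
proc surveyselect data=have samprate=sample_proportions method=srs
                  seed=123 out=want;
  strata sex;
run;
```

Each rate applies within its own stratum, so this keeps roughly 20% of the females and 80% of the males.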

Documentation:


I don't know what this means:

I want to see what percentage of patients I get if I draw a simple random sample from my data, so that it produces a comparison table: F to F at 20% for the yes and no results, F to M at 80%, and vice versa for M, as I wrote above.


You seem to have skipped a step in the middle of explaining what you want.

Do you want to calculate a statistic on the full 900K dataset?

Do you want to take a sample from the 900K dataset and then calculate a statistic?

What statistic do you want to calculate? It seems like you want to calculate the percent that are male (and hence also the percent that are female).

What do you want to compare? Do you want to compare the statistic calculated on the sample to the statistics calculated on the full dataset?

Or are you just saying you want to sample the males and females separately so that you can maintain the same relative frequency of males and females in the sample as existed in the full dataset?

Or do you want to sample in a way that makes the relative frequency of males and females in the sample different from that in the full dataset? Perhaps matching some preset ratio, like 4 to 1 (80%/20%)?

But what does your 2x2 table represent? Are there two gender variables in your dataset? Do some people change sex?


Thank you, Tom; you helped me figure out how to say what I mean.

I have one variable (call it Diss) in my dataset that flags patients with disease A; it is filled with Yes or No.

I want to draw a sample from the complete dataset in which the relative frequencies of males and females are different from those in the full dataset.

The expected ratios are F:M (20:80) and M:F (20:80).

The 2x2 table represents the comparison after sampling, so it will contain the percentages of Diss = Yes and Diss = No for gender F, and likewise for gender M.

Have I explained what I want clearly now? Thank you.
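A point worth spelling out about that ratio: a sampling rate applies within its own stratum, so rates of 20% for F and 80% for M keep 20% of the females and 80% of the males. The sample only comes out 20:80 F:M if the full data has equal numbers of each sex. If the goal is instead a sample whose composition is 20% F and 80% M, one option (a sketch under stated assumptions, not the thread's method) is to fix per-stratum counts with a SAMPSIZE= table holding the stratum variable and `_NSIZE_`. The total sample size of 10 is a hypothetical choice, and `sashelp.class` again stands in for the real data.

```sas
/* Hypothetical total sample size; change to suit the real data. */
%let total_n = 10;

/* Stand-in data, sorted by the stratum variable. */
proc sort data=sashelp.class out=class;
  by sex;
run;

/* Per-stratum sample sizes for a 20% F / 80% M sample, in _NSIZE_. */
data sizes;
  length sex $1;
  sex = 'F'; _nsize_ = round(0.2 * &total_n); output;
  sex = 'M'; _nsize_ = round(0.8 * &total_n); output;
run;

/* SRS without replacement: exactly _NSIZE_ rows drawn from each sex. */
proc surveyselect data=class sampsize=sizes method=srs
                  seed=123 out=want_mix;
  strata sex;
run;
```

With METHOD=SRS, each stratum's requested size must not exceed the number of rows available in that stratum.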


@Monhieq wrote:

Have I explained what I want clearly now? Thank you.

Yes, much clearer!

Here's an example of how to do this.

The PROC FREQ step is only a check; you can delete it or comment it out.

Since you need multiple samples, I wrote a short macro: you pass in the percentages and it creates an output data set.

If you're not familiar with macros, review these links:

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is fairly robust: it helps prevent errors and makes your code much easier to debug. Obviously I'm biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

```
*dataset has to be sorted;
proc sort data=sashelp.class out=class;
by sex;
run;

%macro randomSelect(dsn= , rate_F = , rate_M= , out_dsn=);
%*dsn = input data set name (must be sorted by sex);
%*rate_F = sampling rate for females, as a proportion (0.2 means 20 percent);
%*rate_M = sampling rate for males, as a proportion (0.8 means 80 percent);
%*NOTE each rate applies within its own stratum, so they need not add up to 1;
%*out_dsn = output data set name;

Proc surveyselect data=&dsn samprate=(&rate_F. &rate_M.) method=srs
Seed=123 out = &out_dsn. ;
Strata sex; /* rates are matched to strata in sorted order: F, then M */
Run;

*check rates;
proc freq data=&out_dsn;
table sex;
run;
%mend;

%randomSelect(dsn=class, rate_F=0.2, rate_M=0.8, out_dsn=want1);
%randomSelect(dsn=class, rate_F=0.8, rate_M=0.2, out_dsn=want2);
```
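To get from a sample to the 2x2 table in the question (percent Yes and No within each gender), PROC FREQ with row percentages is one option. This is a sketch under assumptions: the real data would supply the disease flag (called `Diss` in the thread), but `sashelp.class` has no such variable, so the demo derives a hypothetical stand-in from `age` purely to have something to tabulate.

```sas
/* Draw a stratified sample (same call pattern as the macro above). */
proc sort data=sashelp.class out=class;
  by sex;
run;

proc surveyselect data=class samprate=(0.2 0.8) method=srs
                  seed=123 out=sample;
  strata sex;
run;

/* Hypothetical stand-in for the real Diss flag: Yes if age >= 14. */
data sample_flag;
  set sample;
  length diss $3;
  diss = ifc(age >= 14, 'Yes', 'No');
run;

/* Row percentages: within each sex, % Yes + % No = 100.         */
/* OUTPCT adds PCT_ROW (and PCT_COL, PCT_TAB) to the OUT= table. */
proc freq data=sample_flag;
  tables sex*diss / nocol nopercent outpct out=pct_table;
run;
```

In the printed crosstab the Row Pct line gives the F and M rows of the requested table; `pct_table` holds the same numbers as data.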


Thanks so much, Reeza... I understand. I will study it and try it.

Thank you 🙂
