Contributor
Posts: 38

# Random Sample Selection with a Condition

Hi, everybody,

Does anybody know how I could select a random sample subject to a condition? Say I have a loan dataset of 1 MM customers. I want to select two random samples such that the average loan balance in each sample is close to \$800. How could I do that?

Super User
Posts: 20,701

## Re: Random Sample Selection with a Condition

What does your data look like, and what are your other criteria?

Would a sample of 3 loans, say \$10, \$800, and \$1590, be a valid sample?

Regular Contributor
Posts: 233

## Re: Random Sample Selection with a Condition

You would probably have to adjust the code to your requirements, but this should help as a starting point.

Code:

/* Toy data: each loan is already tagged with a sample ID */
DATA TEMP;
INPUT SAMPLE LOAN;
CARDS;
1 800
1 800
2 100
3 800
4 800
3 800
4 800
;
RUN;

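/* Keep each row with probability 0.55, then report only the samples
   whose average loan among the kept rows is exactly 800 */
PROC SQL;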
SELECT SAMPLE, AVG(LOAN) AS AVG_LOAN FROM TEMP
WHERE RANUNI(111) < 0.55
GROUP BY SAMPLE HAVING AVG_LOAN = 800;
QUIT;

Output:

sample  avg_loan

1       800

4       800
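The query above relies on a SAMPLE column that already exists in the data. If the samples have to be drawn first, one possible route is to draw many candidate samples and keep the two whose mean is closest to the target. The following is only a rough sketch under assumptions: the dataset name WORK.LOANS, the variable BALANCE, the sample size of 1000, the 200 candidates, and the tolerance of 5 are all placeholders, not anything taken from the original data.

```sas
/* Assumed names: WORK.LOANS (one row per customer) and BALANCE.
   n=1000, reps=200 and the tolerance of 5 are arbitrary placeholders. */
proc surveyselect data=work.loans out=work.candidates
                  method=srs n=1000 reps=200 seed=111 noprint;
run;

/* Mean balance per candidate sample; REPS= adds the variable Replicate.
   Keep only candidates within the tolerance, closest to 800 first. */
proc sql;
  create table work.close_reps as
  select replicate,
         avg(balance) as avg_balance,
         abs(avg(balance) - 800) as abs_diff
  from work.candidates
  group by replicate
  having abs(avg(balance) - 800) <= 5
  order by abs_diff;
quit;

/* The two candidates closest to the target (fewer if not enough qualify) */
data work.best_two;
  set work.close_reps(obs=2);
run;

/* Pull the customer rows that belong to those two candidates */
proc sql;
  create table work.final_samples as
  select c.*
  from work.candidates as c
       inner join work.best_two as b
       on c.replicate = b.replicate;
quit;
```

Two caveats with this sketch: the replicates are drawn independently, so the two samples may share customers, and selecting on the mean means the result is no longer a plain simple random sample. If the overall average balance is far from \$800, few or no candidates will fall within the tolerance.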

Posts: 4,234

## Re: Random Sample Selection with a Condition

Not being a statistician, in my simple world I would expect a RANDOM sample to have the same characteristics as the universe you draw it from. If this is true, then you would first have to "tailor" yourself a universe with the desired characteristics.

I assume you would first have to subset your source table and then draw the sample from this subset (http://support.sas.com/kb/24/722.html).
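As a minimal sketch of that two-step idea, assuming a source table WORK.LOANS with a variable BALANCE (both names are hypothetical) and a balance window that is purely a placeholder to be tuned:

```sas
/* Step 1: tailor the universe. The 400-1200 window is a placeholder;
   adjust it until the mean reported below is close to 800. */
data work.universe;
  set work.loans;
  where balance between 400 and 1200;
run;

/* Check the characteristics of the tailored universe */
proc means data=work.universe n mean std;
  var balance;
run;

/* Step 2: draw two simple random samples from that universe.
   REPS=2 adds the variable Replicate (1 or 2) to the output. */
proc surveyselect data=work.universe out=work.samples
                  method=srs n=1000 reps=2 seed=24722;
run;

/* Compare each sample's mean and spread with the universe */
proc means data=work.samples n mean std;
  class replicate;
  var balance;
run;
```

With a sample size of 1000 the two sample means should land near the universe mean, but they will not match it exactly, so the final PROC MEANS is there to confirm how close they came.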

SAS Super FREQ
Posts: 3,834

## Re: Random Sample Selection with a Condition

A similar question was asked at https://communities.sas.com/message/122173

Lots of suggestions there. It's not clear how your \$800 requirement relates to the properties of the data. Is \$800 the average balance among all customers?

Posts: 5,043

## Re: Random Sample Selection with a Condition

Two things to consider that would considerably change the approach taken:

1. What should the variance of the sample sets be? The same for both sets, or the same as the population?
2. What is the purpose of controlling the mean (\$800)? To eliminate the effect of a cofactor, or to match the overall loan population?
PG