c8826024
Calcite | Level 5


Hi, everybody,

Does anybody know how I can select a random sample subject to a condition? Say I have a loan dataset with 1 million customers. I want to select two random samples such that the average loan balance in each sample is close to $800.

Thanks in advance.

5 REPLIES
Reeza
Super User

What does your data look like, and what are your other criteria?

Would a sample of 3 loans, say $10, $800, and $1,590, be a valid sample?

Hima
Obsidian | Level 7

You will probably have to adjust the code to your requirements, but this should help as a starting point.

Code:

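/* Toy data: SAMPLE is a group ID, LOAN is the loan balance. */
DATA TEMP;
INPUT SAMPLE LOAN;
CARDS;
1 800
1 800
2 100
3 800
4 800
3 800
4 800
;
RUN;

/* Keep each row with probability 0.55 (seed 111), then report only   */
/* the groups whose average loan among the kept rows is exactly 800. */
PROC SQL;
SELECT SAMPLE, AVG(LOAN) AS AVG_LOAN FROM TEMP
WHERE RANUNI(111) < 0.55
GROUP BY SAMPLE HAVING AVG_LOAN = 800;
QUIT;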

Output:

sample  avg_loan
     1       800
     4       800

Patrick
Opal | Level 21

Not being a statistician, in my simple world I would expect a RANDOM sample to have the same characteristics as the universe you draw it from. If this is true, then you would first have to "tailor" yourself a universe with the desired characteristics.

I assume you would first have to subset your source table and then draw the sample from this subset (http://support.sas.com/kb/24/722.html).
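
A minimal sketch of that subset-then-sample idea, under stated assumptions: WORK.LOANS and BALANCE are hypothetical names for the source table and the loan balance variable, and the 600 to 1000 band, the sample size, and the seed are arbitrary choices for illustration. A band centred on $800 only yields a mean near $800 if balances inside the band are roughly symmetric around it, so the last step checks the result.

```sas
/* Hypothetical names: WORK.LOANS (source table), BALANCE (loan balance). */

/* Step 1: tailor the universe. The 600-1000 band is an assumption. */
data work.universe;
    set work.loans;
    where balance between 600 and 1000;
run;

/* Step 2: draw two simple random samples of 1,000 loans each.     */
/* REPS=2 adds a Replicate variable (1 or 2) to the output table.  */
proc surveyselect data=work.universe out=work.samples
    method=srs sampsize=1000 reps=2 seed=12345;
run;

/* Step 3: check the average balance in each sample. */
proc means data=work.samples n mean std;
    class replicate;
    var balance;
run;
```

With REPS=2 the two samples are drawn independently, so the same loan can appear in both. If the samples must not overlap, a different design is needed.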

Rick_SAS
SAS Super FREQ

A similar question was asked at https://communities.sas.com/message/122173

Lots of suggestions there. It's not clear how your $800 requirement relates to the properties of the data. Is $800 the average balance among all customers?

PGStats
Opal | Level 21

Two things to consider that would considerably change the approach taken:

  1. What should the variance of the sample sets be? The same for both sets, or the same as the population?
  2. What is the purpose of controlling the mean ($800)? To eliminate the effect of a cofactor, or to match the overall loan population?
PG
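
One rough way to explore both questions is to draw many candidate samples, compute the mean and standard deviation of each, and keep only the candidates whose mean falls near the target. The sketch below assumes a hypothetical table WORK.LOANS with a balance variable BALANCE. The number of candidates, the sample size, the seed, and the $5 tolerance are all arbitrary.

```sas
/* Hypothetical names: WORK.LOANS (source table), BALANCE (loan balance). */

/* Draw 200 candidate simple random samples of 1,000 loans each. */
proc surveyselect data=work.loans out=work.candidates
    method=srs sampsize=1000 reps=200 seed=2718 noprint;
run;

/* Mean and standard deviation of the balance within each candidate. */
proc means data=work.candidates noprint nway;
    class replicate;
    var balance;
    output out=work.cand_stats mean=avg_balance std=sd_balance;
run;

/* Keep candidates whose mean is within $5 of the $800 target. */
/* The tolerance is an assumption, not a recommendation.       */
data work.accepted;
    set work.cand_stats;
    if abs(avg_balance - 800) <= 5;
run;
```

If the population mean is far from $800, few or no candidates will pass the filter, and the ones that do are no longer plain random samples of the population. Comparing sd_balance across the accepted candidates shows how much the variance differs between them.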
