c8826024
Calcite | Level 5


Hi, everybody,

Does anybody know how I could select a random sample with a condition? Say I have a dataset of 1 MM customer loans. I want to select two random samples, with the condition that the average loan balance in each sample should be close to $800. How could I do this?

Thanks in advance.

5 REPLIES
Reeza
Super User

What does your data look like, and what are your other criteria?

Would a sample of 3 loans, say $10, $800, and $1,590, be a valid sample?

Hima
Obsidian | Level 7

You will probably have to adjust the code to your requirements, but this should help as a starting point.

Code:

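/* Toy data: SAMPLE is a group id, LOAN is the loan balance */
DATA TEMP;
INPUT SAMPLE LOAN;
CARDS;
1 800
1 800
2 100
3 800
4 800
3 800
4 800
;
RUN;

/* Keep each row with probability 0.55 (seed 111), then report only */
/* the groups whose average loan among the kept rows is exactly 800 */
PROC SQL;
SELECT SAMPLE, AVG(LOAN) AS AVG_LOAN FROM TEMP
WHERE RANUNI(111) < 0.55
GROUP BY SAMPLE HAVING AVG_LOAN = 800;
QUIT;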

Output:

sample  avg_loan
     1       800
     4       800
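The same filter-on-the-average idea could be scaled up by drawing many candidate samples and keeping the ones whose mean lands near the target. What follows is only a rough sketch under assumptions: the dataset name LOANS, the column BALANCE, the sample size of 1,000, the 200 replicates, and the $5 tolerance are all hypothetical placeholders, not anything stated in the question.

```sas
/* Assumed input: WORK.LOANS, one row per customer, numeric column BALANCE */
%let target = 800;   /* desired mean balance */
%let tol    = 5;     /* assumed tolerance in dollars */

/* Draw 200 candidate simple random samples of 1,000 loans each. */
/* SURVEYSELECT adds a REPLICATE variable identifying each draw. */
proc surveyselect data=loans out=candidates
                  method=srs n=1000 reps=200 seed=12345 noprint;
run;

/* Mean balance of each candidate sample */
proc means data=candidates noprint nway;
   class replicate;
   var balance;
   output out=rep_means mean=avg_balance;
run;

/* Keep at most two candidates, those closest to the target */
proc sql outobs=2;
   create table chosen as
   select replicate, avg_balance,
          abs(avg_balance - &target) as gap
   from rep_means
   where calculated gap <= &tol
   order by gap;
quit;

/* Pull the rows that belong to the chosen candidates */
proc sql;
   create table final_samples as
   select c.*
   from candidates as c
        inner join chosen as k
        on c.replicate = k.replicate;
quit;
```

If the population mean is far from $800, few or no replicates will pass the tolerance, and CHOSEN may come back empty. The replicates are drawn independently, so the two chosen samples can share customers. Samples picked this way are conditioned on their mean and are no longer plain simple random samples.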

Patrick
Opal | Level 21

Not being a statistician, in my simple world I would expect a RANDOM sample to have the same characteristics as the universe you draw it from. If this is true, then you would first have to "tailor" yourself a universe with the desired characteristics.

I assume you would first have to subset your source table and then draw the sample from this subset (http://support.sas.com/kb/24/722.html).
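A minimal sketch of that subset-then-sample idea, assuming a dataset named LOANS with a numeric BALANCE column. The $600 to $1,000 band, the sample size, and the seed are hypothetical values chosen only for illustration.

```sas
/* Assumed input: WORK.LOANS with a numeric BALANCE column.     */
/* The 600-1000 band is an example, not a recommended cut-off.  */
data universe;
   set loans;
   where balance between 600 and 1000;
run;

/* Simple random sample of 1,000 loans from the tailored universe */
proc surveyselect data=universe out=sample1
                  method=srs n=1000 seed=2024;
run;

/* Check how close the sample mean actually came to the target */
proc means data=sample1 n mean std;
   var balance;
run;
```

A band centered on $800 does not guarantee a mean of $800; that depends on how balances are distributed inside the band, so the bounds would likely need tuning against the PROC MEANS output.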

Rick_SAS
SAS Super FREQ

A similar question was asked at  https://communities.sas.com/message/122173

Lots of suggestions there.  It's not clear how your $800 requirement relates to the properties of the data. Is $800 the average balance among all customers?

PGStats
Opal | Level 21

Two things to consider that would considerably change the approach taken:

  1. What should the variance of the sample sets be? The same for both sets or the same as the population?
  2. What is the purpose of the control over the mean (= $800)? To eliminate the effect of a cofactor, or to match the overall loan population?
PG

Discussion stats
  • 5 replies
  • 2906 views
  • 0 likes
  • 6 in conversation