c8826024
Calcite | Level 5


Hi, everybody,

Does anybody know how I could select a random sample subject to a condition? Say I have a dataset of 1 MM customer loans. I want to select two random samples, with the condition that the average loan balance in each sample should be close to $800. How could I do that?

Thanks in advance.

5 REPLIES
Reeza
Super User

What does your data look like, and what are your other criteria?

Would a sample of 3 loans, say $10, $800, and $1590, be a valid sample?

Hima
Obsidian | Level 7

You will probably have to adjust the code to fit your requirements, but this should help as a starting point.

Code:

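/* Toy data: each row is one loan already assigned to a SAMPLE group. */
DATA TEMP;
INPUT SAMPLE LOAN;
CARDS;
1 800
1 800
2 100
3 800
4 800
3 800
4 800
;
RUN;

/* Keep each row with probability of about 0.55 (RANUNI with seed 111), */
/* then report only the groups whose average loan is exactly 800.       */
PROC SQL;
SELECT SAMPLE, AVG(LOAN) AS AVG_LOAN FROM TEMP
WHERE RANUNI(111) < 0.55
GROUP BY SAMPLE HAVING AVG_LOAN = 800;
QUIT;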

Output:

sample  avg_loan
     1       800
     4       800

Patrick
Opal | Level 21

Not being a statistician, in my simple world I would expect a RANDOM sample to have the same characteristics as the universe you draw it from. If this is true, then you would first have to "tailor" yourself a universe with the desired characteristics.

I assume you would first have to subset your source table and then draw the sample from this subset (http://support.sas.com/kb/24/722.html).
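One way the subset-then-sample idea could look is sketched below. This is only an illustration under assumptions: the dataset name WORK.LOANS, the variable BALANCE, the 600 to 1000 window, and the sample size of 1,000 are all made up for the example and are not taken from the original post. Whether a window like this actually produces a universe with a mean near $800 depends on how the balances are distributed, which is why the sketch checks the mean before sampling.

```sas
/* Hypothetical names: WORK.LOANS (one row per customer), BALANCE (loan balance). */
/* Step 1: tailor a universe. The 600-1000 window is an arbitrary illustration.   */
data work.universe;
    set work.loans;
    where balance between 600 and 1000;
run;

/* Step 2: check that the tailored universe's mean is actually close to 800. */
proc means data=work.universe n mean std;
    var balance;
run;

/* Step 3: draw two simple random samples of 1,000 loans each.          */
/* The output variable REPLICATE (1 or 2) identifies the sample.        */
proc surveyselect data=work.universe out=work.samples
                  method=srs sampsize=1000 reps=2 seed=12345;
run;

/* Step 4: confirm the mean balance of each sample. */
proc means data=work.samples n mean;
    class replicate;
    var balance;
run;
```

Note that the two replicates are drawn independently, so the same customer can appear in both samples. If the samples must not overlap, a different design would be needed.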

Rick_SAS
SAS Super FREQ

A similar question was asked at  https://communities.sas.com/message/122173

Lots of suggestions there.  It's not clear how your $800 requirement relates to the properties of the data. Is $800 the average balance among all customers?

PGStats
Opal | Level 21

Two things to consider that would considerably change the approach taken:

  1. What should the variance of the sample sets be? The same for both sets or the same as the population?
  2. What is the purpose of the control over the mean ($800)? To eliminate the effect of a cofactor, or to match the overall loan population?
PG
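
To make the mean and variance questions concrete, here is a rough sketch of a rejection-style check. It is not a method proposed by anyone in this thread: it draws many candidate samples, computes the mean and variance of each, and keeps only the candidates whose mean falls within a tolerance of $800. The names WORK.LOANS and BALANCE, the sample size, the number of replicates, and the $5 tolerance are all assumptions chosen for illustration.

```sas
/* Hypothetical names: WORK.LOANS, BALANCE. Sample size, replicate count, */
/* and the +/- $5 tolerance are illustrative assumptions.                 */
proc surveyselect data=work.loans out=work.candidates
                  method=srs sampsize=1000 reps=200 seed=2024 noprint;
run;

/* Mean and variance of BALANCE within each candidate sample. */
proc means data=work.candidates noprint nway;
    class replicate;
    var balance;
    output out=work.rep_stats mean=mean_balance var=var_balance;
run;

/* Keep the candidates whose mean is within the tolerance of 800. */
data work.accepted;
    set work.rep_stats;
    where abs(mean_balance - 800) <= 5;
run;
```

If the population mean is far from $800, few or no candidates will be accepted. Samples selected this way are also conditioned on their mean, so they are no longer simple random samples of the population, and VAR_BALANCE in WORK.ACCEPTED shows how their spread compares with the population's.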
