Hi Everyone,
Is there a way to select from sorted data by type, following a specific distribution? This may sound confusing because I don't know the exact terminology, but here is my problem:
I have 10k records and I need to select the top 1k records (based on balance). However, each record also has a Type, and to be fair I want to select the same percentage of records from each type. For example, in my data 50% of the records are Type A, 30% are Type B, and 20% are Type C. So I need to select the top 500 records of Type A, the top 300 records of Type B, and the top 200 records of Type C. Together these add up to 1,000. In reality my dataset has 6 types and 16M records in total, and I have to select approximately 4.2M of them.
I know I can calculate these percentages by hand and then write custom select statements in SAS, but this seems like a common enough problem that there must be a more automated way of doing it. Does anyone have any suggestions?
Dataset:
Id Type Bal
1 A 1000
2 C 90
3 A 1
4 B 203
5 B 980
6 A 89
... ... ...
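To make the target concrete, here is a rough sketch of the logic I am after, written in Python rather than SAS purely for illustration. The function name `select_top_by_type` and the field names `id`, `type`, and `bal` are my own placeholders, and the way leftover slots from rounding are handed out (largest remainder first) is just one assumption about how to make the per-type counts add up exactly to the total.

```python
from collections import Counter, defaultdict


def select_top_by_type(records, total_to_select):
    """Pick the highest-balance records per type, in proportion to each type's share.

    Assumes `records` is a non-empty list of dicts with "type" and "bal" keys
    and that total_to_select does not exceed len(records).
    """
    counts = Counter(rec["type"] for rec in records)
    n_records = len(records)

    # Each type's quota is its share of the data times the overall target,
    # rounded down; keep the remainder so rounding can be corrected afterwards.
    quotas = {}
    remainders = {}
    for rec_type, count in counts.items():
        quotas[rec_type], remainders[rec_type] = divmod(
            count * total_to_select, n_records
        )

    # Hand out the slots lost to rounding, largest remainder first
    # (ties broken by type name so the result is deterministic).
    leftover = total_to_select - sum(quotas.values())
    for rec_type in sorted(counts, key=lambda t: (-remainders[t], t))[:leftover]:
        quotas[rec_type] += 1

    # Group the records by type, then keep the top-balance rows up to each quota.
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["type"]].append(rec)

    selected = []
    for rec_type in sorted(by_type):
        ranked = sorted(by_type[rec_type], key=lambda r: r["bal"], reverse=True)
        selected.extend(ranked[: quotas[rec_type]])
    return selected


# The six sample rows shown above.
sample = [
    {"id": 1, "type": "A", "bal": 1000},
    {"id": 2, "type": "C", "bal": 90},
    {"id": 3, "type": "A", "bal": 1},
    {"id": 4, "type": "B", "bal": 203},
    {"id": 5, "type": "B", "bal": 980},
    {"id": 6, "type": "A", "bal": 89},
]

picked = select_top_by_type(sample, 2)
print([rec["id"] for rec in picked])
```

With the 50/30/20 split from my example and a target of 1,000 out of 10,000, the quotas come out to 500, 300, and 200 with nothing left over. What I am hoping for is a SAS equivalent that does not require me to work out the per-type counts myself.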
Thanks in advance!