Hi All,
I'm kind off stuck in a middle of a code and need some help. I have a population where it needs to be split 60/40. Then the 40% population should be evenly distributed(50/50) based on Rank variable. Any left over records after 50/50 split should be deleted.
So for example I have
27 total records
14 Records- segA
13 Records- SegB
The SegB population has total 13 records out of which 10 records with Rank = 'Y' and 3 records with Rank = 'N'. Now the 13 records should be evenly split based on Rank flag. Meaning Randomly select 3 accounts from Rank = 'Y' and delete the remaining 7 left over records.
So if i have 13 records in the segB population
10 - Y's
3 - N's
I would expect to see 6 total records
3 Y's(remove 7 Y records randomly)
3 N's
So my total final volume would be 20(14+6) records.
Sample data set
Data vendor_seg;
Infile cards;
Input ID $ 1-4 Rank $ 6;
cards;
1771 Y
3456 Y
5134 N
9087 Y
2829 Y
2749 Y
9987 Y
3094 Y
2938 N
8320 Y
5681 Y
2922 N
0987 Y
1108 N
3879 N
8734 N
1930 N
4752 Y
5609 N
7541 Y
8745 Y
9109 N
5904 Y
7145 Y
0876 N
2939 Y
7340 Y
;
Run;
Data vendor_seg2;
Set vendor_seg;
ranspt = ranuni(1239);
if ranspt < 0.6 then popul = 'SegA';else popu = 'SegB';
Run;
Now the catch here is sometimes Rank = 'N' count can be higher in segB population. So I have to take the lowest Rank record count and adjust the other rank volume accordingly to get a even 50/50 count.
I tried to do it in multiple steps by counting no of records in Rank = N population and using _n_ to delete the Y records from segB but it wasn't working. I am sure there is a best way to do this and you experts can get this all done in a single data step!.
Thanks in advance for your help!!
Compute the required sampling rates in each strata (Y and N), use surveyselect to draw the sample and use a WHERE clause to drop the strata=N from the selected sample:
data vendor_seg;
Input ID $ 1-4 Rank $ 6;
datalines;
1771 Y
3456 Y
5134 N
9087 Y
2829 Y
2749 Y
9987 Y
3094 Y
2938 N
8320 Y
5681 Y
2922 N
0987 Y
1108 N
3879 N
8734 N
1930 N
4752 Y
5609 N
7541 Y
8745 Y
9109 N
5904 Y
7145 Y
0876 N
2939 Y
7340 Y
;
proc sort data=vendor_seg; by rank; run;
proc sql;
create table strataRate as
select
rank,
0.2*sum(n)/n as _rate_
from
(select rank, count(*) as n from vendor_seg group by rank);
proc surveyselect data=vendor_seg seed=1239
rate=strataRate out=sample(where=(rank="Y"));
strata rank;
run;
PG
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.