Dear community,
I have a business data set that needs to be stratified based on business characteristics, namely
Each stratum shall include at least 'N' observations. The figure is fixed annually.
Variables STATE and BTYPE I already have, but BSIZE needs to be formed from variable SALES (annual sales). Each business has a unique register number (BREG).
Business size classes (based on sales) have been selected with following criteria (GE means >=, LT means <):
data CLASSES; input CLASS LOWLIMITGE UPLIMITLT; cards;
1 0 2000
2 2000 4000
3 4000 8000
4 8000 15000
5 15000 25000
6 25000 50000
7 50000 100000
8 100000 250000
9 250000 500000
10 500000 750000
11 750000 1000000
12 1000000 1500000
13 1500000 3000000
14 3000000 .
run;
Business size classes have been regulated to following strata:
Class 2 and 3: Superior or equal to 2000 and strictly inferior to 8000.
Class 4 and 5: Superior or equal to 8000 and strictly inferior to 25000.
Class 6 and 7: Superior or equal to 25000 and strictly inferior to 100000.
Class 8 and 9: Superior or equal to 100000 and strictly inferior to 500000.
Class 10 and 11: Superior or equal to 500000 and strictly inferior to 1000000.
Class 12-14: Superior or equal to 1000000.
Since this does not always work due to limited amount of business falling into these classes, there are three alternative classes available:
Class 3 and 4: Superior or equal to 4000 and strictly inferior to 15000.
Class 3-5: Superior or equal to 4000 and strictly inferior to 25000.
Class 10-14: Superior or equal to 500000.
I'm looking for a way to formalize this into SAS code so that business would be stratified into regulated classes so that given criteria (minimum N observations in stratum) is met.
Please help me out.
Example data. In reality I have 10000's of lines and target is to have around hundred observations for each stratum.
data BUSINESS;
input BREG SALES STATE BTYPE;
cards;
82586 34000 1 A
33371 14000 2 B
53592 37000 3 C
7462 73000 4 D
88541 16000 1 A
77074 207000 2 B
33588 484000 3 C
86921 356000 4 D
41686 720000 1 A
11111 597000 2 B
59248 85000 3 C
58803 439000 4 D
57636 660000 1 A
97406 103000 2 B
98838 539000 3 C
36371 2000 4 D
75789 624000 1 A
81511 509000 2 B
4898 14000 3 C
98233 6000 4 D
60726 86000 1 A
62752 63000 2 B
47901 49000 3 C
62261 50000 4 D
25577 23000 1 A
47730 98000 2 B
54319 88000 3 C
16655 26000 4 D
87014 60000 1 A
run;Please post some example data in usable form (as you have already been shown in the solution to https://communities.sas.com/t5/SAS-Programming/Referring-to-fixed-constraints-and-calculated-values-...), and what you expect to get out of it.
I have added sample data and supplemented the post with additional information.
Such classification is usually handled best with formats:
data BUSINESS;
infile datalines dsd dlm='09'x;
input BREG $ SALES STATE BTYPE $;
datalines;
82586 34000 1 A
33371 14000 2 B
53592 37000 3 C
7462 73000 4 D
88541 16000 1 A
77074 207000 2 B
33588 484000 3 C
86921 356000 4 D
41686 720000 1 A
11111 597000 2 B
59248 85000 3 C
58803 439000 4 D
57636 660000 1 A
97406 103000 2 B
98838 539000 3 C
36371 2000 4 D
75789 624000 1 A
81511 509000 2 B
4898 14000 3 C
98233 6000 4 D
60726 86000 1 A
62752 63000 2 B
47901 49000 3 C
62261 50000 4 D
25577 23000 1 A
47730 98000 2 B
54319 88000 3 C
16655 26000 4 D
87014 60000 1 A
;
proc format;
value strat
0-<2000 = "1"
2000-<4000 = "2"
4000-<8000 = "3"
8000-<15000 = "4"
15000-<25000 = "5"
25000-<50000 = "6"
50000-<100000 = "7"
100000-<250000 = "8"
250000-<500000 = "9"
500000-<750000 = "10"
750000-<1000000 = "11"
1000000-<1500000 = "12"
1500000-<3000000 = "13"
3000000-high = "14"
;
run;
data want;
set business;
strat = put(sales,strat.);
run;
You can now run proc freq:
proc freq data=want;
tables strat;
run;
to decide if you need to adapt your format.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.