Dear community,
I have a business data set that needs to be stratified based on business characteristics, namely
Each stratum shall include at least 'N' observations. The figure is fixed annually.
Variables STATE and BTYPE I already have, but BSIZE needs to be formed from variable SALES (annual sales). Each business has a unique register number (BREG).
Business size classes (based on sales) have been selected with following criteria (GE means >=, LT means <):
data CLASSES; input CLASS LOWLIMITGE UPLIMITLT; cards;
1 0 2000
2 2000 4000
3 4000 8000
4 8000 15000
5 15000 25000
6 25000 50000
7 50000 100000
8 100000 250000
9 250000 500000
10 500000 750000
11 750000 1000000
12 1000000 1500000
13 1500000 3000000
14 3000000 .
run;
Business size classes have been regulated to following strata:
Class 2 and 3: Superior or equal to 2000 and strictly inferior to 8000.
Class 4 and 5: Superior or equal to 8000 and strictly inferior to 25000.
Class 6 and 7: Superior or equal to 25000 and strictly inferior to 100000.
Class 8 and 9: Superior or equal to 100000 and strictly inferior to 500000.
Class 10 and 11: Superior or equal to 500000 and strictly inferior to 1000000.
Class 12-14: Superior or equal to 1000000.
Since this does not always work due to limited amount of business falling into these classes, there are three alternative classes available:
Class 3 and 4: Superior or equal to 4000 and strictly inferior to 15000.
Class 3-5: Superior or equal to 4000 and strictly inferior to 25000.
Class 10-14: Superior or equal to 500000.
I'm looking for a way to formalize this into SAS code so that business would be stratified into regulated classes so that given criteria (minimum N observations in stratum) is met.
Please help me out.
Example data. In reality I have 10000's of lines and target is to have around hundred observations for each stratum.
data BUSINESS;
input BREG SALES STATE BTYPE;
cards;
82586 34000 1 A
33371 14000 2 B
53592 37000 3 C
7462 73000 4 D
88541 16000 1 A
77074 207000 2 B
33588 484000 3 C
86921 356000 4 D
41686 720000 1 A
11111 597000 2 B
59248 85000 3 C
58803 439000 4 D
57636 660000 1 A
97406 103000 2 B
98838 539000 3 C
36371 2000 4 D
75789 624000 1 A
81511 509000 2 B
4898 14000 3 C
98233 6000 4 D
60726 86000 1 A
62752 63000 2 B
47901 49000 3 C
62261 50000 4 D
25577 23000 1 A
47730 98000 2 B
54319 88000 3 C
16655 26000 4 D
87014 60000 1 A
run;
Please post some example data in usable form (as you have already been shown in the solution to https://communities.sas.com/t5/SAS-Programming/Referring-to-fixed-constraints-and-calculated-values-...), and what you expect to get out of it.
I have added sample data and supplemented the post with additional information.
Such classification is usually handled best with formats:
data BUSINESS;
infile datalines dsd dlm='09'x;
input BREG $ SALES STATE BTYPE $;
datalines;
82586 34000 1 A
33371 14000 2 B
53592 37000 3 C
7462 73000 4 D
88541 16000 1 A
77074 207000 2 B
33588 484000 3 C
86921 356000 4 D
41686 720000 1 A
11111 597000 2 B
59248 85000 3 C
58803 439000 4 D
57636 660000 1 A
97406 103000 2 B
98838 539000 3 C
36371 2000 4 D
75789 624000 1 A
81511 509000 2 B
4898 14000 3 C
98233 6000 4 D
60726 86000 1 A
62752 63000 2 B
47901 49000 3 C
62261 50000 4 D
25577 23000 1 A
47730 98000 2 B
54319 88000 3 C
16655 26000 4 D
87014 60000 1 A
;
proc format;
value strat
0-<2000 = "1"
2000-<4000 = "2"
4000-<8000 = "3"
8000-<15000 = "4"
15000-<25000 = "5"
25000-<50000 = "6"
50000-<100000 = "7"
100000-<250000 = "8"
250000-<500000 = "9"
500000-<750000 = "10"
750000-<1000000 = "11"
1000000-<1500000 = "12"
1500000-<3000000 = "13"
3000000-high = "14"
;
run;
data want;
set business;
strat = put(sales,strat.);
run;
You can now run proc freq:
proc freq data=want;
tables strat;
run;
to decide if you need to adapt your format.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.