BookmarkSubscribeRSS Feed
1Alina
Fluorite | Level 6

Dear community,

 

I have a business data set that needs to be stratified based on business characteristics, namely

 

  1. STATE (location),
  2. BTYPE (business type) and
  3. BSIZE (business size class).

 

Each stratum shall include at least 'N' observations. The figure is fixed annually.

 

Variables STATE and BTYPE I already have, but BSIZE needs to be formed from variable SALES (annual sales). Each business has a unique register number (BREG).

 

Business size classes (based on sales) have been selected with following criteria (GE means >=, LT means <):

data CLASSES;
 input CLASS LOWLIMITGE UPLIMITLT;
cards;

1 0 2000
2 2000 4000
3 4000 8000
4 8000 15000
5 15000 25000
6 25000 50000
7 50000 100000
8 100000 250000
9 250000 500000
10 500000 750000
11 750000 1000000
12 1000000 1500000
13 1500000 3000000
14 3000000 .

run;

 

 

Business size classes have been regulated to following strata:

 

Class 2 and 3: Superior or equal to 2000 and strictly inferior to 8000.

Class 4 and 5: Superior or equal to 8000 and strictly inferior to 25000.

Class 6 and 7: Superior or equal to 25000 and strictly inferior to 100000.

Class 8 and 9: Superior or equal to 100000 and strictly inferior to 500000.

Class 10 and 11: Superior or equal to 500000 and strictly inferior to 1000000.

Class 12-14: Superior or equal to 1000000.

 

Since this does not always work due to limited amount of business falling into these classes, there are three alternative classes available:

 

Class 3 and 4: Superior or equal to 4000 and strictly inferior to 15000.

Class 3-5: Superior or equal to 4000 and strictly inferior to 25000.

Class 10-14: Superior or equal to 500000.

 

I'm looking for a way to formalize this into SAS code so that business would be stratified into regulated classes so that given criteria (minimum N observations in stratum) is met.

 

Please help me out.

 

Example data. In reality I have 10000's of lines and target is to have around hundred observations for each stratum.

data BUSINESS;
 input BREG SALES STATE BTYPE;
cards;

82586	34000	1	A
33371	14000	2	B
53592	37000	3	C
7462	73000	4	D
88541	16000	1	A
77074	207000	2	B
33588	484000	3	C
86921	356000	4	D
41686	720000	1	A
11111	597000	2	B
59248	85000	3	C
58803	439000	4	D
57636	660000	1	A
97406	103000	2	B
98838	539000	3	C
36371	2000	4	D
75789	624000	1	A
81511	509000	2	B
4898	14000	3	C
98233	6000	4	D
60726	86000	1	A
62752	63000	2	B
47901	49000	3	C
62261	50000	4	D
25577	23000	1	A
47730	98000	2	B
54319	88000	3	C
16655	26000	4	D
87014	60000	1	A
run;



3 REPLIES 3
1Alina
Fluorite | Level 6

I have added sample data and supplemented the post with additional information.

Kurt_Bremser
Super User

Such classification is usually handled best with formats:

data BUSINESS;
infile datalines dsd dlm='09'x;
input BREG $ SALES STATE BTYPE $;
datalines;
82586	34000	1	A
33371	14000	2	B
53592	37000	3	C
7462	73000	4	D
88541	16000	1	A
77074	207000	2	B
33588	484000	3	C
86921	356000	4	D
41686	720000	1	A
11111	597000	2	B
59248	85000	3	C
58803	439000	4	D
57636	660000	1	A
97406	103000	2	B
98838	539000	3	C
36371	2000	4	D
75789	624000	1	A
81511	509000	2	B
4898	14000	3	C
98233	6000	4	D
60726	86000	1	A
62752	63000	2	B
47901	49000	3	C
62261	50000	4	D
25577	23000	1	A
47730	98000	2	B
54319	88000	3	C
16655	26000	4	D
87014	60000	1	A
;

proc format;
value strat
  0-<2000 = "1"
  2000-<4000 = "2"
  4000-<8000 = "3"
  8000-<15000 = "4"
  15000-<25000 = "5"
  25000-<50000 = "6"
  50000-<100000 = "7"
  100000-<250000 = "8"
  250000-<500000 = "9"
  500000-<750000 = "10"
  750000-<1000000 = "11"
  1000000-<1500000 = "12"
  1500000-<3000000 = "13"
  3000000-high = "14"
;
run;

data want;
set business;
strat = put(sales,strat.);
run;

You can now run proc freq:

proc freq data=want;
tables strat;
run;

to decide if you need to adapt your format.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 576 views
  • 0 likes
  • 2 in conversation