## Can we use proc univariate for getting a clue on group cut off lines


Hi Colleagues,

I have the attached data set with a single variable.

Problem:

I need to categorize these observations into meaningful income groups that are driven not by business logic but by the distribution of the data. I have no idea what cut-off points to impose to form the groups; my employer simply gave me a huge data set.

So I ran the code below to get a sense of how to decide the "income group boundaries":

proc univariate data = have plots;

var income;

run;

Then I got the following "Quantiles".

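| Quantile   | Estimate |
|------------|----------|
| 100% Max   | 393      |
| 99%        | 393      |
| 95%        | 392      |
| 90%        | 360      |
| 75% Q3     | 138      |
| 50% Median | 41       |
| 25% Q1     | 0        |
| 10%        | 0        |
| 5%         | -9       |
| 1%         | -10      |
| 0% Min     | -10      |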

Based on the above quantiles, I decided on the following boundaries.

data want;

length Income_Range $20;

set have;

/* missing() also catches special missing values such as .A, which would otherwise fall into the lowest range */

if missing(Income) then Income_Range = 'Missing';

else if Income <= -9 then Income_Range = '<= -9';

else if Income <= -8 then Income_Range = '> -9 to -8';

else if Income <= -1 then Income_Range = '> -8 to -1';

else if Income <= 0 then Income_Range = '> -1 to 0';

else if Income <= 41 then Income_Range = '> 0 to 41';

else if Income <= 392 then Income_Range = '> 41 to 392';

else Income_Range = '> 392';

run;

Question:

1) Can we use the PROC UNIVARIATE approach like this to decide the boundaries for the income ranges when we have no idea what the cut-offs should be?

2) My SAS code looks fine for this small data set, but code like this sometimes dangerously omits some observations when applied to a large data set. Could an expert check that this code is error-free?

Thank you for the help

Mirisage

Accepted Solutions
Solution
07-11-2012 11:22 PM

## Re: Can we use proc univariate for getting a clue on group cut off lines

You can create equal-sized groups (10, 20, or however many you need) with either PROC RANK or PROC UNIVARIATE.

One way is to use PROC RANK with the GROUPS=10 option:

SAS Code:

proc rank data = have groups = 10 out = out1;

var income;

ranks predgroup;

run;

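Here out1 will contain the rank variable predgroup, taking values 0 to 9, i.e. 10 groups of roughly equal size. Tied income values always land in the same group, so heavily tied data can make the groups uneven.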
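To see which income cut-offs the ranking implies, one option is to summarize income within each group. This is a minimal sketch that assumes the out1 data set and the predgroup variable from the PROC RANK step above; the choice of statistics is only illustrative.

```sas
/* Count, lowest and highest income in each rank group.        */
/* The min/max of adjacent groups show where the cut-offs fall. */
proc means data = out1 n min max;
    class predgroup;   /* observations with missing income have a missing rank and are left out */
    var income;
run;
```

If the counts differ a lot between groups, ties in income are the likely cause.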

Another way is to use PROC UNIVARIATE with an OUTPUT statement:

SAS Code:

proc univariate data = have;

var income;

output out=out1 pctlpts = 10 to 100 by 10 pctlpre = inc;

run;

This gives you the 10th, 20th, ..., 90th and 100th percentiles as the variables inc10, inc20, ..., inc100 in a one-row data set.
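The percentile values can then be used as cut-offs to assign each observation to a group. The sketch below is one possible way to do that, not the only one. The names pctl, want and decile are hypothetical, and the prefix p_ is used instead of inc so that the `drop p_:` prefix list cannot accidentally match the income variable.

```sas
/* Step 1: write the 10th to 90th percentiles to a one-row data set. */
proc univariate data = have noprint;
    var income;
    output out = pctl pctlpts = 10 to 90 by 10 pctlpre = p_;
run;

/* Step 2: attach the cut-offs to every row and count how many */
/* cut-offs lie strictly below each income value.              */
data want;
    if _n_ = 1 then set pctl;      /* p_10-p_90 are retained on all rows */
    set have;
    array cut{9} p_10 p_20 p_30 p_40 p_50 p_60 p_70 p_80 p_90;
    if missing(income) then decile = .;
    else do;
        decile = 1;
        do i = 1 to 9;
            if income > cut{i} then decile = i + 1;
        end;
    end;
    drop i p_:;                    /* keep only the original data plus decile */
run;
```

Because the comparison is strict, an income equal to a cut-off stays in the lower group, and tied cut-offs (for example two percentiles that are both 0) simply leave some groups empty.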

You can then use PROC FORMAT to define a format from these cut-offs, which can easily be reused across different programs.
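As a rough illustration of the PROC FORMAT idea, the sketch below hard-codes a few cut-offs taken from the quantiles posted in the question (0, 41, 138 and 360). The format name incgrp and the labels are hypothetical; substitute whatever cut-offs you settle on.

```sas
/* A reusable format: every non-missing value falls in exactly one range. */
proc format;
    value incgrp
        .            = 'Missing'
        low  -  0    = '<= 0'
        0   <-  41   = '> 0 to 41'
        41  <-  138  = '> 41 to 138'
        138 <-  360  = '> 138 to 360'
        360 <-  high = '> 360';
run;

/* Group on the fly without creating a new data set; the MISSING */
/* option keeps missing incomes in the table so the counts add   */
/* up to the number of observations.                             */
proc freq data = have;
    tables income / missing;
    format income incgrp.;
run;
```

Checking that the frequencies sum to the total number of observations is a simple way to confirm that no rows were silently left out of the grouping.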

There are some advantages to using PROC UNIVARIATE:

1. A WEIGHT statement is available in PROC UNIVARIATE (PROC RANK does not have one).

2. You don't have to re-create the whole data set; the output contains only the percentile values, so it is simple.

Thanks,

zilok


## Re: Can we use proc univariate for getting a clue on group cut off lines

> Question:
>
> 1). Can we use Univariate approach like this to decide "boundaries" for income range if we do not have any clue what the boundary cut offs are?

Yes, of course, you can take SAS (or any other software) and chop up a continuous variable into "groups". But just because you can doesn't mean you should. The information contained in a continuous variable like income is destroyed by chopping it up into intervals. Depending on what you plan to do with this data, your best approach may be to leave the data as continuous, not groups. And even so, if you MUST have intervals, the proper intervals depend on what you plan to do with this data, and you haven't told us. You asked for "meaningful income groups", but your result is empirical groups, not meaningful groups.

I think you are going down a dangerous path. You are trying to impose a statistical solution on a problem that logically does not have a statistical solution (at least, not until we know more about what you plan to do, and even then groups might not be the best answer).

--
Paige Miller

## Re: Can we use proc univariate for getting a clue on group cut off lines

You could also add a HISTOGRAM statement after the VAR statement (`histogram income;`) to see what bins SAS comes up with.
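A minimal sketch of that suggestion follows. It assumes the have data set and income variable from the question; the output data set name bins is hypothetical.

```sas
/* HISTOGRAM is a separate statement in PROC UNIVARIATE, not an option on VAR. */
proc univariate data = have;
    var income;
    histogram income / outhistogram = bins;   /* save the bins SAS chose */
run;

/* Inspect the bin midpoints and the share of observations in each bin. */
proc print data = bins;
run;
```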


## Re: Can we use proc univariate for getting a clue on group cut off lines

Hi zilok, PaigeMiller and darrylovia,

Thanks very much to all of you. I learned a lot from your statistical and SAS expertise.

Warm regards

Mirisage

🔒 This topic is solved and locked.