## data into different groups

Solved
Occasional Contributor
Posts: 17

# data into different groups

Hi

For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).

I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.

Accepted Solutions
Solution
‎08-17-2012 09:18 PM
Posts: 5,523

## Re: How to put data into different_decile_groups

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG

All Replies
Valued Guide
Posts: 765

## Re: How to put data into different_decile_groups

hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...

data have;

infile 'z:\practice.csv' dsd firstobs=2 truncover;

input lpermno year at;

run;

proc sort data=have;

by year;

run;

proc rank data=have out=want groups=10;

var at;

ranks at_decile;

by year;

run;

if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...

proc freq data=want;

table year*at_decile / nocol nopercent missprint;

run;

if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value

Posts: 1,337

## Re: How to put data into different_decile_groups

What Mike suggests is just the way to go if you're breaking a given data set into deciles.  But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):

proc univariate data=nyse (keep=at) noprint;
var at;
output out=nyse_brkpoints

pctlpts=0 to 100 by 10

pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

;

run;

data want (drop=pctl_;

if _n_=1 then do;

set pctlpts;

array cats {1:10} pctl_0 -- pctl_90;

pctl_0=.;

end;

set mydata;

if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;

run;

Occasional Contributor
Posts: 17

## Re: How to put data into different_decile_groups

Cool thanks guys, works hassle free!

Solution
‎08-17-2012 09:18 PM
Posts: 5,523

## Re: How to put data into different_decile_groups

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG
Occasional Contributor
Posts: 17

Thanks!

Posts: 1,337

## Re: How to put data into different_decile_groups

Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here.  So if you have a DATE variable then:

proc univariate data=nyse noprint;
var at;

class date ;  format date year4.;
output out=nyse_brkpoints

pctlpts=0 to 100 by 10

pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

;

run;

** So in the above you created one row per year.  Below put each of those rows in a row of the CATS matrix.  Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.

data want (drop=pctl_;

** First populate the CATS matrix **;

if _n_=1 then do until (end_of_pctl);

set brkpoints  end=end_of_pctl;

array pcts {1:10} pctl_0 -- pctl_90;

array cats {1980:2010,1:10}  _temporary_;

year=year(date);

do C=1 to 10; cats{year,C}=pcts{C};end;

end;

** Now select a row in CATs to determine deciles for records in MYDATA

set mydata;

year=year(date);   ** use this YEAR value to go to desired row of CATS **;

if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;

run;

BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this).  If so, then change the UNTIL expression to:

until (at>cats(year-1,decile))

Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.

Then go ahead and make means for DECILE*YEAR  or DECILE*DATE (formatted at year4.).

As to averages,

🔒 This topic is solved and locked.