Solved: Re: How to put data into different_decile_groups

redrover99 · Posted 08-17-2012 11:19 AM

Hi

For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).

I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.

Thanking u in advance

PGStats · Posted 08-17-2012 09:18 PM

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

View solution in original post

MikeZdeb · Posted 08-17-2012 11:45 AM

hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...

data have;

infile 'z:\practice.csv' dsd firstobs=2 truncover;

input lpermno year at;

run;

proc sort data=have;

by year;

run;

proc rank data=have out=want groups=10;

var at;

ranks at_decile;

by year;

run;

if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...

proc freq data=want;

table year*at_decile / nocol nopercent missprint;

run;

if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value

mkeintz · Posted 08-17-2012 02:17 PM

What Mike suggests is just the way to go if you're breaking a given data set into deciles. But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):

proc univariate data=nyse (keep=at) noprint;
var at;
output out=nyse_brkpoints

pctlpts=0 to 100 by 10

pctlpre=PCTL_ /* vars to be named pctl_0 pctl_10 ... pctl_100 */

;

run;

data want (drop=pctl_:);

if _n_=1 then do;

set pctlpts;

array cats {1:10} pctl_0 -- pctl_90;

pctl_0=.;

end;

set mydata;

if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

redrover99 · Posted 08-17-2012 09:08 PM

Cool thanks guys, works hassle free!

PGStats · Posted 08-17-2012 09:18 PM

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

redrover99 · Posted 08-17-2012 10:02 PM

Thanks!

mkeintz · Posted 08-17-2012 10:06 PM

Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here. So if you have a DATE variable then:

proc univariate data=nyse noprint;
var at;

class date ; format date year4.;
output out=nyse_brkpoints

pctlpts=0 to 100 by 10

pctlpre=PCTL_ /* vars to be named pctl_0 pctl_10 ... pctl_100 */

;

run;

** So in the above you created one row per year. Below put each of those rows in a row of the CATS matrix. Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.

data want (drop=pctl_:);

** First populate the CATS matrix **;

if _n_=1 then do until (end_of_pctl);

set brkpoints end=end_of_pctl;

array pcts {1:10} pctl_0 -- pctl_90;

array cats {1980:2010,1:10} _temporary_;

year=year(date);

do C=1 to 10; cats{year,C}=pcts{C};end;

end;

** Now select a row in CATs to determine deciles for records in MYDATA

set mydata;

year=year(date); ** use this YEAR value to go to desired row of CATS **;

if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;

run;

BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this). If so, then change the UNTIL expression to:

until (at>cats(year-1,decile))

Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.

Then go ahead and make means for DECILE*YEAR or DECILE*DATE (formatted at year4.).

As to averages,

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Classroom Training Available!