Hi
For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).
I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.
Thanking u in advance
Yes, of course :
proc sql;
create table wantAve as
select year, at_decile, mean(at) as aveAt
from want
group by year, at_decile;
quit;
PG
hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...
data have;
infile 'z:\practice.csv' dsd firstobs=2 truncover;
input lpermno year at;
run;
proc sort data=have;
by year;
run;
proc rank data=have out=want groups=10;
var at;
ranks at_decile;
by year;
run;
if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...
proc freq data=want;
table year*at_decile / nocol nopercent missprint;
run;
if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value
What Mike suggests is just the way to go if you're breaking a given data set into deciles. But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):
proc univariate data=nyse (keep=at) noprint;
var at;
output out=nyse_brkpoints
pctlpts=0 to 100 by 10
pctlpre=PCTL_ /* vars to be named pctl_0 pctl_10 ... pctl_100 */
;
run;
data want (drop=pctl_:);
if _n_=1 then do;
set pctlpts;
array cats {1:10} pctl_0 -- pctl_90;
pctl_0=.;
end;
set mydata;
if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;
run;
Cool thanks guys, works hassle free!
Yes, of course :
proc sql;
create table wantAve as
select year, at_decile, mean(at) as aveAt
from want
group by year, at_decile;
quit;
PG
Thanks!
Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here. So if you have a DATE variable then:
proc univariate data=nyse noprint;
var at;
class date ; format date year4.;
output out=nyse_brkpoints
pctlpts=0 to 100 by 10
pctlpre=PCTL_ /* vars to be named pctl_0 pctl_10 ... pctl_100 */
;
run;
** So in the above you created one row per year. Below put each of those rows in a row of the CATS matrix. Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.
data want (drop=pctl_:);
** First populate the CATS matrix **;
if _n_=1 then do until (end_of_pctl);
set brkpoints end=end_of_pctl;
array pcts {1:10} pctl_0 -- pctl_90;
array cats {1980:2010,1:10} _temporary_;
year=year(date);
do C=1 to 10; cats{year,C}=pcts{C};end;
end;
** Now select a row in CATs to determine deciles for records in MYDATA
set mydata;
year=year(date); ** use this YEAR value to go to desired row of CATS **;
if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;
run;
BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this). If so, then change the UNTIL expression to:
until (at>cats(year-1,decile))
Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.
Then go ahead and make means for DECILE*YEAR or DECILE*DATE (formatted at year4.).
As to averages,
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.