Hi
For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).
I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.
Thanking u in advance
Yes, of course :
proc sql;
create table wantAve as
select year, at_decile, mean(at) as aveAt
from want
group by year, at_decile;
quit;
PG
hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...
data have;
infile 'z:\practice.csv' dsd firstobs=2 truncover;
input lpermno year at;
run;
proc sort data=have;
by year;
run;
proc rank data=have out=want groups=10;
var at;
ranks at_decile;
by year;
run;
if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...
proc freq data=want;
table year*at_decile / nocol nopercent missprint;
run;
if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value
What Mike suggests is just the way to go if you're breaking a given data set into deciles. But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):
proc univariate data=nyse (keep=at) noprint;
var at;
output out=nyse_brkpoints
pctlpts=0 to 100 by 10
pctlpre=PCTL_ /* vars to be named pctl_0 pctl_10 ... pctl_100 */
;
run;
data want (drop=pctl_:);
if _n_=1 then do;
set pctlpts;
array cats {1:10} pctl_0 -- pctl_90;
pctl_0=.;
end;
set mydata;
if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;
run;
Cool thanks guys, works hassle free!
Yes, of course :
proc sql;
create table wantAve as
select year, at_decile, mean(at) as aveAt
from want
group by year, at_decile;
quit;
PG
Thanks!
Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here. So if you have a DATE variable then:
proc univariate data=nyse noprint;
var at;
class date ; format date year4.;
output out=nyse_brkpoints
pctlpts=0 to 100 by 10
pctlpre=PCTL_ /* vars to be named pctl_0 pctl_10 ... pctl_100 */
;
run;
** So in the above you created one row per year. Below put each of those rows in a row of the CATS matrix. Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.
data want (drop=pctl_:);
** First populate the CATS matrix **;
if _n_=1 then do until (end_of_pctl);
set brkpoints end=end_of_pctl;
array pcts {1:10} pctl_0 -- pctl_90;
array cats {1980:2010,1:10} _temporary_;
year=year(date);
do C=1 to 10; cats{year,C}=pcts{C};end;
end;
** Now select a row in CATs to determine deciles for records in MYDATA
set mydata;
year=year(date); ** use this YEAR value to go to desired row of CATS **;
if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;
run;
BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this). If so, then change the UNTIL expression to:
until (at>cats(year-1,decile))
Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.
Then go ahead and make means for DECILE*YEAR or DECILE*DATE (formatted at year4.).
As to averages,
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.