BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
redrover99
Calcite | Level 5

Hi

For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).

I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.

Thanking u in advance

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG

View solution in original post

6 REPLIES 6
MikeZdeb
Rhodochrosite | Level 12

hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...

data have;

infile 'z:\practice.csv' dsd firstobs=2 truncover;

input lpermno year at;

run;

proc sort data=have;

by year;

run;


proc rank data=have out=want groups=10;

var at;

ranks at_decile;

by year;

run;

if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...

proc freq data=want;

table year*at_decile / nocol nopercent missprint;

run;

if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value

mkeintz
PROC Star

What Mike suggests is just the way to go if you're breaking a given data set into deciles.  But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):

proc univariate data=nyse (keep=at) noprint;
  var at;
  output out=nyse_brkpoints

       pctlpts=0 to 100 by 10

       pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

      ;

run;

data want (drop=pctl_:);

  if _n_=1 then do;

     set pctlpts;

     array cats {1:10} pctl_0 -- pctl_90;

     pctl_0=.;

  end;

  set mydata;

  if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
redrover99
Calcite | Level 5

Cool thanks guys, works hassle free!

PGStats
Opal | Level 21

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG
mkeintz
PROC Star

Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here.  So if you have a DATE variable then:

proc univariate data=nyse noprint;
  var at;

  class date ;  format date year4.;
  output out=nyse_brkpoints

       pctlpts=0 to 100 by 10

       pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

      ;

run;

** So in the above you created one row per year.  Below put each of those rows in a row of the CATS matrix.  Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.


data want (drop=pctl_:);

** First populate the CATS matrix **;

  if _n_=1 then do until (end_of_pctl);

     set brkpoints  end=end_of_pctl;

     array pcts {1:10} pctl_0 -- pctl_90;

     array cats {1980:2010,1:10}  _temporary_;

     year=year(date);

     do C=1 to 10; cats{year,C}=pcts{C};end;

  end;

  ** Now select a row in CATs to determine deciles for records in MYDATA

   set mydata;

      year=year(date);   ** use this YEAR value to go to desired row of CATS **;

      if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;

run;

BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this).  If so, then change the UNTIL expression to:

        until (at>cats(year-1,decile))

Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.

Then go ahead and make means for DECILE*YEAR  or DECILE*DATE (formatted at year4.).

As to averages,

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4097 views
  • 6 likes
  • 4 in conversation