BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
redrover99
Calcite | Level 5

Hi

For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).

I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.

Thanking u in advance

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG

View solution in original post

6 REPLIES 6
MikeZdeb
Rhodochrosite | Level 12

hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...

data have;

infile 'z:\practice.csv' dsd firstobs=2 truncover;

input lpermno year at;

run;

proc sort data=have;

by year;

run;


proc rank data=have out=want groups=10;

var at;

ranks at_decile;

by year;

run;

if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...

proc freq data=want;

table year*at_decile / nocol nopercent missprint;

run;

if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value

mkeintz
PROC Star

What Mike suggests is just the way to go if you're breaking a given data set into deciles.  But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):

proc univariate data=nyse (keep=at) noprint;
  var at;
  output out=nyse_brkpoints

       pctlpts=0 to 100 by 10

       pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

      ;

run;

data want (drop=pctl_:);

  if _n_=1 then do;

     set pctlpts;

     array cats {1:10} pctl_0 -- pctl_90;

     pctl_0=.;

  end;

  set mydata;

  if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
redrover99
Calcite | Level 5

Cool thanks guys, works hassle free!

PGStats
Opal | Level 21

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG
mkeintz
PROC Star

Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here.  So if you have a DATE variable then:

proc univariate data=nyse noprint;
  var at;

  class date ;  format date year4.;
  output out=nyse_brkpoints

       pctlpts=0 to 100 by 10

       pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

      ;

run;

** So in the above you created one row per year.  Below put each of those rows in a row of the CATS matrix.  Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.


data want (drop=pctl_:);

** First populate the CATS matrix **;

  if _n_=1 then do until (end_of_pctl);

     set brkpoints  end=end_of_pctl;

     array pcts {1:10} pctl_0 -- pctl_90;

     array cats {1980:2010,1:10}  _temporary_;

     year=year(date);

     do C=1 to 10; cats{year,C}=pcts{C};end;

  end;

  ** Now select a row in CATs to determine deciles for records in MYDATA

   set mydata;

      year=year(date);   ** use this YEAR value to go to desired row of CATS **;

      if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;

run;

BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this).  If so, then change the UNTIL expression to:

        until (at>cats(year-1,decile))

Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.

Then go ahead and make means for DECILE*YEAR  or DECILE*DATE (formatted at year4.).

As to averages,

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2662 views
  • 6 likes
  • 4 in conversation