Help using Base SAS procedures

data into different groups

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

data into different groups

Hi

For each fyear I am looking to put my data into 10 different groups based on the size of AT. I would like to create a new row containing the appropriate number (between 1-10).

I have heard this is achievable through commands but I cant seem to get it right. Help would be gratefully appreciated.

Thanking u in advance


Accepted Solutions
Solution
‎08-17-2012 09:18 PM
Respected Advisor
Posts: 4,651

Re: How to put data into different_decile_groups

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG

View solution in original post


All Replies
Valued Guide
Posts: 765

Re: How to put data into different_decile_groups

hi ... this will create a new variable (at_decile) that ranges from 0 to 9 within each year (you have a lot of missing values for AT that do not get assigned to any group ...

data have;

infile 'z:\practice.csv' dsd firstobs=2 truncover;

input lpermno year at;

run;

proc sort data=have;

by year;

run;


proc rank data=have out=want groups=10;

var at;

ranks at_decile;

by year;

run;

if you use the following, you'll see 10% of the observations within each year are in each group (plus how may missing values in each year) ...

proc freq data=want;

table year*at_decile / nocol nopercent missprint;

run;

if you want the deciles to range from 1 to 10 you'll have to use a data step after PROC RANK to add 1 to each decile value

Valued Guide
Posts: 797

Re: How to put data into different_decile_groups

What Mike suggests is just the way to go if you're breaking a given data set into deciles.  But we have applications here in which you categorize one dataset based on deciles derived from another (for example size deciles from NYSE-listed companies applied to companies listed on AMEX and NASDAQ - don't ask me why):

proc univariate data=nyse (keep=at) noprint;
  var at;
  output out=nyse_brkpoints

       pctlpts=0 to 100 by 10

       pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

      ;

run;

data want (drop=pctl_Smiley Happy;

  if _n_=1 then do;

     set pctlpts;

     array cats {1:10} pctl_0 -- pctl_90;

     pctl_0=.;

  end;

  set mydata;

  if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{decile}); end;

run;

Occasional Contributor
Posts: 17

Re: How to put data into different_decile_groups

Cool thanks guys, works hassle free!

Solution
‎08-17-2012 09:18 PM
Respected Advisor
Posts: 4,651

Re: How to put data into different_decile_groups

Yes, of course :

proc sql;

create table wantAve as

select year, at_decile, mean(at) as aveAt

from want

group by year, at_decile;

quit;

PG

PG
Occasional Contributor
Posts: 17

Re: How to put data into different_decile_groups

Thanks!

Valued Guide
Posts: 797

Re: How to put data into different_decile_groups

Yes, but I suspect you are trying to get year-specific deciles - at least that's what is typical here.  So if you have a DATE variable then:

proc univariate data=nyse noprint;
  var at;

  class date ;  format date year4.;
  output out=nyse_brkpoints

       pctlpts=0 to 100 by 10

       pctlpre=PCTL_   /*  vars to be named pctl_0 pctl_10 ... pctl_100 */

      ;

run;

** So in the above you created one row per year.  Below put each of those rows in a row of the CATS matrix.  Assuming you have no data prior to 1980 and none after 2010, I bounded the row dimension of CATS to 1980:2010.


data want (drop=pctl_Smiley Happy;

** First populate the CATS matrix **;

  if _n_=1 then do until (end_of_pctl);

     set brkpoints  end=end_of_pctl;

     array pcts {1:10} pctl_0 -- pctl_90;

     array cats {1980:2010,1:10}  _temporary_;

     year=year(date);

     do C=1 to 10; cats{year,C}=pcts{C};end;

  end;

  ** Now select a row in CATs to determine deciles for records in MYDATA

   set mydata;

      year=year(date);   ** use this YEAR value to go to desired row of CATS **;

      if not (missing(at)) then do decile=10 to 1 by -1 until (at>=cats{year,decile}); end;

run;

BTW, often you want to use last year's deciles to categorize this years' companies (numerous investment strategies do this).  If so, then change the UNTIL expression to:

        until (at>cats(year-1,decile))

Of course this only works if the calibration dataset has data in the year preceding the earliest year in MYDATA.

Then go ahead and make means for DECILE*YEAR  or DECILE*DATE (formatted at year4.).

As to averages,

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 1215 views
  • 6 likes
  • 4 in conversation