Solved
Contributor
Posts: 28

# Quintile for each year

Hi All,

I am trying to create quintiles for my sample based on SIZE for each year. My sample has five years: 2004 to 2008. I sorted the data by year and SIZE. I, however, do not know how to create quintile for each year.

FIRM   YEAR    SIZE

A          2004     10

B          2004     12

C          2004     13

D          2004     18

E          2004     19

F          2005      8

G          2005    10

H          2005    38

I          2005     41

J          2005     44

Expected output:

FIRM   YEAR    SIZE Quintile

A          2004     10      1

B          2004     12      2

C          2004     13      3

D          2004     18      4

E          2004     19      5

F          2005      8        1

G          2005    10        2

H          2005    38        3

I          2005     41        4

J          2005     44        5

Accepted Solutions
Solution
‎09-24-2012 07:35 PM
PROC Star
Posts: 1,307

## Re: Quintile for each year

Hi, pk

1. To find the actual quintile values:

PROC MEANS DATA=have NOPRINT NWAY NONOBS;

VAR SIZE;

CLASS YEAR;

OUTPUT  OUT=want1 P20()= P40()= P60= P80= / AUTONAME;

RUN;

2. To assign the quintile to the data (which looks like what you want):

Data must be sorted by YEAR;

PROC RANK DATA=have GROUPS=5 TIES=MEAN OUT=want2;

BY YEAR;

VAR SIZE;

RANKS rank_SIZE;

RUN;

Tom

All Replies
Super User
Posts: 23,663

## Re: Quintile for each year

Do you only have 5 observations per year as in your example?

Contributor
Posts: 28

## Re: Quintile for each year

No, I have moe than five observations in the sample.

PROC Star
Posts: 8,163

Contributor
Posts: 28

## Re: Quintile for each year

Dear Linlin,

I tried the code but it does not create quintiles for each year. Each year has more than five observations.

Super Contributor
Posts: 1,636

## Re: Quintile for each year

Sorry. I misunderstood your question. I will delete my post.

Super Contributor
Posts: 1,636

## Re: Quintile for each year

data have;

input FIRM \$  YEAR    SIZE;

cards;

A          2004     10

B          2004     12

C          2004     13

D          2004     18

E          2004     19

F          2005      8

G          2005    10

H          2005    38

I          2005     41

J          2005     44

k          2005     46

l          2005     50

m          2005     70

;

proc sql noprint;

select distinct year into :years separated by ' '

from have;

quit;

%macro test;

%do i=1 %to %sysfunc(countw(&years));

%let year=%scan(&years,&i);

proc univariate noprint data=have;

where year=&year;

var size;

output out=quintile&year pctlpts=20 40 60 80 pctlpre=pct;

run;

/* write the cutpoints to macro variables */

data _null_;

set quintile&year;

call symput('q1',pct20) ;

call symput('q2',pct40) ;

call symput('q3',pct60) ;

call symput('q4',pct80) ;

run;

/* create a new variable containing the quintiles */

data _&year;

set have(where=(year=&year));;

if size <= &q1 then x_quint=1;

else if size <= &q2 then x_quint=2;

else if size <= &q3 then x_quint=3;

else if size <= &q4 then x_quint=4;

else x_quint=5;

run;

%end;

%mend;

%test

data want;

set _:;

run;

proc print data=want;run;

Obs    FIRM    YEAR    SIZE    x_quint

1     A      2004     10        1

2     B      2004     12        2

3     C      2004     13        3

4     D      2004     18        4

5     E      2004     19        5

6     F      2005      8        1

7     G      2005     10        1

8     H      2005     38        2

9     I      2005     41        2

10     J      2005     44        3

11     k      2005     46        4

12     l      2005     50        4

13     m      2005     70        5

Solution
‎09-24-2012 07:35 PM
PROC Star
Posts: 1,307

## Re: Quintile for each year

Hi, pk

1. To find the actual quintile values:

PROC MEANS DATA=have NOPRINT NWAY NONOBS;

VAR SIZE;

CLASS YEAR;

OUTPUT  OUT=want1 P20()= P40()= P60= P80= / AUTONAME;

RUN;

2. To assign the quintile to the data (which looks like what you want):

Data must be sorted by YEAR;

PROC RANK DATA=have GROUPS=5 TIES=MEAN OUT=want2;

BY YEAR;

VAR SIZE;

RANKS rank_SIZE;

RUN;

Tom

Posts: 3,167

## Re: Quintile for each year

You could make the following code more robust by making another pass to figure out the maximum number of obs each year, I just assume it is less than 100: (raw data stolen from LinLin's post)

data have;

input FIRM \$  YEAR    SIZE;

cards;

A          2004     10

B          2004     12

C          2004     13

D          2004     18

E          2004     19

F          2005      8

G          2005    10

H          2005    38

I          2005     41

J          2005     44

k          2005     46

l          2005     50

m          2005     70

;

data want;

array t(100) _temporary_;

do _n_=1 by 1 until (last.year);

set have;

by year;

t(_n_)=size;

end;

_20=PCTL(20,of t(*));  _40=PCTL(40,of t(*));  _60=PCTL(60,of t(*));  _80=PCTL(80,of t(*));  _100=PCTL(100,of t(*));

do _n_=1 by 1 until (last.year);

set have;

by year;

if size <= _20 then q=1; else if size <=_40 then q=2; else if size <=_60 then q=3; else if size <=_80 then q=4; else if siZE <=_100 then q=5;

output;

end;

call missing (of t(*));

drop _:;

run;

proc print;run;

Haikuo

Contributor
Posts: 28

## Re: Quintile for each year

Thank you everyone for the codes.

🔒 This topic is solved and locked.