Hi All,
I am trying to create quintiles for my sample based on SIZE for each year. My sample has five years: 2004 to 2008. I sorted the data by year and SIZE. I, however, do not know how to create quintile for each year.
FIRM YEAR SIZE
A 2004 10
B 2004 12
C 2004 13
D 2004 18
E 2004 19
F 2005 8
G 2005 10
H 2005 38
I 2005 41
J 2005 44
Expected output:
FIRM YEAR SIZE Quintile
A 2004 10 1
B 2004 12 2
C 2004 13 3
D 2004 18 4
E 2004 19 5
F 2005 8 1
G 2005 10 2
H 2005 38 3
I 2005 41 4
J 2005 44 5
Please help.
Hi, pk
1. To find the actual quintile values:
PROC MEANS DATA=have NOPRINT NWAY NONOBS;
VAR SIZE;
CLASS YEAR;
OUTPUT OUT=want1 P20()= P40()= P60= P80= / AUTONAME;
RUN;
2. To assign the quintile to the data (which looks like what you want):
Data must be sorted by YEAR;
PROC RANK DATA=have GROUPS=5 TIES=MEAN OUT=want2;
BY YEAR;
VAR SIZE;
RANKS rank_SIZE;
RUN;
Tom
Do you only have 5 observations per year as in your example?
No, I have moe than five observations in the sample.
Take a look at: Categorising a continuous variable into quantiles
Dear Linlin,
I tried the code but it does not create quintiles for each year. Each year has more than five observations.
Sorry. I misunderstood your question. I will delete my post.
is this helpful?
data have;
input FIRM $ YEAR SIZE;
cards;
A 2004 10
B 2004 12
C 2004 13
D 2004 18
E 2004 19
F 2005 8
G 2005 10
H 2005 38
I 2005 41
J 2005 44
k 2005 46
l 2005 50
m 2005 70
;
proc sql noprint;
select distinct year into :years separated by ' '
from have;
quit;
%macro test;
%do i=1 %to %sysfunc(countw(&years));
%let year=%scan(&years,&i);
proc univariate noprint data=have;
where year=&year;
var size;
output out=quintile&year pctlpts=20 40 60 80 pctlpre=pct;
run;
/* write the cutpoints to macro variables */
data _null_;
set quintile&year;
call symput('q1',pct20) ;
call symput('q2',pct40) ;
call symput('q3',pct60) ;
call symput('q4',pct80) ;
run;
/* create a new variable containing the quintiles */
data _&year;
set have(where=(year=&year));;
if size <= &q1 then x_quint=1;
else if size <= &q2 then x_quint=2;
else if size <= &q3 then x_quint=3;
else if size <= &q4 then x_quint=4;
else x_quint=5;
run;
%end;
%mend;
%test
data want;
set _:;
run;
proc print data=want;run;
Obs FIRM YEAR SIZE x_quint
1 A 2004 10 1
2 B 2004 12 2
3 C 2004 13 3
4 D 2004 18 4
5 E 2004 19 5
6 F 2005 8 1
7 G 2005 10 1
8 H 2005 38 2
9 I 2005 41 2
10 J 2005 44 3
11 k 2005 46 4
12 l 2005 50 4
13 m 2005 70 5
Hi, pk
1. To find the actual quintile values:
PROC MEANS DATA=have NOPRINT NWAY NONOBS;
VAR SIZE;
CLASS YEAR;
OUTPUT OUT=want1 P20()= P40()= P60= P80= / AUTONAME;
RUN;
2. To assign the quintile to the data (which looks like what you want):
Data must be sorted by YEAR;
PROC RANK DATA=have GROUPS=5 TIES=MEAN OUT=want2;
BY YEAR;
VAR SIZE;
RANKS rank_SIZE;
RUN;
Tom
You could make the following code more robust by making another pass to figure out the maximum number of obs each year, I just assume it is less than 100: (raw data stolen from LinLin's post)
data have;
input FIRM $ YEAR SIZE;
cards;
A 2004 10
B 2004 12
C 2004 13
D 2004 18
E 2004 19
F 2005 8
G 2005 10
H 2005 38
I 2005 41
J 2005 44
k 2005 46
l 2005 50
m 2005 70
;
data want;
array t(100) _temporary_;
do _n_=1 by 1 until (last.year);
set have;
by year;
t(_n_)=size;
end;
_20=PCTL(20,of t(*)); _40=PCTL(40,of t(*)); _60=PCTL(60,of t(*)); _80=PCTL(80,of t(*)); _100=PCTL(100,of t(*));
do _n_=1 by 1 until (last.year);
set have;
by year;
if size <= _20 then q=1; else if size <=_40 then q=2; else if size <=_60 then q=3; else if size <=_80 then q=4; else if siZE <=_100 then q=5;
output;
end;
call missing (of t(*));
drop _:;
run;
proc print;run;
Haikuo
Thank you everyone for the codes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.