Hi All,
I am trying to create quintiles for my sample based on SIZE for each year. My sample has five years: 2004 to 2008. I sorted the data by year and SIZE. I, however, do not know how to create quintile for each year.
FIRM YEAR SIZE
A 2004 10
B 2004 12
C 2004 13
D 2004 18
E 2004 19
F 2005 8
G 2005 10
H 2005 38
I 2005 41
J 2005 44
Expected output:
FIRM YEAR SIZE Quintile
A 2004 10 1
B 2004 12 2
C 2004 13 3
D 2004 18 4
E 2004 19 5
F 2005 8 1
G 2005 10 2
H 2005 38 3
I 2005 41 4
J 2005 44 5
Please help.
Hi, pk
1. To find the actual quintile values:
PROC MEANS DATA=have NOPRINT NWAY NONOBS;
VAR SIZE;
CLASS YEAR;
OUTPUT OUT=want1 P20()= P40()= P60= P80= / AUTONAME;
RUN;
2. To assign the quintile to the data (which looks like what you want):
Data must be sorted by YEAR;
PROC RANK DATA=have GROUPS=5 TIES=MEAN OUT=want2;
BY YEAR;
VAR SIZE;
RANKS rank_SIZE;
RUN;
Tom
Do you only have 5 observations per year as in your example?
No, I have moe than five observations in the sample.
Take a look at: Categorising a continuous variable into quantiles
Dear Linlin,
I tried the code but it does not create quintiles for each year. Each year has more than five observations.
Sorry. I misunderstood your question. I will delete my post.
is this helpful?
data have;
input FIRM $ YEAR SIZE;
cards;
A 2004 10
B 2004 12
C 2004 13
D 2004 18
E 2004 19
F 2005 8
G 2005 10
H 2005 38
I 2005 41
J 2005 44
k 2005 46
l 2005 50
m 2005 70
;
proc sql noprint;
select distinct year into :years separated by ' '
from have;
quit;
%macro test;
%do i=1 %to %sysfunc(countw(&years));
%let year=%scan(&years,&i);
proc univariate noprint data=have;
where year=&year;
var size;
output out=quintile&year pctlpts=20 40 60 80 pctlpre=pct;
run;
/* write the cutpoints to macro variables */
data _null_;
set quintile&year;
call symput('q1',pct20) ;
call symput('q2',pct40) ;
call symput('q3',pct60) ;
call symput('q4',pct80) ;
run;
/* create a new variable containing the quintiles */
data _&year;
set have(where=(year=&year));;
if size <= &q1 then x_quint=1;
else if size <= &q2 then x_quint=2;
else if size <= &q3 then x_quint=3;
else if size <= &q4 then x_quint=4;
else x_quint=5;
run;
%end;
%mend;
%test
data want;
set _:;
run;
proc print data=want;run;
Obs FIRM YEAR SIZE x_quint
1 A 2004 10 1
2 B 2004 12 2
3 C 2004 13 3
4 D 2004 18 4
5 E 2004 19 5
6 F 2005 8 1
7 G 2005 10 1
8 H 2005 38 2
9 I 2005 41 2
10 J 2005 44 3
11 k 2005 46 4
12 l 2005 50 4
13 m 2005 70 5
Hi, pk
1. To find the actual quintile values:
PROC MEANS DATA=have NOPRINT NWAY NONOBS;
VAR SIZE;
CLASS YEAR;
OUTPUT OUT=want1 P20()= P40()= P60= P80= / AUTONAME;
RUN;
2. To assign the quintile to the data (which looks like what you want):
Data must be sorted by YEAR;
PROC RANK DATA=have GROUPS=5 TIES=MEAN OUT=want2;
BY YEAR;
VAR SIZE;
RANKS rank_SIZE;
RUN;
Tom
You could make the following code more robust by making another pass to figure out the maximum number of obs each year, I just assume it is less than 100: (raw data stolen from LinLin's post)
data have;
input FIRM $ YEAR SIZE;
cards;
A 2004 10
B 2004 12
C 2004 13
D 2004 18
E 2004 19
F 2005 8
G 2005 10
H 2005 38
I 2005 41
J 2005 44
k 2005 46
l 2005 50
m 2005 70
;
data want;
array t(100) _temporary_;
do _n_=1 by 1 until (last.year);
set have;
by year;
t(_n_)=size;
end;
_20=PCTL(20,of t(*)); _40=PCTL(40,of t(*)); _60=PCTL(60,of t(*)); _80=PCTL(80,of t(*)); _100=PCTL(100,of t(*));
do _n_=1 by 1 until (last.year);
set have;
by year;
if size <= _20 then q=1; else if size <=_40 then q=2; else if size <=_60 then q=3; else if size <=_80 then q=4; else if siZE <=_100 then q=5;
output;
end;
call missing (of t(*));
drop _:;
run;
proc print;run;
Haikuo
Thank you everyone for the codes.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.