Hello,
I'd like to compute the relative frequency based on a binary variable (case) in the following way. For each record within each id, I'd like to measure the rate for records that are within 1 year prior to that record (including the record itself). Here's a summary of codes for the have/want data files. I'd appreciate the forum for any idea!
data have; INFORMAT date MMDDYY10.; input id date case; FORMAT date MMDDYY10.; datalines; 1 1/15/2008 1 1 1/31/2009 0 1 4/13/2009 0 1 5/30/2009 1 1 1/7/2010 1 2 1/24/2008 0 2 2/19/2009 1 2 4/21/2009 1 2 5/12/2009 0 2 7/11/2010 1 ; data want; INFORMAT date MMDDYY10.; input id date case rate ; FORMAT date MMDDYY10.; datalines; 1 1/15/2008 1 1 1 1/31/2009 0 0 1 4/13/2009 0 0 1 5/30/2009 1 0.33 /* example: 0.33 = 1/3 */ 1 1/7/2010 1 0.50 /* example: 0.50 = 2/4 */ 2 1/24/2008 0 0 2 2/19/2009 1 1 2 4/21/2009 1 1 2 5/12/2009 0 0.67 2 7/11/2010 1 1 ;
data have;
INFORMAT date MMDDYY10.;
input id date case;
FORMAT date MMDDYY10.;
datalines;
1 1/15/2008 1
1 1/31/2009 0
1 4/13/2009 0
1 5/30/2009 1
1 1/7/2010 1
2 1/24/2008 0
2 2/19/2009 1
2 4/21/2009 1
2 5/12/2009 0
2 7/11/2010 1
;
data want;
array x{%sysevalf('01jan1900'd):%sysevalf('01jan2200'd)} _temporary_;
call missing(of x{*});
do until(last.id);
set have;
by id;
x{date}=case;
end;
do until(last.id);
set have;
by id;
m=0;n=0;
do i=intnx('year',date,-1,'s') to date;
m+x{i};
if not missing(x{i}) then n+1;
end;
ratio=m/n;output;
end;
drop m n i;
run;
You should clarify what the rule for "within 1 year from that record" may be.
Consider your example:(which by the way does not specify which observations are contributing)
1 1/31/2009 0 0 1 4/13/2009 0 0 1 5/30/2009 1 0.33 /* example: 0.33 = 1/3 */ 1 1/7/2010 1 0.50 /* example: 0.50 = 2/4 */
Why is this not 2/3? 1/7/2009 is "within one year of " 5/30/2009. So it appears your rule is not "within" but "less than one year prior" or similar.
1 5/30/2009 1 0.33 /* example: 0.33 = 1/3 */
@ballardw
Thanks for the note! Yes, it's 1 year prior to each record. I've corrected the question.
data have;
INFORMAT date MMDDYY10.;
input id date case;
FORMAT date MMDDYY10.;
datalines;
1 1/15/2008 1
1 1/31/2009 0
1 4/13/2009 0
1 5/30/2009 1
1 1/7/2010 1
2 1/24/2008 0
2 2/19/2009 1
2 4/21/2009 1
2 5/12/2009 0
2 7/11/2010 1
;
data want;
array x{%sysevalf('01jan1900'd):%sysevalf('01jan2200'd)} _temporary_;
call missing(of x{*});
do until(last.id);
set have;
by id;
x{date}=case;
end;
do until(last.id);
set have;
by id;
m=0;n=0;
do i=intnx('year',date,-1,'s') to date;
m+x{i};
if not missing(x{i}) then n+1;
end;
ratio=m/n;output;
end;
drop m n i;
run;
@Ksharp Thank you! Much appreciated.
If you do not have a big table ,Try RPOC SQL:
data have;
INFORMAT date MMDDYY10.;
input id date case;
FORMAT date MMDDYY10.;
datalines;
1 1/15/2008 1
1 1/31/2009 0
1 4/13/2009 0
1 5/30/2009 1
1 1/7/2010 1
2 1/24/2008 0
2 2/19/2009 1
2 4/21/2009 1
2 5/12/2009 0
2 7/11/2010 1
;
data have;
set have;
rename case=_case; /*case is a key/special word,so rename it for SQL*/
run;
proc sql;
create table want2 as
select *,
(select sum(_case) from have where id=a.id and date between intnx('year',a.date,-1,'s') and a.date)/
(select count(_case) from have where id=a.id and date between intnx('year',a.date,-1,'s') and a.date) as ratio
from have as a;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.