Hello,
I'd like to compute the relative frequency based on a binary variable (case) in the following way. For each record within each id, I'd like to measure the rate for records that are within 1 year prior to that record (including the record itself). Here's a summary of codes for the have/want data files. I'd appreciate the forum for any idea!
data have; INFORMAT date MMDDYY10.; input id date case; FORMAT date MMDDYY10.; datalines; 1 1/15/2008 1 1 1/31/2009 0 1 4/13/2009 0 1 5/30/2009 1 1 1/7/2010 1 2 1/24/2008 0 2 2/19/2009 1 2 4/21/2009 1 2 5/12/2009 0 2 7/11/2010 1 ; data want; INFORMAT date MMDDYY10.; input id date case rate ; FORMAT date MMDDYY10.; datalines; 1 1/15/2008 1 1 1 1/31/2009 0 0 1 4/13/2009 0 0 1 5/30/2009 1 0.33 /* example: 0.33 = 1/3 */ 1 1/7/2010 1 0.50 /* example: 0.50 = 2/4 */ 2 1/24/2008 0 0 2 2/19/2009 1 1 2 4/21/2009 1 1 2 5/12/2009 0 0.67 2 7/11/2010 1 1 ;
data have;
INFORMAT date MMDDYY10.;
input id date case;
FORMAT date MMDDYY10.;
datalines;
1 1/15/2008 1
1 1/31/2009 0
1 4/13/2009 0
1 5/30/2009 1
1 1/7/2010 1
2 1/24/2008 0
2 2/19/2009 1
2 4/21/2009 1
2 5/12/2009 0
2 7/11/2010 1
;
data want;
array x{%sysevalf('01jan1900'd):%sysevalf('01jan2200'd)} _temporary_;
call missing(of x{*});
do until(last.id);
set have;
by id;
x{date}=case;
end;
do until(last.id);
set have;
by id;
m=0;n=0;
do i=intnx('year',date,-1,'s') to date;
m+x{i};
if not missing(x{i}) then n+1;
end;
ratio=m/n;output;
end;
drop m n i;
run;
You should clarify what the rule for "within 1 year from that record" may be.
Consider your example:(which by the way does not specify which observations are contributing)
1 1/31/2009 0 0 1 4/13/2009 0 0 1 5/30/2009 1 0.33 /* example: 0.33 = 1/3 */ 1 1/7/2010 1 0.50 /* example: 0.50 = 2/4 */
Why is this not 2/3? 1/7/2009 is "within one year of " 5/30/2009. So it appears your rule is not "within" but "less than one year prior" or similar.
1 5/30/2009 1 0.33 /* example: 0.33 = 1/3 */
@ballardw
Thanks for the note! Yes, it's 1 year prior to each record. I've corrected the question.
data have;
INFORMAT date MMDDYY10.;
input id date case;
FORMAT date MMDDYY10.;
datalines;
1 1/15/2008 1
1 1/31/2009 0
1 4/13/2009 0
1 5/30/2009 1
1 1/7/2010 1
2 1/24/2008 0
2 2/19/2009 1
2 4/21/2009 1
2 5/12/2009 0
2 7/11/2010 1
;
data want;
array x{%sysevalf('01jan1900'd):%sysevalf('01jan2200'd)} _temporary_;
call missing(of x{*});
do until(last.id);
set have;
by id;
x{date}=case;
end;
do until(last.id);
set have;
by id;
m=0;n=0;
do i=intnx('year',date,-1,'s') to date;
m+x{i};
if not missing(x{i}) then n+1;
end;
ratio=m/n;output;
end;
drop m n i;
run;
@Ksharp Thank you! Much appreciated.
If you do not have a big table ,Try RPOC SQL:
data have;
INFORMAT date MMDDYY10.;
input id date case;
FORMAT date MMDDYY10.;
datalines;
1 1/15/2008 1
1 1/31/2009 0
1 4/13/2009 0
1 5/30/2009 1
1 1/7/2010 1
2 1/24/2008 0
2 2/19/2009 1
2 4/21/2009 1
2 5/12/2009 0
2 7/11/2010 1
;
data have;
set have;
rename case=_case; /*case is a key/special word,so rename it for SQL*/
run;
proc sql;
create table want2 as
select *,
(select sum(_case) from have where id=a.id and date between intnx('year',a.date,-1,'s') and a.date)/
(select count(_case) from have where id=a.id and date between intnx('year',a.date,-1,'s') and a.date) as ratio
from have as a;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.