Hi folks,
I have a data set where each ID has multiple visits over time and I have 7 time points for which I need to calculate mean number of visits by ID. I have tried several codes and asked the question on this forum but couldn't get an answer. Can someone help please?
Ta
My data looks like;
ID status eventdate age sex
1 Eligible 2016-04-14 73 male
1 Eligible 2017-02-12 73 male
1 Eligible 2018-09-03 73 male
1 Eligible 2016-01-15 73 male
1 Eligible 2016-11-17 73 male
2 Intervention 2018-01-15 73 male
2 Intervention 2016-01-12 73 male
2 Intervention 2017-04-15 73 male
2 Intervention 2018-09-22 73 male
2 Intervention 2018-07-26 73 male
2 Intervention 2018-03-15 73 male
3 Intervention 2017-02-13 73 male
3 Intervention 2017-01-12 73 male
Based on eventdate, I have 7 time points; arbitrarily;
if EventDate<= '28FEB2016'd then delete;*for the sake of uniformity in number of pre- and post- time periods, deleting visits before March 2016;
else if '01MAR2016'd <=EventDate<= '31AUG2016'd then month2='-3';*from March 2016 to August 2016;
else if '01SEP2016'd <=EventDate<= '28FEB2017'd then month2='-2'; *From September 2016 to February 2017;
else if '01MAR2017'd <=EventDate<= '31AUG2017'd then month2='-1'; *From March 2017 to August 2017;
else if '01SEP2017'd <=EventDate<= '28FEB2018'd then month2='0'; *From September 2017 to February 2018- intervention period;
else if '01MAR2018'd <=EventDate<= '31AUG2018'd then month2='1'; *From March 2018 to August 2018- first six months after intervention;
else if '01SEP2018'd <=EventDate<= '28FEB2019'd then month2='2'; *From September 2018 to February 2019- second six months after intervention;
else if '01MAR2019'd <=EventDate<= '31MAY2019'd then month2='3'; *From March 2019 to May 2018- third time period after intervention but doesn't have the data till August;
Help much appreciated.
Show us the code you tried.
So I have created a variable called 'month2' to get 7 time periods from the 'eventdate' variable and another variable called 'visit' to indicate one visit for each 'eventdate' then wrote this code;
proc summary data=saltaire;
by status month2;
var visit;
output out=sum n=n;
run;
What I am not sure about is, how to take into account visits per number of IDs by status and month2.
forgot to mention, the code that I wrote only gives me a sum of visits during each time period but not a mean by number of ids in each time period.
If I interpret you correctly, you should run proc means once to get the sums per id and "month", and then a second time to get the averages of the sums per id.
Thanks Kurt, that's the problem, I don't know how to calculate mean by ID. Might be a simple step but it's because I've been struggling with it too much and my brain has given up probably.
Show us what you have tried.
I have posted the code in my previous reply.
Thanks
S
@sks521 wrote:
I have posted the code in my previous reply.
Thanks
S
No, you posted code that computes the value of N for each value of STATUS and MONTH2.
We want to see the code that you are trying to use to compute the mean value of N.
PROC MEANS DATA=saltaire
SUM MEAN MAXDEC=0 ;
VAR visit;
by pkid;
run;
the output looks like;
so this gives me sum of visits for each ID separately. How do I convert this into average fro each time period?
Mathematically speaking I am looking for an output like;
Average visits per ID for pre-intervention period= total visits/total number of ids (discrete ids)
@sks521 wrote:
PROC MEANS DATA=saltaire
SUM MEAN MAXDEC=0 ;
VAR visit;
by pkid;
run;
the output looks like;
PKid=437Analysis Variable: visitSum Mean20 1
so this gives me sum of visits for each ID separately. How do I convert this into average fro each time period?
Average visits per ID for pre-intervention period= total visits/total number of ids (discrete ids attending the hospital during that period)
For example;
average number of visits during pre-intervention;
Eligible= 4.44
Intervention= 5.66
Average number of visits during intervention;
eligible=5.33
Intervention= 6.66
Average number of visits post-intervention;
eligible=3.99
Intervention=6.22
Something like that.
If you want the results by time period, then you need to use a time period variable in PROC MEANS, which you haven't done. And if you want averages also by STATUS, you need to have STATUS in your PROC MEANS.
You also don't have a variable named VISIT, so you can't do any analysis on it. Perhaps you mean the variable N, which was computed in your first PROC MEANS?
Lastly, if you want results pre-intervention, during intervention or post-intervention, you need to create a variable with these values somehow.
So all of these elements are needed for in the code you to get an answer.
Something like this might work after your first PROC MEANS
proc format;
value period -1,-2,-3 = 'Pre-Intervention' 0='Intervention' 1,2,3='Post-Intervention';
run;
PROC MEANS DATA=saltaire;
class status month2;
VAR n;
format month2 period.;
output out=want mean=;
run;
Sorry about confusion, I should've been more explicit.
I have created a variable to indicate my time periods in a separate step. I also did create a variable called 'visit' to indicate a visit. Have also tried the code you have suggested but it still doesn't solve the problem.
It doesn't give me average number of visits per discrete id for each time period by status
This is going nowhere.
Please post
Make sure you cover edge cases, so we don't have to do a lot of further back-and-forth.
pkid visit month2 status
1 1 -3 eligible
1 1 -3 eligible
1 1 -2 eligible
1 1 0 eligible
2 1 -1 intervention
2 1 2 intervention
2 1 3 intervention
2 1 0 intervention
3 1 2 intervention
3 1 -1 intervention
3 1 -2 intervention
3 1 0 intervention
3 1 0 intervention
3 1 3 intervention
Month2 indicates my time periods which are 7 in total.
I ran sql which you'd suggested previously;
proc sql;
create table average as
select pkid, avg(visitsum) as average
from (
select pkid, month2, count(*) as visitsum
from saltaire
group by pkid, month2
)
group by pkid;
quit;
it doesn't give me average number of visits per discrete id for each time period by 'status'.
Thanks for your help
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.