I want to calculate the mean time between two survey responses. I am working with survey data with five follow-ups. Questionnaires were sent every three years, so the gap between consecutive follow-ups is three years. In my data, I have an indicator variable “survey no.” with values 1, 2, 3, 4, 5. The time between survey no.=1 and survey no.=2 is 3 years, and between survey no.=1 and survey no.=3 it is 6 years. The longest possible gap, between survey no.=1 and survey no.=5, is 12 years. In my study I included participants who responded to a minimum of two surveys, but some participants responded to all surveys, some to three surveys, and some to only two. This gives multiple combinations of the number of years between two surveys per ID. I have a made-up sample below.
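Because the surveys are three years apart, the gap between any two survey numbers follows directly from their difference: gap = 3 × (later survey no. − earlier survey no.). A minimal sketch, assuming that fixed 3-year spacing (the dataset `pairs` and its variable names are hypothetical), that lists every possible pair for five surveys:

```sas
/* Enumerate every pair of survey numbers (earlier < later) for five surveys. */
/* Assumes surveys are exactly 3 years apart, so gap = 3 * (later - earlier). */
data pairs;
  do from_survey = 1 to 4;
    do to_survey = from_survey + 1 to 5;
      gap_years = 3 * (to_survey - from_survey);
      output;
    end;
  end;
run;

proc print data=pairs noobs;
run;
```

This produces ten pairs, with gaps ranging from 3 years (adjacent surveys) to 12 years (survey 1 to survey 5).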
data have;
input ID survey no. exposure;
1 2 0
1 3 1
2 1 1
2 1 1
2 3 0
2 4 0
2 5 1
3 1 0
3 2 1
3 3 1
3 4 0
3 5 1
;
run;
In the attached image, I tried to illustrate the possible combinations of survey responses per ID.
I tried to write the code, but it is more complicated than I thought. I am new to SAS and hesitant to suggest any code.
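One way to list those per-ID combinations is a PROC SQL self-join that pairs each response with every later response of the same ID. A minimal sketch, assuming the corrected variable name `surveyno` and a fixed 3-year spacing; `id_pairs` is a hypothetical name. Note that ID=2's repeated survey 1 is paired twice with each later survey, so duplicates should be resolved first.

```sas
/* Sample data (corrected form), several records read per line with @@. */
data have;
  input id surveyno exposure @@;
  datalines;
1 2 0  1 3 1
2 1 1  2 1 1  2 3 0  2 4 0  2 5 1
3 1 0  3 2 1  3 3 1  3 4 0  3 5 1
;
run;

/* Every pair of responses within the same ID (earlier survey first).    */
/* Assumes a fixed 3-year spacing between consecutive survey numbers.    */
proc sql;
  create table id_pairs as
  select a.id,
         a.surveyno as from_survey,
         b.surveyno as to_survey,
         3 * (b.surveyno - a.surveyno) as gap_years
  from have as a
       inner join have as b
       on a.id = b.id and a.surveyno < b.surveyno
  order by 1, 2, 3;

  /* Mean gap over all within-ID pairs */
  select mean(gap_years) as mean_gap_years
  from id_pairs;
quit;
```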
The basic approach would be to add the actual value of the interval to each record: a zero for the first survey, and then apply your rule for calculating the interval.
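A minimal sketch of that per-record interval, assuming the corrected variable name `surveyno` and the 3-year spacing rule; `intervals` and `interval` are hypothetical names. LAG is called on every row (not inside the IF) so it always returns the previous record's value.

```sas
/* Sample data (corrected form), several records read per line with @@. */
data have;
  input id surveyno exposure @@;
  datalines;
1 2 0  1 3 1
2 1 1  2 1 1  2 3 0  2 4 0  2 5 1
3 1 0  3 2 1  3 3 1  3 4 0  3 5 1
;
run;

/* BY-group processing needs the data sorted by ID and survey number. */
proc sort data=have;
  by id surveyno;
run;

/* INTERVAL = years since the same ID's previous response:              */
/* 0 for each ID's first record, otherwise 3 * (survey number change).  */
data intervals;
  set have;
  by id;
  prev_survey = lag(surveyno);
  if first.id then interval = 0;
  else interval = 3 * (surveyno - prev_survey);
  drop prev_survey;
run;

/* Mean of the non-zero intervals. */
proc means data=intervals n mean;
  where interval > 0;
  var interval;
run;
```

As written, the WHERE clause drops each ID's first record and also the zero gap produced by ID=2's repeated survey 1; adjust it once the rule for duplicates is decided.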
You need to provide the rule for the interval between every possible pair of survey numbers. You do not say what the interval is between surveys 1 and 4, 2 and 4, 2 and 5, 3 and 4, or 3 and 5.
Another approach that may be more robust would be to have an actual DATE for each survey, if practical, i.e. the response date actually collected. That is a learning point for future projects. But if the collection time frame was such that all Survey=1 responses were done on a single calendar day or in a single month, then you could add that date based on the survey number and use the INTCK function between dates to get an actual interval. Then calculate the mean of those intervals.
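A sketch of the date-based approach, under a loudly hypothetical assumption: survey 1 was fielded on 01JAN2000 and each later survey exactly 3 years after the previous one. INTNX builds the date from the survey number, and INTCK with the 'C' (continuous) method counts whole years between consecutive responses. `have_dated`, `survey_date`, and `interval_years` are hypothetical names.

```sas
/* Sample data (corrected form), several records read per line with @@. */
data have;
  input id surveyno exposure @@;
  datalines;
1 2 0  1 3 1
2 1 1  2 1 1  2 3 0  2 4 0  2 5 1
3 1 0  3 2 1  3 3 1  3 4 0  3 5 1
;
run;

/* HYPOTHETICAL dates: survey 1 on 01JAN2000, each later survey exactly  */
/* 3 years after the previous one. Replace with real response dates.     */
data have_dated;
  set have;
  survey_date = intnx('year', '01jan2000'd, 3 * (surveyno - 1), 'same');
  format survey_date date9.;
run;

proc sort data=have_dated;
  by id survey_date;
run;

/* Whole years between consecutive responses of the same ID;             */
/* left missing for each ID's first response.                            */
data dated_intervals;
  set have_dated;
  by id;
  prev_date = lag(survey_date);
  if not first.id then interval_years = intck('year', prev_date, survey_date, 'c');
  drop prev_date;
run;

/* PROC MEANS ignores missing values, so first responses are excluded. */
proc means data=dated_intervals n mean;
  var interval_years;
run;
```

With real collection dates the same steps apply unchanged; only the HAVE_DATED step would be replaced by reading the actual dates. Note that ID=2's repeated survey 1 contributes a 0-year interval here.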
Also, that PDF only shows values involving exposure=1 and the first survey. Is that the only "interval" you are concerned with? If exposure has a role in this, you have to state explicitly what that role is.
However, before you get started: for ID=2 you show two records with survey no=1. What is the interval when this occurs?
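A quick check for repeated survey numbers within an ID can be sketched with BY-group FIRST./LAST. flags: a record is a duplicate when it is not both the first and the last of its (ID, survey number) group. This assumes the corrected variable name `surveyno`; `dup_surveys` is a hypothetical name.

```sas
/* Sample data (corrected form), several records read per line with @@. */
data have;
  input id surveyno exposure @@;
  datalines;
1 2 0  1 3 1
2 1 1  2 1 1  2 3 0  2 4 0  2 5 1
3 1 0  3 2 1  3 3 1  3 4 0  3 5 1
;
run;

proc sort data=have;
  by id surveyno;
run;

/* Keep every record whose (ID, survey number) combination occurs more than once. */
data dup_surveys;
  set have;
  by id surveyno;
  if not (first.surveyno and last.surveyno);
run;

proc print data=dup_surveys noobs;
run;
```

For the sample data this returns the two ID=2 records with survey number 1.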
Also, please test your data step so that the code is valid. You had an invalid variable name and no CARDS or DATALINES statement to indicate the start of the data.
data have;
input ID surveyno exposure;
datalines;
1 2 0
1 3 1
2 1 1
2 1 1
2 3 0
2 4 0
2 5 1
3 1 0
3 2 1
3 3 1
3 4 0
3 5 1
;
run;
Thank you for your reply!
I am sorry for the silly mistakes in my question; I started using SAS a month ago. Below is a sample from my dataset.
DATA dummy;
input id survey_no year distance bmi ;
* survey_no = survey number;
* year = year when the survey was conducted;
* distance = dichotomised exposure variable;
* bmi = continuous outcome variable;
DATALINES;
1 4 2009 0 30.6689
1 6 2015 1 29.7004
2 4 2009 1 27.7744
2 6 2015 0 28.3782
3 1 2000 0 24.1140
3 2 2003 0 24.6914
3 3 2006 0 24.2188
3 4 2009 0 25.0000
3 5 2012 1 25.3086
3 6 2015 0 24.3827
4 4 2009 1 26.9531
4 5 2012 0 30.0914
4 6 2015 0 30.4688
5 1 2000 0 22.0386
5 2 2003 0 23.5078
5 4 2009 1 25.6544
5 5 2012 1 26.3980
5 6 2015 1 26.4463
;
RUN;
Surveys were conducted at different time points, and respondents have not participated in all the surveys. The time between two consecutive surveys is 3 years. In the simplest form, we assess the association between "distance" and "bmi". Distance is dichotomised, while BMI is continuous. When we run a model, SAS uses different combinations of 0 and 1 values, and these 0/1 responses come from different time points, with gaps that can vary from 3 to 12 years. I want to calculate the mean time, i.e. the number of years between two responses (of distance), taking into account the multiple combinations that depend on which surveys, or years, each respondent participated in.
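A minimal sketch using the `year` variable from the dummy data above, assuming "two responses" means consecutive responses of the same person: within each ID, DIF(year) gives the years since that ID's previous response, and PROC MEANS averages those gaps per ID and overall. `gaps` and `gap_years` are hypothetical names.

```sas
data dummy;
  input id survey_no year distance bmi;
  datalines;
1 4 2009 0 30.6689
1 6 2015 1 29.7004
2 4 2009 1 27.7744
2 6 2015 0 28.3782
3 1 2000 0 24.1140
3 2 2003 0 24.6914
3 3 2006 0 24.2188
3 4 2009 0 25.0000
3 5 2012 1 25.3086
3 6 2015 0 24.3827
4 4 2009 1 26.9531
4 5 2012 0 30.0914
4 6 2015 0 30.4688
5 1 2000 0 22.0386
5 2 2003 0 23.5078
5 4 2009 1 25.6544
5 5 2012 1 26.3980
5 6 2015 1 26.4463
;
run;

proc sort data=dummy;
  by id year;
run;

/* GAP_YEARS = years since the same ID's previous response; set to      */
/* missing for each ID's first response so PROC MEANS ignores it.       */
data gaps;
  set dummy;
  by id;
  gap_years = dif(year);   /* called on every row so DIF stays in sync */
  if first.id then gap_years = .;
run;

/* Mean gap per ID ... */
proc means data=gaps n mean min max;
  class id;
  var gap_years;
run;

/* ... and overall, across all consecutive pairs of responses. */
proc means data=gaps n mean min max;
  var gap_years;
run;
```

In this sample every gap is a multiple of 3; IDs 1 and 2 each have a single 6-year gap because they did not respond in 2012.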