12-21-2015 02:09 PM
I am a new user of SAS so that my question may be very easy but unfortunately I do not know the aswer. I will appreciate for any of your suggestions.
Thank you very much for your time and Merry Christmas!
The following is the description of the dataset and the results that I would like to find:
Data description: longitudinal data, edu is a dummy which 1 represents to have education in the corresponding tax year; 0 not.
I would like to get (1) the duration of the waiting period for a person to go to education, and the number of years s/he participated in education. (2) Also, the mean of the waiting time and the number of years in edu for all people.
For (1), I would like to have a variable to show the first person wait for 1 year to start education, and stay in school for 2 years; the 2nd person waits 0 year and stay in school for 3 years, 4th wait for 2 years and in school for 2 years. For person 3, he has multiple entry of education but I would like to count only the first time and treat the 2nd or any furhter edu as no edu in the period.
For (2), I would like to think it is good if the mean of waiting time and years of edu can be calculated directly.
id tax_year edu
1 1 0
1 2 1
1 3 1
1 4 0
1 5 0
2 1 1
2 2 1
2 3 1
2 4 0
2 5 0
3 1 0
3 2 1
3 3 0
3 4 1
3 5 0
4 1 0
4 2 0
4 3 1
4 4 1
5 5 0
12-21-2015 02:56 PM
Calculating means is easy. Preparing the numbers is a bit trickier. Assuming your data is sorted as indicated, BY ID TAX_YEAR:
if first.id then do;
retain waiting_increment education_increment;
if edu=0 then do;
if education_years > 0 then education_increment=0;
waiting_years + waiting_increment;
education_years + education_increment;
Then run PROC MEANS to get the means:
proc means data=want;
var waiting_years education_years;
It's untested ... should be fine by write back if there are any difficulties.
12-21-2015 04:14 PM
I tested the code and it works. I am wondering whether I can ask you the logic behind the code. Please forgive me if my question is too basic.
I did a few tests and try to find the logic when SAS runs this code. I think SAS handles it as kind of loop. The reason that I think so is that: before running the "else do....end" part, the code generates the total of all non-edu period, or the counts for all edu = 0. However, when the else do...end added, the periods after an edu = 1 have been deducted from the previous totals. This generates the actual waiting time. The reason I think it is a loop is from here-- the Else do changed the condition of calculating total and recalculated it. The conclusion also comes from the education_years, since when SAS run the “If edu_years >0 then …” , all edu_years are equal to zeros.
I am wondering whether you can point out my understanding is correct or not.
Thank you very much for your time.
12-21-2015 04:13 PM
Try this: edu2 and edu3 datasets will have appropriate variables you need. You can then use any of the procs univariate/means to count means.
data edu2(keep=id edu1_yr) edu3(keep=id tot_edu); set edu; by id tax_year; where edu=1; retain tot_edu; new_yr=lag(tax_year); if first.id then new_yr=tax_year; if edu=1 and tax_year-new_yr>1 then new_edu=0; else new_edu=edu; if first.id then do; edu1_yr=tax_year-1; output edu2; end; if first.id then tot_edu=0; tot_edu=tot_edu+new_edu; if last.id then output edu3; run;