BookmarkSubscribeRSS Feed
UNBRDC
Calcite | Level 5

Hello,

 

I am a new user of SAS so that my question may be very easy but unfortunately I do not know the aswer. I will appreciate for any of your suggestions.

 

Thank you very much for your time and Merry Christmas!

 

The following is the description of the dataset and the results that I would like to find:

 

Data description: longitudinal data, edu is a dummy which 1 represents to have education in the corresponding tax year; 0 not.

 

I would like to get (1) the duration of the waiting period for a person to go to education, and the number of years s/he participated in education. (2) Also, the mean of the waiting time and the number of years in edu for all people.

 

For (1), I would like to have a variable to show the first person wait for 1 year to start education, and stay in school for 2 years; the 2nd person waits 0 year and stay in school for 3 years, 4th wait for 2 years and in school for 2 years. For person 3, he has multiple entry of education but I would like to count only the first time and treat the 2nd or any furhter edu as no edu in the period.

 

For (2), I would like to think it is good if the mean of waiting time and years of edu can be calculated directly.

 

 

 

Sample data:

 

id            tax_year              edu                       

1              1                              0                             

1              2                              1                             

1              3                              1                             

1              4                              0             

1              5                              0

 

2              1                              1             

2              2                              1             

2              3                              1             

2              4                              0             

2              5                              0

 

 

3              1                              0             

3              2                              1             

3              3                              0             

3              4                              1             

3              5                              0

 

4              1                              0             

4              2                              0             

4              3                              1             

4              4                              1

5              5                              0             

 

 

3 REPLIES 3
Astounding
PROC Star

Calculating means is easy.  Preparing the numbers is a bit trickier.  Assuming your data is sorted as indicated, BY ID TAX_YEAR:

 

data want;

set have;

by id;

if first.id then do;

   waiting_increment=1;

   education_increment=1;

   waiting_years=0;

   education_years=0;

end;

retain waiting_increment education_increment;

if edu=0 then do;

   if education_years > 0 then education_increment=0;

   waiting_years + waiting_increment;

end;

else do;

   waiting_increment=0;

   education_years + education_increment;

end;

if last.id;

run;

 

Then run PROC MEANS to get the means:

 

proc means data=want;

var waiting_years education_years;

run;

 

It's untested ... should be fine by write back if there are any difficulties.

 

UNBRDC
Calcite | Level 5

Thanks. Astounding.

 

I tested the code and it works. I am wondering whether I can ask you the logic behind the code. Please forgive me if my question is too basic.

 

I did a few tests and try to find the logic when SAS runs this code. I think SAS handles it as kind of loop. The reason that I think so is that: before running the "else do....end" part, the code generates the total of all non-edu period, or the counts for all edu = 0. However, when the else do...end added, the periods after an edu = 1 have been deducted from the previous totals. This generates the actual waiting time. The reason I think it is a loop is from here-- the Else do changed the condition of calculating total and recalculated it. The conclusion also comes from the education_years, since when SAS run the “If edu_years >0 then …” , all edu_years are equal to zeros.

I am wondering whether you can point out my understanding is correct or not.

 

Thank you very much for your time.

ndp
Quartz | Level 8 ndp
Quartz | Level 8

Try this: edu2 and edu3 datasets will have appropriate variables you need. You can then use any of the procs univariate/means to count means.

 

data edu2(keep=id edu1_yr) edu3(keep=id tot_edu);
	set edu;
	by id tax_year;
	where edu=1;
	retain tot_edu;
	new_yr=lag(tax_year);
	if first.id then new_yr=tax_year;
	if edu=1 and tax_year-new_yr>1 then new_edu=0;
	else new_edu=edu;
	if first.id then do;
                   edu1_yr=tax_year-1;
                   output edu2;
                end;
	if first.id then tot_edu=0;
	tot_edu=tot_edu+new_edu;
	if last.id then output edu3;
run;

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1667 views
  • 0 likes
  • 3 in conversation