BookmarkSubscribeRSS Feed
UNBRDC
Calcite | Level 5

Hello,

 

I am a new user of SAS so that my question may be very easy but unfortunately I do not know the aswer. I will appreciate for any of your suggestions.

 

Thank you very much for your time and Merry Christmas!

 

The following is the description of the dataset and the results that I would like to find:

 

Data description: longitudinal data, edu is a dummy which 1 represents to have education in the corresponding tax year; 0 not.

 

I would like to get (1) the duration of the waiting period for a person to go to education, and the number of years s/he participated in education. (2) Also, the mean of the waiting time and the number of years in edu for all people.

 

For (1), I would like to have a variable to show the first person wait for 1 year to start education, and stay in school for 2 years; the 2nd person waits 0 year and stay in school for 3 years, 4th wait for 2 years and in school for 2 years. For person 3, he has multiple entry of education but I would like to count only the first time and treat the 2nd or any furhter edu as no edu in the period.

 

For (2), I would like to think it is good if the mean of waiting time and years of edu can be calculated directly.

 

 

 

Sample data:

 

id            tax_year              edu                       

1              1                              0                             

1              2                              1                             

1              3                              1                             

1              4                              0             

1              5                              0

 

2              1                              1             

2              2                              1             

2              3                              1             

2              4                              0             

2              5                              0

 

 

3              1                              0             

3              2                              1             

3              3                              0             

3              4                              1             

3              5                              0

 

4              1                              0             

4              2                              0             

4              3                              1             

4              4                              1

5              5                              0             

 

 

3 REPLIES 3
Astounding
PROC Star

Calculating means is easy.  Preparing the numbers is a bit trickier.  Assuming your data is sorted as indicated, BY ID TAX_YEAR:

 

data want;

set have;

by id;

if first.id then do;

   waiting_increment=1;

   education_increment=1;

   waiting_years=0;

   education_years=0;

end;

retain waiting_increment education_increment;

if edu=0 then do;

   if education_years > 0 then education_increment=0;

   waiting_years + waiting_increment;

end;

else do;

   waiting_increment=0;

   education_years + education_increment;

end;

if last.id;

run;

 

Then run PROC MEANS to get the means:

 

proc means data=want;

var waiting_years education_years;

run;

 

It's untested ... should be fine by write back if there are any difficulties.

 

UNBRDC
Calcite | Level 5

Thanks. Astounding.

 

I tested the code and it works. I am wondering whether I can ask you the logic behind the code. Please forgive me if my question is too basic.

 

I did a few tests and try to find the logic when SAS runs this code. I think SAS handles it as kind of loop. The reason that I think so is that: before running the "else do....end" part, the code generates the total of all non-edu period, or the counts for all edu = 0. However, when the else do...end added, the periods after an edu = 1 have been deducted from the previous totals. This generates the actual waiting time. The reason I think it is a loop is from here-- the Else do changed the condition of calculating total and recalculated it. The conclusion also comes from the education_years, since when SAS run the “If edu_years >0 then …” , all edu_years are equal to zeros.

I am wondering whether you can point out my understanding is correct or not.

 

Thank you very much for your time.

ndp
Quartz | Level 8 ndp
Quartz | Level 8

Try this: edu2 and edu3 datasets will have appropriate variables you need. You can then use any of the procs univariate/means to count means.

 

data edu2(keep=id edu1_yr) edu3(keep=id tot_edu);
	set edu;
	by id tax_year;
	where edu=1;
	retain tot_edu;
	new_yr=lag(tax_year);
	if first.id then new_yr=tax_year;
	if edu=1 and tax_year-new_yr>1 then new_edu=0;
	else new_edu=edu;
	if first.id then do;
                   edu1_yr=tax_year-1;
                   output edu2;
                end;
	if first.id then tot_edu=0;
	tot_edu=tot_edu+new_edu;
	if last.id then output edu3;
run;

 

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1220 views
  • 0 likes
  • 3 in conversation