BookmarkSubscribeRSS Feed
Luluvolcano
Calcite | Level 5

 

Hi,

 

I'd appreciate if someone can provide some help for my following question:

 

Data description: longitudinal data, edu is a dummy which 1 represents to have education in the corresponding tax year; 0 means no education.

 

I would like to get the duration of the waiting period before a person gets education, and how many years one participates at school.

 

For example, I would like to have a variable to show the first person waits for 1 year to start education, and stays in school for 2 years; the 2nd person waits for 0 year and stays in school for 3 years; the 4th person waits for 2 years and has been in school for 2 years. For the 3rd person, he has multiple entries for education but I would like to count only the first time (or 1 year of education).

 

In other words, I wonder if it is possible to create a variable for only those who have attended school consecutively. 

I would also like to see if the mean of waiting time and years of education can be calculated directly.

 

Many thanks!!!

 

 

Sample data:

 

id            tax_year              edu                       

1              1                              0                             

1              2                              1                             

1              3                              1                             

1              4                              0             

1              5                              0

 

2              1                              1             

2              2                              1             

2              3                              1             

2              4                              0             

2              5                              0

 

 

3              1                              0             

3              2                              1             

3              3                              0             

3              4                              1             

3              5                              0

 

4              1                              0             

4              2                              0             

4              3                              1             

4              4                              1

4              5                              0             

10 REPLIES 10
ballardw
Super User

Can you show what you would like the output to look like for the example data?

Luluvolcano
Calcite | Level 5

Hi,

 

If possible, the output should be:

1) SAS code for how to create a duration variable in such case and condition;

2) SAS code for how to create an education variable that only takes consecutive years of education once, and if there are random multiple entries for educational attainments at different years, we will just count the first entry once. 

3) SAS code for calcuating the mean of the newly created duration and education variables.

 

Thanks a lot!

ballardw
Super User

Show the actual output similar to your input data layout. It helps to clarify what the rules actually mean.

And it should show a reasonable number of case/conditions.

Luluvolcano
Calcite | Level 5

Hi,

 

Thanks for the prompt reply.

 

I am a new SAS user and I am still at the early stage of learning this program.

 

I am not sure how to create what I want using SAS that's why there's nothing to show.

 

Would "array" work? 

 

Thanks again.

ballardw
Super User

Example output can be text just like your input but add the expected values for that set of data. maybe something like:

 

id            tax_year              edu                wait        participation           

1              1                              0                             

1              2                              1                 1            

1              3                              1                             2

1              4                              0             

1              5                              0

 

2              1                              1             0

2              2                              1             

2              3                              1                             3

2              4                              0             

2              5                              0

 

 

3              1                              0             

3              2                              1             1

3              3                              0                            1

3              4                              1             

3              5                              0

 

Or maybe      

id             wait        participation           

1              1             2

2               0            3

3               1            1

 

DO you have a SAS dataset with your starting data?

 

Since you are looking a values across records then RETAIN for the wait and participation variables is likely. You may want a flag to set after the first time you set participation duration to say that you are done.

Since you need to reset the process for each ID then First.ID type processing is a likely start.

Hint to get started: for the first record in each ID then you want to set an appropriate initial value for your wait and participation variables.

You may want to look at the LAG function to compare the current value of edu with the previous one within your ID to see if the current record should increase the count for wait or participation.

 

How you want the output to look like will change some other options. One, if you have gotten to a point where you have completed the initial participation cycle, you might want to explicitly output the data at that point only.

 

Luluvolcano
Calcite | Level 5

Hi,

 

Thank you for the useful comments. 

 

Yes, based on my example data, my example output will look like

 

Id  Duration Edu Year
1   1             2
2   0             3
3   1             1

4   2             2

 

I don't have SAS at home but I can test it later at school.

 

I tried to create some codes and not sure if they are right. The part in red is the part I think will be the key (and they are probably not right at the moment):

 

Data;

set id;

by tax_year;

retain edu flag;

if first.tax_year then do;

edu=result;

flag=0;

end;

 

if 0 < result < edu then edu = result;

if (result+edu) >1 then flag =1;

if last.tax_year and flag = 1 then output;

 

run;

 

Thank you!

Ksharp
Super User
data have;
input id            tax_year              edu  ;
cards;                     
1              1                              0                             
1              2                              1                             
1              3                              1                             
1              4                              0             
1              5                              0
2              1                              1             
2              2                              1             
2              3                              1             
2              4                              0             
2              5                              0
3              1                              0             
3              2                              1             
3              3                              0             
3              4                              1             
3              5                              0 
4              1                              0             
4              2                              0             
4              3                              1             
4              4                              1
4              5                              0  
;
run;
proc summary data=have;
by id edu notsorted;
output out=temp;
run;
data temp1;
 set temp;
 by id;
 retain found;
 length name $ 10;
 name=ifc(edu=0,'Duration','EduYear');
 if first.id then call missing(found);
 if not found then output;
 if edu=1 then found=1;
run;
proc transpose data=temp1 out=temp2(drop=_name_);
 by id;
 id name;
 var _freq_;
run;
proc stdize data=temp2 out=want reponly missing=0;run;
Luluvolcano
Calcite | Level 5

Thank you, Keshan.

 

I will give it a try and get back to you 😃

Reeza
Super User
It isn't an issue with the program, we don't know what your problem is. Please illustrate it with sample input data AND sample output data that matches your input data.
Luluvolcano
Calcite | Level 5

 

Thank you. Please see my example output and code above.

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1938 views
  • 0 likes
  • 4 in conversation