Hi....I am trying to create separate records for each StudentUID based on whether the CompletionDate overlaps consecutive academic years. The new records seem to be created fine but when it comes to changing the TermCalendarID it doesn't seem to work in all cases as for StudentUID=12064. I wonder if anyone can help....Thanks.
data Have;
length StudentUID 8 Programs $ 22 EnrollmentStatus $ 16 RegisterStatus $ 12 TermCalendarID $ 10 StartDate 8 CompletionDate 8 ;
input StudentUID : BEST32. Programs : $CHAR22. EnrollmentStatus : $CHAR16. RegisterStatus : $CHAR12. TermCalendarID : $CHAR15. StartDate :yymmdd10. CompletionDate : yymmdd10. ;
format StartDate yymmdd10. CompletionDate yymmdd10. ;
datalines;
785 CulinaryArts PostSecondary PostSec 15-16 2015-09-10 2016-06-17
2774 Autobody PostSecondary PostSec 15-16 2015-09-10 2017-06-15
10543 CulinaryArts PostSecondary PostSec 13-14 2013-07-02 2015-06-20
11609 Carpentry PostSecondary PostSec 13-14 2013-07-02 2014-06-20
12064 Hairstyling PostSecondary PostSec 16-17 2016-02-01 2017-06-17
12065 Automotive PostSecondary PostSec 16-17 2016-02-01 2019-06-22
13687 CulinaryArts PostSecondary PostSec 13-14 2013-09-05 2016-06-20
13855 Automotive PostSecondary PostSec 15-16 2015-02-03 2015-06-19
15258 Autobody Secondary PTSD 13-14 2013-07-02 2019-06-20
;
run;
data Want;
set Have;
_CompletionDate=CompletionDate;
_k=year(_CompletionDate)-year(startdate);
if _k=1 then do;
if StartDate<mdy(6,30,year(StartDate)) then do;
CompletionDate=mdy(6,30,year(StartDate));
output;
StartDate=CompletionDate+1;
CompletionDate=_CompletionDate;
output;
end;
else output;
end;
else if _k>1 then do;
do n=year(StartDate) to year(_CompletionDate)-1;
if n=year(StartDate) then do;
if StartDate<mdy(6,30,n) then do;
CompletionDate=mdy(6,30,n);
output;
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.) + 1,2.),put(input(substr(TermCalendarID,4,2),8.) + 1,2.));
StartDate=CompletionDate+1;
CompletionDate=mdy(6,30,n+1);
output;
end;
else do;
CompletionDate=mdy(6,30,n+1);
output;
end;
end;
else if year(StartDate)<n<year(_CompletionDate)-1 then do;
startdate=mdy(7,1,n);
CompletionDate=mdy(6,30,n+1);
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.) + 1,2.),put(input(substr(TermCalendarID,4,2),8.) + 1,2.));
output;
end;
else if n=year(_CompletionDate)-1 then do;
startdate=mdy(7,1,n);CompletionDate=_CompletionDate;
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.) + 1,2.),put(input(substr(TermCalendarID,4,2),8.) + 1,2.));
output;
end;
end;
end;
else output;
drop n _:;
run;
Results:
StudentUID | Programs | EnrollmentStatus | RegisterStatus | TermCalendarID | StartDate | CompletionDate |
785 | CulinaryArts | PostSecondary | PostSec | 15-16 | 2015-09-10 | 2016-06-17 |
2774 | Autobody | PostSecondary | PostSec | 15-16 | 2015-09-10 | 2016-06-30 |
2774 | Autobody | PostSecondary | PostSec | 16-17 | 2016-07-01 | 2017-06-15 |
10543 | CulinaryArts | PostSecondary | PostSec | 13-14 | 2013-07-02 | 2014-06-30 |
10543 | CulinaryArts | PostSecondary | PostSec | 14-15 | 2014-07-01 | 2015-06-20 |
11609 | Carpentry | PostSecondary | PostSec | 13-14 | 2013-07-02 | 2014-06-20 |
12064 | Hairstyling | PostSecondary | PostSec | 16-17 | 2016-02-01 | 2016-06-30 |
12064 | Hairstyling | PostSecondary | PostSec | 16-17 | 2016-07-01 | 2017-06-17 |
12065 | Automotive | PostSecondary | PostSec | 16-17 | 2016-02-01 | 2016-06-30 |
12065 | Automotive | PostSecondary | PostSec | 17-18 | 2016-07-01 | 2017-06-30 |
12065 | Automotive | PostSecondary | PostSec | 18-19 | 2017-07-01 | 2018-06-30 |
12065 | Automotive | PostSecondary | PostSec | 19-20 | 2018-07-01 | 2019-06-22 |
13687 | CulinaryArts | PostSecondary | PostSec | 13-14 | 2013-09-05 | 2014-06-30 |
13687 | CulinaryArts | PostSecondary | PostSec | 14-15 | 2014-07-01 | 2015-06-30 |
13687 | CulinaryArts | PostSecondary | PostSec | 15-16 | 2015-07-01 | 2016-06-20 |
13855 | Automotive | PostSecondary | PostSec | 15-16 | 2015-02-03 | 2015-06-19 |
15258 | Autobody | Secondary | PTSD | 13-14 | 2013-07-02 | 2014-06-30 |
15258 | Autobody | Secondary | PTSD | 14-15 | 2014-07-01 | 2015-06-30 |
15258 | Autobody | Secondary | PTSD | 15-16 | 2015-07-01 | 2016-06-30 |
15258 | Autobody | Secondary | PTSD | 16-17 | 2016-07-01 | 2017-06-30 |
15258 | Autobody | Secondary | PTSD | 17-18 | 2017-07-01 | 2018-06-30 |
15258 | Autobody | Secondary | PTSD | 18-19 | 2018-07-01 | 2019-06-20 |
In the sample output you provided StudentUID=12064 seems to give you 2 lines of output , I guess that is what is expected. You said that changing the TermCalendarID is affecting the output. Can you simulate the input data for that scenario please.
In the sample output you provided StudentUID=12064 seems to give you 2 lines of output , I guess that is what is expected. You said that changing the TermCalendarID is affecting the output. Can you simulate the input data for that scenario please.
Hi....I made the changes to the TermCalendarUID as there were errors. I think I was missing a few statements for TermCalendarUID before the output statement. I included the variables such as Programs to see whether or not they are carrying over as the same in the new records.
data Have;
length StudentUID 8 Programs $ 22 EnrollmentStatus $ 16 RegisterStatus $ 12 TermCalendarID $ 10 StartDate 8 CompletionDate 8 ;
input StudentUID : BEST32. Programs : $CHAR22. EnrollmentStatus : $CHAR16. RegisterStatus : $CHAR12. TermCalendarID : $CHAR15. StartDate :yymmdd10. CompletionDate : yymmdd10. ;
format StartDate yymmdd10. CompletionDate yymmdd10. ;
datalines;
785 CulinaryArts PostSecondary PostSec 15-16 2015-09-10 2016-06-17
2774 Autobody PostSecondary PostSec 15-16 2015-09-10 2017-06-15
10543 CulinaryArts PostSecondary PostSec 13-14 2013-07-02 2015-06-20
11609 Carpentry PostSecondary PostSec 13-14 2013-07-02 2014-06-20
12064 Hairstyling PostSecondary PostSec 15-16 2016-02-01 2017-06-17
12065 Automotive PostSecondary PostSec 15-16 2016-02-01 2019-06-22
13687 CulinaryArts PostSecondary PostSec 13-14 2013-09-05 2016-06-20
13855 Automotive PostSecondary PostSec 14-15 2015-02-03 2015-06-19
15258 Autobody Secondary PTSD 13-14 2013-07-02 2019-06-20
;
run;
data Want;
set Have;
_CompletionDate=CompletionDate;
_k=year(_CompletionDate)-year(startdate);
if _k=1 then do;
if StartDate<mdy(6,30,year(StartDate)) then do;
CompletionDate=mdy(6,30,year(StartDate));
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.),2.),put(input(substr(TermCalendarID,4,2),8.),2.));
output;
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.)+1,2.),put(input(substr(TermCalendarID,4,2),8.)+1,2.));
StartDate=CompletionDate+1;
CompletionDate=_CompletionDate;
output;
end;
else output;
end;
else if _k>1 then do;
do n=year(StartDate) to year(_CompletionDate)-1;
if n=year(StartDate) then do;
if StartDate<mdy(6,30,n) then do;
CompletionDate=mdy(6,30,n);
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.),2.),put(input(substr(TermCalendarID,4,2),8.),2.));
output;
StartDate=CompletionDate+1;
CompletionDate=mdy(6,30,n+1);
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.) + 1,2.),put(input(substr(TermCalendarID,4,2),8.) + 1,2.));
output;
end;
else do;
CompletionDate=mdy(6,30,n+1);
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.),2.),put(input(substr(TermCalendarID,4,2),8.),2.));
output;
end;
end;
else if year(StartDate)<n<year(_CompletionDate)-1 then do;
startdate=mdy(7,1,n);
CompletionDate=mdy(6,30,n+1);
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.) + 1,2.),put(input(substr(TermCalendarID,4,2),8.) + 1,2.));
output;
end;
else if n=year(_CompletionDate)-1 then do;
startdate=mdy(7,1,n);
CompletionDate=_CompletionDate;
TermCalendarID = catx('-',put(input(substr(TermCalendarID,1,2),8.) + 1,2.),put(input(substr(TermCalendarID,4,2),8.) + 1,2.));
output;
end;
end;
end;
else output;
drop n _:;
run;
And what should the ideal output look like?
Hint: for example data only include a minimum number of variables needed to illustrate the issue about the question you have.
For example how do the variable Programs, EnrollmentStatus and RegistrationStatus affect the outcome? I don't see them used in any calculation. So they, for this purpose, only serve to take up screen space. Fewer variables will also simplify example desired results.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.