I have a dataset that is submitted each month. It consists of a list of teachers and the classes they teach. If a teacher shows up in one class list for this month and they don't show up in the class for next month, the endDate of the column for the newly merged dataset will be updated. How do I update the endDate column of the merged dataset. In my example , the teacher with id 1 did not show up for class B in februrary so the end date is January.
Dataset for January
id className startdate enddate
1 A 01/01/2019
1 B 01/01/2019
2 A 01/01/2019
3 B 01/01/2019
Dataset for February
id className startdate enddate
1 A 01/02/2019
2 A 01/02/2019
3 B 01/02/2019
Dataset for march
id schoolName startdate enddate
1 A 01/03/2019
1 B 01/03/2019
3 B 01/03/2019
Data Want
id className startdate enddate
1 A 01/01/2019
1 B 01/01/2019 01/01/2019
1 A 01/02/2019
1 A 01/03/2019
1 B 01/03/2019
2 A 01/01/2019
2 A 01/02/2019 01/02/2019
3 B 01/01/2019
3 B 01/02/2019
3 B 01/03/2019
Hi OLUGBOJI
One way:
data january;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1 A 01/01/2019
1 B 01/01/2019
2 A 01/01/2019
3 B 01/01/2019
;
run;
data february;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1 A 01/02/2019
2 A 01/02/2019
3 B 01/02/2019
;
run;
data all; set january february;
run;
* max enddate pr. class;
proc sql;
create table classmax as
select distinct class, max(startdate) as classenddate format=ddmmyyd10.
from all
group by class;
quit;
* max enddate pr. teacher/class;
proc sql;
create table teacherclassmax as
select distinct id, class, max(startdate) as tcenddate format=ddmmyyd10.
from all
group by id, class;
quit;
* join max. enddate pr. class on all classes;
proc sql;
create table all_classend as
select all.id, all.class, all.startdate, classmax.classenddate
from all left join classmax
on all.class = classmax.class;
quit;
* join teacher/class-enddate, if not = class-enddate;
proc sql;
create table want as
select all_classend.id, all_classend.class, all_classend.startdate, teacherclassmax.tcenddate
from all_classend left join teacherclassmax
on
all_classend.id = teacherclassmax.id and
all_classend.class = teacherclassmax.class and
all_classend.classenddate ne teacherclassmax.tcenddate
order by id, class, startdate;
quit;
result:
Hi OLUGBOJI
One way:
data january;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1 A 01/01/2019
1 B 01/01/2019
2 A 01/01/2019
3 B 01/01/2019
;
run;
data february;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1 A 01/02/2019
2 A 01/02/2019
3 B 01/02/2019
;
run;
data all; set january february;
run;
* max enddate pr. class;
proc sql;
create table classmax as
select distinct class, max(startdate) as classenddate format=ddmmyyd10.
from all
group by class;
quit;
* max enddate pr. teacher/class;
proc sql;
create table teacherclassmax as
select distinct id, class, max(startdate) as tcenddate format=ddmmyyd10.
from all
group by id, class;
quit;
* join max. enddate pr. class on all classes;
proc sql;
create table all_classend as
select all.id, all.class, all.startdate, classmax.classenddate
from all left join classmax
on all.class = classmax.class;
quit;
* join teacher/class-enddate, if not = class-enddate;
proc sql;
create table want as
select all_classend.id, all_classend.class, all_classend.startdate, teacherclassmax.tcenddate
from all_classend left join teacherclassmax
on
all_classend.id = teacherclassmax.id and
all_classend.class = teacherclassmax.class and
all_classend.classenddate ne teacherclassmax.tcenddate
order by id, class, startdate;
quit;
result:
Hi OLUGBOJI
My solution does not work if a teacher starts taking a class again, because the teacher is still active, so max(startdate) will be march. But what should happen? - It often happens to me that I give a solution to a specific SAS coding problem without solving the user's real problem, because I haven't got the overall picture.
You might get better help if you gave a short explanation of your intentions with the program. Marking the months where a teacher is not present in a given class, allocating same teacher to new classes if a course is finished, or??
Thank you.
If your purpose is to get an overview to verify claims, and you don't need the data for further processing, I think a report is all you need, Proc Tabulate can do it direct from your input data without further data processing except for adding q quarter variable to use as class variable in Proc Tabulate.
I added an extra month to data from the previous examples to have data for more than one quarter. The tables have two sets of columns: Quarters with counts pr. quarter and Months as documentation for the counts:
data apr;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/04/2019
1 B 01/04/2019
4 B 01/04/2019
;
data all; set jan feb mar apr;
Quarter = put(startdate,yyqd.);
run;
proc tabulate data=all;
class id classname Quarter startdate;
format startdate yymmd8.;
label startdate = 'Month';
keylabel N=' ' all=' ';
table id*classname,Quarter all *startdate /box = 'Teacher by class' rts=20;
table id,Quarter all *startdate /box = 'Teacher total' rts=20;;
run;
Proc tabulate writes 2 tables:
Sorry - The tables were not made from april data as shown,because ID 4 had start in March. This is made from correct input:
data jan;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/01/2019
1 B 01/01/2019
2 A 01/01/2019
3 B 01/01/2019
;
data feb;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/02/2019
2 A 01/02/2019
3 B 01/02/2019
;
data want ;
if _n_=1 then do;
dcl hash H (dataset:'feb') ;
h.definekey ("id","classname") ;
h.definedata ("startdate", "enddate") ;
h.definedone () ;
end;
set jan;
if h.check()= 0 then do;
output;
h.find();
output;
end;
else do;
end_date=startdate;
output;
end;
format end_Date ddmmyy10.;
run;
In your sample data, it looks like both data sets are in sorted order by ID ClassName StartDate. If that's the case:
data want;
set january (in=from_jan) february;
by id className startdate;
if last.className and from_jan then enddate = startdate;
run;
data jan;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/01/2019
1 B 01/01/2019
2 A 01/01/2019
3 B 01/01/2019
;
data feb;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/02/2019
2 A 01/02/2019
3 B 01/02/2019
;
proc sql;
create table want as
select id,classname, startdate,ifn(count(*)=1,startdate,.) as end_date format ddmmyy10.
from
(select * from jan
union
select * from feb)
group by id,classname
order by id,classname, startdate;
quit;
data jan;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/01/2019
1 B 01/01/2019
2 A 01/01/2019
3 B 01/01/2019
;
data feb;
infile cards truncover;
input id className $ startdate :ddmmyy10. enddate :ddmmyy10. ;
format startdate ddmmyy10. enddate ddmmyy10. ;
cards;
1 A 01/02/2019
2 A 01/02/2019
3 B 01/02/2019
;
data want;
set jan feb;
by id classname;
if first.classname and last.classname then endDate=startdate;
run;
No, I am afraid I made a mistake in not thinking forward from future month appends taking into account B could occur twice . You chose the right answer.
@ErikLund_Jensen 's solution covers a grand ETL solution
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.