BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
OLUGBOJI
Obsidian | Level 7

I have a dataset that is submitted each month. It consists of a list of teachers and the classes they teach. If a teacher shows up in one class list for this month and they don't show up in the class for next month, the endDate of the column for the newly merged dataset will be updated. How do I  update the endDate column of the merged dataset. In my example , the teacher with id 1 did not show up for class B in februrary so the end date is January. 

 

Dataset for January

 

id className startdate         enddate
1   A               01/01/2019
1   B               01/01/2019
2   A               01/01/2019
3   B               01/01/2019

 

 

Dataset for February
id    className   startdate    enddate
1       A               01/02/2019
2       A                01/02/2019
3       B                01/02/2019

 

 

Dataset for march
id schoolName startdate enddate
1   A             01/03/2019

1   B               01/03/2019
3   B            01/03/2019

 

 

 

Data Want

 

id    className   startdate    enddate

1    A                 01/01/2019
1    B                01/01/2019     01/01/2019
1    A                01/02/2019
1    A                01/03/2019

1   B               01/03/2019
2    A                01/01/2019
2    A                01/02/2019     01/02/2019
3    B                01/01/2019
3    B                01/02/2019
3    B                01/03/2019

 

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi OLUGBOJI

 

One way:


data january;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1   A               01/01/2019
1   B               01/01/2019
2   A               01/01/2019
3   B               01/01/2019
;
run;

data february;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1       A           01/02/2019
2       A           01/02/2019
3       B           01/02/2019
;
run;
 
data all; set january february;
run;

* max enddate pr. class;
proc sql;
create table classmax as
	select distinct class, max(startdate) as classenddate format=ddmmyyd10.
	from all
	group by class;
quit;

* max enddate pr. teacher/class;
proc sql;
create table teacherclassmax as
	select distinct id, class, max(startdate) as tcenddate format=ddmmyyd10.
	from all
	group by id, class;
quit;
 
* join max. enddate pr. class on all classes;
proc sql;
	create table all_classend as
		select all.id, all.class, all.startdate, classmax.classenddate
        from all left join classmax
		on all.class = classmax.class;
quit;

* join teacher/class-enddate, if not = class-enddate;
proc sql;
	create table want as
		select all_classend.id, all_classend.class, all_classend.startdate, teacherclassmax.tcenddate
        from all_classend left join teacherclassmax
		on 
			all_classend.id = teacherclassmax.id and
			all_classend.class = teacherclassmax.class and
			all_classend.classenddate ne teacherclassmax.tcenddate
		order by id, class, startdate;
quit;

result:

 

wantteacherclass.gif

View solution in original post

18 REPLIES 18
tomrvincent
Rhodochrosite | Level 12
left join jan to feb by id and class. Then enddate in 'want' would be something like 'case when feb.endate is missing then jan.startdate end'
ErikLund_Jensen
Rhodochrosite | Level 12

Hi OLUGBOJI

 

One way:


data january;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1   A               01/01/2019
1   B               01/01/2019
2   A               01/01/2019
3   B               01/01/2019
;
run;

data february;
informat id 1. class $1. startdate ddmmyy10.;
format startdate ddmmyyd10.;
input id class startdate;
datalines;
1       A           01/02/2019
2       A           01/02/2019
3       B           01/02/2019
;
run;
 
data all; set january february;
run;

* max enddate pr. class;
proc sql;
create table classmax as
	select distinct class, max(startdate) as classenddate format=ddmmyyd10.
	from all
	group by class;
quit;

* max enddate pr. teacher/class;
proc sql;
create table teacherclassmax as
	select distinct id, class, max(startdate) as tcenddate format=ddmmyyd10.
	from all
	group by id, class;
quit;
 
* join max. enddate pr. class on all classes;
proc sql;
	create table all_classend as
		select all.id, all.class, all.startdate, classmax.classenddate
        from all left join classmax
		on all.class = classmax.class;
quit;

* join teacher/class-enddate, if not = class-enddate;
proc sql;
	create table want as
		select all_classend.id, all_classend.class, all_classend.startdate, teacherclassmax.tcenddate
        from all_classend left join teacherclassmax
		on 
			all_classend.id = teacherclassmax.id and
			all_classend.class = teacherclassmax.class and
			all_classend.classenddate ne teacherclassmax.tcenddate
		order by id, class, startdate;
quit;

result:

 

wantteacherclass.gif

OLUGBOJI
Obsidian | Level 7
I tried adding a march dataset and it didn't work for the following scenarios:
1. Teacher with id 1 starts taking class b by march
2.March dataset merged with 'January' and 'february'
ErikLund_Jensen
Rhodochrosite | Level 12

Hi OLUGBOJI

 

My solution does not work if a teacher starts taking a class again, because the teacher is still active, so max(startdate) will be march. But what should happen? - It often happens to me that I give a solution to a specific SAS coding problem without solving the user's real problem, because I haven't got the overall picture. 

 

You might get better help if you gave a short explanation of your intentions with the program. Marking the months where a teacher is not present in a given class, allocating same teacher to new classes if a course is finished, or??

OLUGBOJI
Obsidian | Level 7
I'm sorry that I didn't make myself clear. Teachers might be making claims for classes they only taught in the past and thereby claim bonus payments. Bonus payment goes out every quarter. The intentions of the program is to verify the output against claims made by the teachers.
ErikLund_Jensen
Rhodochrosite | Level 12

@OLUGBOJI 

 

Thank you.

If your purpose is to get an overview to verify claims, and you don't need the data for further processing, I think a report  is all you need, Proc Tabulate can do it direct from your input data without further data processing except for adding q quarter variable to use as class variable in Proc Tabulate.

 

I added an extra month to data from the previous examples to have data for more than one quarter. The tables have two sets of columns: Quarters with counts pr. quarter and Months as documentation for the counts:

 


data apr;
infile cards truncover;
input id    className $  startdate  :ddmmyy10.     enddate  :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1   A             01/04/2019
1   B               01/04/2019
4   B            01/04/2019
;

data all; set jan feb mar apr;
	Quarter = put(startdate,yyqd.);
run;

proc tabulate data=all;
	class id classname Quarter startdate;
	format startdate yymmd8.;
	label startdate = 'Month';
	keylabel N=' ' all=' ';
	table id*classname,Quarter all *startdate /box = 'Teacher by class' rts=20;
	table id,Quarter all *startdate  /box = 'Teacher total' rts=20;;
run;

Proc tabulate writes 2 tables:

 

teachers.gif

 

ErikLund_Jensen
Rhodochrosite | Level 12

Sorry - The tables were not made from april data as shown,because ID 4 had start in March. This is made from correct input: 

 

teachers.gif

novinosrin
Tourmaline | Level 20
data jan;
infile cards truncover;
input id className $ startdate  :ddmmyy10.       enddate :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1   A               01/01/2019
1   B               01/01/2019
2   A               01/01/2019
3   B               01/01/2019
;

data feb;
infile cards truncover;
input id    className $  startdate  :ddmmyy10.     enddate  :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1       A               01/02/2019
2       A                01/02/2019
3       B                01/02/2019
;

data want ;
if _n_=1 then do;
   dcl hash H (dataset:'feb') ;
   h.definekey  ("id","classname") ;
   h.definedata ("startdate", "enddate") ;
   h.definedone () ;
end;
set jan;
if h.check()= 0 then do;
output;
h.find();
output;
end;
else do;
end_date=startdate;
output;
end;
format end_Date ddmmyy10.;
run;
Astounding
PROC Star

In your sample data, it looks like both data sets are in sorted order by ID ClassName StartDate.  If that's the case:

 

data want;
set january (in=from_jan) february;
by id className startdate;
if last.className and from_jan then enddate = startdate;
run;
novinosrin
Tourmaline | Level 20

data jan;
infile cards truncover;
input id className $ startdate  :ddmmyy10.       enddate :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1   A               01/01/2019
1   B               01/01/2019
2   A               01/01/2019
3   B               01/01/2019
;

data feb;
infile cards truncover;
input id    className $  startdate  :ddmmyy10.     enddate  :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1       A               01/02/2019
2       A                01/02/2019
3       B                01/02/2019
;

proc sql;
create table want as
select id,classname, startdate,ifn(count(*)=1,startdate,.) as end_date format ddmmyy10.
from
(select * from jan
union 
select * from feb)
group by id,classname
order by id,classname, startdate;
quit;
novinosrin
Tourmaline | Level 20

data jan;
infile cards truncover;
input id className $ startdate  :ddmmyy10.       enddate :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1   A               01/01/2019
1   B               01/01/2019
2   A               01/01/2019
3   B               01/01/2019
;

data feb;
infile cards truncover;
input id    className $  startdate  :ddmmyy10.     enddate  :ddmmyy10. ;
format startdate  ddmmyy10.       enddate ddmmyy10. ;
cards;
1       A               01/02/2019
2       A                01/02/2019
3       B                01/02/2019
;
data want;
set jan feb;
by id classname;
if first.classname and last.classname then endDate=startdate;
run;
OLUGBOJI
Obsidian | Level 7
Thanks. This appears to be more efficient
novinosrin
Tourmaline | Level 20

No, I am afraid I made a mistake in not thinking forward from future month appends taking into account B could occur twice . You chose the right answer. 

 

@ErikLund_Jensen  's solution covers a grand ETL solution

OLUGBOJI
Obsidian | Level 7
I can see that this 'ifn(count(*)=1,startdate,.)' will not work if the teacher takes up the class again in future. Thanks @ErikLund_Jensen

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1228 views
  • 5 likes
  • 5 in conversation