When an employee opens a dossier, a few tasks are created automatically. These tasks receive a sequence number (1 to 5) indicating in which order they need to be performed. Only one task can be active while the others are waiting. When a task is completed it becomes done, unless they cancel it. Each task has a start date, a close date and a deadline depending on each task.
I received an Excel file which contains task_handler (employee), object_name (title of the task), start_date, close_date, deadline, task_state and sequence.
object_name | deadline | start_date | close_date | task_handler | task_state | sequence |
Intake | 8/03/2016 | 1/03/2016 | 7/03/2016 | Handler A | DONE | 1 |
Intake | 8/03/2016 | 1/03/2016 | 7/03/2016 | Handler A | DONE | 1 |
Intake | 8/03/2016 | 1/03/2016 | 22/03/2016 | Handler B | DONE | 1 |
Intake | 8/03/2016 | 1/03/2016 | 4/03/2016 | Handler C | DONE | 1 |
Intake | 8/03/2016 | 1/03/2016 | 9/03/2016 | Handler D | DONE | 1 |
Intake | 8/03/2016 | 1/03/2016 | 22/03/2016 | Handler E | DONE | 1 |
Intake | 8/03/2016 | 1/03/2016 | Handler F | ACTIVE | 1 | |
Intake | 8/03/2016 | 1/03/2016 | Handler G | ACTIVE | 1 | |
Intake | 8/03/2016 | 1/03/2016 | Handler H | ACTIVE | 1 |
I have to make a table in which I can show how many "intakes" an employee has closed and how many "intakes" an employee still has opened. In addition, they also want me to display how many of these tasks were closed after the deadline expired. (Obviously this part requires me to focus on task_state ACTIVE and DONE.) I also have to add a final column that displays how many tasks have been CANCELLED.
Important note: Employees are able to adjust the sequence of tasks, which means that I cannot reply on sequence value 1. Instead I use "like '%intake%'". If the task is cancelled, the deadline, the start_date and the close_date are empty cells. If the task is still active, the close_date is an empty cell.
The result should look like this:
task_handler | task_state | amount_of_tasks | expired |
Handler A | ACTIVE | 4 | 0 |
Handler A | DONE | 5 | 1 |
Handler A | CANCELLED | 1 | 0 |
Handler B | ACTIVE | 3 | 0 |
Handler B | DONE | 9 | 5 |
Handler B | CANCELLED | 0 | 0 |
This is the query I created so far:
(Import Excel file into SAS library.)
PROC IMPORT OUT= ATROPOS.Tasks
DATAFILE= "random_location"
DBMS=XLS REPLACE;
GETNAMES=YES;
DATAROW=2;
GUESSINGROWS=696430;
RUN;
(Add a column called Expired, which contains close_date - deadline.)
PROC SQL;
CREATE TABLE Atropos.TasksEXP AS
SELECT *, (close_date - deadline) AS Expired
FROM ATROPOS.Tasks;
QUIT;
(Creating the table and adjusting Expired values. (Expired > 0 = Yes // Expired <= 0 = No))
PROC SQL;
CREATE TABLE ATROPOS.Startup AS
SELECT task_handler, task_state, COUNT(*) AS amount_of_tasks, Expired
FROM ATROPOS.Tasks
WHERE task_state IN ('ACTIVE','DONE','CANCELLED')
AND object_name LIKE '%Intake%'
AND IF (Expired > 0) THEN (Expired = 'Yes') ELSE (Expired = 'No')
GROUP BY task_handler, task_state;
QUIT;
I am having issues getting the IF THEN ELSE statement to work. IF remains unrecognized in a black font, whereas THEN and ELSE are recognized in a blue font. I have tried different notations, even without using brackets, but nothing appears to be working. Secondly, I find it difficult to adjust the values in Expired, even if I were to use a data or update statement. Any ideas? I am guessing I am using a very devious method or was I on the right track here?
Thanks in advance.
proc sort data = have;
by handler;
run;
data want;
set have;
by handler;
retain
active
done
cancelled
expired
others
;
if first.handler
then do;
active = 0;
done = 0;
cancelled = 0;
expired = 0;
others = 0;
end;
select task_state;
when ('DONE') do;
done + 1;
if close_date > deadline then expired + 1;
end;
when ('CANCELLED') cancelled + 1;
when ('ACTIVE') active + 1;
otherwise others + 1;
end;
if last.handler then output;
keep handler active done cancelled expired others;
run;
You might have to use transpose to convert from wide to long format, if that is necessary. Or you could create a multiple output in the last.handler block.
I just noticed a huge problem too.
For every close_date cell that has no value, the Expired cell will be 0 - X (where X is the deadline). This way they will always result in a negative value and thus they will always appear as not expired although they could still be expired.
One important note: the original Excel file also contains r_creation_date, which is the creation date of the task. As for the "intake" task, the deadline is always 7 days after the creation date. So maybe another approach is required?
For example:
If creation_date = 01/03/2016 then deadline = 08/03/2016.
proc sort data = have;
by handler;
run;
data want;
set have;
by handler;
retain
active
done
cancelled
expired
others
;
if first.handler
then do;
active = 0;
done = 0;
cancelled = 0;
expired = 0;
others = 0;
end;
select task_state;
when ('DONE') do;
done + 1;
if close_date > deadline then expired + 1;
end;
when ('CANCELLED') cancelled + 1;
when ('ACTIVE') active + 1;
otherwise others + 1;
end;
if last.handler then output;
keep handler active done cancelled expired others;
run;
You might have to use transpose to convert from wide to long format, if that is necessary. Or you could create a multiple output in the last.handler block.
Holy, that is a different way of thinking then I am used to.
One question though:
What happens if the query processes a task that hasn't been closed yet and thus has no value in close_date? If you look at the first table I showed, you will see that the last three handlers have no close_date because the task is still active. However, in the deadline column you will see 08/03/2016 as deadline, so we know they are expired. (The handler didn't respect the deadline and even though the task is still active, the deadline is expired.)
EDIT: Typo's.
Just expand the ACTIVE part:
select task_state;
when ('DONE') do;
done + 1;
if close_date > deadline then expired + 1;
end;
when ('ACTIVE') do;
active + 1;
if date() > deadline then expired + 1;
end;
Instead of using the date() function for today's date, you could define a "current date" into a macro variable before the data step and use that.
If you look at the OP, you could say I use standard SQL in SAS or at least try to. What is the method you posted called? Is that the process code from SAS itself?
This is the data step language, the "mother tongue" of SAS, so to say. As I'm coming from procedural programming (BASIC, PASCAL, C) and also have some experience with DBASE programming, this suits me more than SQL.
I'm quite sure that one of the SQL experts here could provide a solution with SQL.
The data step can do some things rather easily (and make it easy to read) that take complicated thinking in SQL. PROC SQL can also be very resource-consuming in certain scenarios involving joins of large datasets.
Hi,
There is no such thing as an IF THEN construct in SQL, you would use a CASE WHEN construct in SQL. I think you may be over complicating it. Firstly, check your data coming in from that proc import, I can guarentee you that using proc import (which is guessing procedure) combined with Excel (which is a very poor choice for data transfer) will give you issues. Are your date fields acutally dates, are they called that, so many things can go wrong.
Assuming you data is accurate, then something like (untested - post test data in form of datastep if you would like tested code):
proc sql; create table WANT as select A.TASK_HANDLER, A.TASK_STATE, count(*) as AMOUNT_OF_TASKS, (select count(*) from ATROPOS.TASKS where TASK_HANDLER=A.TASK_HANDLER and TAST_STATE=A.TASK_STATE and CLOSE_DATE - DEADLINE > 0) as EXPIRED from ATROPOS.TASKS group by TASK_HANDLER, TASK_STATE; quit;
I currently have no access to the Atropos database, so the only way is to request a specific unload in the form of an Excel file. Since the Atropos database is managed by another team, chances are very real that I will never get access (not even read access). But that's something I have been looking out for as well. When you use a PROC IMPORT, how can you properly define the format of a column's value? I have been reading up on the FORMAT = DATE8. and stuff, but I hardly ever get it to work. I figured out already that this command only works in SELECT during PROC SQL, but I am having difficulties to get this to work in a datastep.
For example
PROC IMPORT
OUT = TEST.DATE
DATAFILE = "C:\temp\test.xls"
DBMS = XLS REPLACE;
GETNAMES = YES;
DATAROW = 2;
FORMAT deadline = DATE8. start_date = DATE8. close_date = DATE8. r_creation_date = DATE8.;
RUN;
This doesn't work for me. Even if I try using '...', "...", DDMMYYYY8. or DDMMYYYY10., it simply does not accept the FORMAT. Or do I have to define the variables even though I said to GETNAMES? Or does that only work in a datastep?
Yes, you can't do that. As I said, proc import is a guessing procedure - it looks at your file and tries to guess the best strcuture for the data based on some rows of data. This is the principal reason why proc import would not be my choice of process. At worst I would take the Excel file if that is all you can get, and save to CSV, then write a datastep import program where you can specify what each data element is, set formats informats etc. E.g.
data imported; infile "c:\your_csv.csv" dlm=","; length var1 $200 var_2 8 var_3 $50; format var_1 $200 var_2 3. var_3 $10.; informat ...; input var_1 $ var_2 var_3 $; run;
FORMAT works in a data step, but PROC IMPORT is a helper procedure that looks at the data, makes more or less educated guesses about the strcuture and then creates and runs a data step from that. Only 4 statements are valid within PROC IMPORT: DLM=, DATAROW=, GUESSINGROWS=, GETNAMES=.
To force formats, one can take the data step that IMPORT creates from the log and modify that.
I managed to get rid of some errors, but there are some I simply cannot get rid of.
The code:
PROC SORT IN=ATROPOS.Tasks OUT=ATROPOS.Tasks;
BY task_handler;
RUN;
DATA ATROPOS.Intake;
SET ATROPOS.Tasks;
BY task_handler;
RETAIN
Active
Done
Cancelled
Expired
Others;
IF first.task_handler
THEN DO;
Active=0;
Done=0;
Cancelled=0;
Expired=0;
Others=0;
END;
SELECT (task_state);
WHEN ('DONE') DO;
Done+1;
IF close_date > deadline THEN Expired+1;
END;
WHEN ('CANCELLED') Cancelled+1;
END;
WHEN ('ACTIVE') DO;
Active+1;
IF date() > deadline THEN Expired+1;
END;
OTHERWISE Others+1;
END;
IF last.task_handler THEN OUTPUT;
KEEP task_handler Active Done Cancelled Expired Others;
RUN;
The log:
522 PROC SORT IN=ATROPOS.Tasks OUT=ATROPOS.Tasks;
523 BY task_handler;
524 RUN;
NOTE: Input data set is already sorted; it has been copied to the output data set.
NOTE: There were 15180 observations read from the data set ATROPOS.TASKS.
NOTE: The data set ATROPOS.TASKS has 15180 observations and 10 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
525
526 DATA ATROPOS.Intake;
527 SET ATROPOS.Tasks;
528 BY task_handler;
529 RETAIN
530 Active
531 Done
532 Cancelled
533 Expired
534 Others;
535
536 IF first.task_handler
537 THEN DO;
538 Active=0;
539 Done=0;
540 Cancelled=0;
541 Expired=0;
542 Others=0;
543 END;
544
545 SELECT (task_state);
546 WHEN ('DONE') DO;
547 Done+1;
548 IF close_date > deadline THEN Expired+1;
549 END;
550
551 WHEN ('CANCELLED') Cancelled+1;
552 END;
553
554 WHEN ('ACTIVE') DO;
---- -
161 161
ERROR 161-185: No matching DO/SELECT statement.
555 Active+1;
556 IF date() > deadline THEN Expired+1;
557 END;
558
559 OTHERWISE Others+1;
---------
161
560 END;
---
161
ERROR 161-185: No matching DO/SELECT statement.
561
562 IF last.task_handler THEN OUTPUT;
563 KEEP task_handler Active Done Cancelled Expired Others;
564 RUN;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set ATROPOS.INTAKE may be incomplete. When this step was stopped there were 0
observations and 6 variables.
WARNING: Data set ATROPOS.INTAKE was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds
Both "No matching DO/SELECT statement" errors are weird to me. I cross checked with the first "WHEN ('DONE') DO;" statement and the ones that come after it have the exact same syntax. So why are the last two reporting errors and the first isn't? Do I have to repeat the "SELECT (task_state);" for every "WHEN X DO;" statement?
I forgot the date macro, but I'll work on that now. I am not certain where I have to define that exactly. In between two datasteps in a new and indipendent datastep?
DATA ATROPOS.Intake;
SET ATROPOS.Tasks;
BY task_handler;
RETAIN
Active
Done
Cancelled
Expired
Others;
IF first.task_handler
THEN DO;
Active=0;
Done=0;
Cancelled=0;
Expired=0;
Others=0;
END;
SELECT (task_state);
WHEN ('DONE') DO;
Done+1;
IF close_date > deadline THEN Expired+1;
END;
WHEN ('CANCELLED') Cancelled+1;
*END; * this END has no preceding DO!;
WHEN ('ACTIVE') DO;
Active+1;
IF date() > deadline THEN Expired+1;
END;
OTHERWISE Others+1;
END; * put this END in column 1, as it closes the SELECT;
IF last.task_handler THEN OUTPUT;
KEEP task_handler Active Done Cancelled Expired Others;
RUN;
See my inline comments.
This part:
550
551 WHEN ('CANCELLED') Cancelled+1;
552 END;
Ends the select statement as the when has no do, thus the end; ends the select. The other errors follow on from this problem, i.e. fix this, and the other should go away. I.e. remove this END:
As another point, its not a good idea to code all in caps, or mixed case, and consistent indenting will highlight the incorrect part in a fair few examples:
proc sort in=atropos.tasks out=atropos.tasks; by task_handler; run; data atropos.intake; set atropos.tasks; by task_handler; retain active done cancelled expired others; if first.task_handler then do; active=0; done=0; cancelled=0; expired=0; others=0; end; select (task_state); when ('DONE') do; done+1; if close_date > deadline then expired+1; end; when ('CANCELLED') cancelled+1; when ('ACTIVE') do; active+1; if date() > deadline then expired+1; end; otherwise others+1; end; if last.task_handler then output; keep task_handler active done cancelled expired others; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.