BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yves_Boonen
Quartz | Level 8

When an employee opens a dossier, a few tasks are created automatically. These tasks receive a sequence number (1 to 5) indicating in which order they need to be performed. Only one task can be active while the others are waiting. When a task is completed it becomes done, unless they cancel it. Each task has a start date, a close date and a deadline depending on each task.

 

I received an Excel file which contains task_handler (employee), object_name (title of the task), start_date, close_date, deadline, task_state and sequence.

 

object_namedeadlinestart_dateclose_datetask_handlertask_statesequence
Intake8/03/20161/03/20167/03/2016Handler ADONE1
Intake8/03/20161/03/20167/03/2016Handler ADONE1
Intake8/03/20161/03/201622/03/2016Handler BDONE1
Intake8/03/20161/03/20164/03/2016Handler CDONE1
Intake8/03/20161/03/20169/03/2016Handler DDONE1
Intake8/03/20161/03/201622/03/2016Handler EDONE1
Intake8/03/20161/03/2016 Handler FACTIVE1
Intake8/03/20161/03/2016 Handler GACTIVE1
Intake8/03/20161/03/2016 Handler HACTIVE1

 

I have to make a table in which I can show how many "intakes" an employee has closed and how many "intakes" an employee still has opened. In addition, they also want me to display how many of these tasks were closed after the deadline expired. (Obviously this part requires me to focus on task_state ACTIVE and DONE.) I also have to add a final column that displays how many tasks have been CANCELLED.

 

Important note: Employees are able to adjust the sequence of tasks, which means that I cannot reply on sequence value 1. Instead I use "like '%intake%'". If the task is cancelled, the deadline, the start_date and the close_date are empty cells. If the task is still active, the close_date is an empty cell.

 

The result should look like this:

 

task_handlertask_stateamount_of_tasksexpired
Handler AACTIVE40
Handler ADONE51
Handler ACANCELLED10
Handler BACTIVE30
Handler BDONE95
Handler BCANCELLED00

 

 

This is the query I created so far:

 

(Import Excel file into SAS library.)

PROC IMPORT OUT= ATROPOS.Tasks

DATAFILE= "random_location"

DBMS=XLS REPLACE;

GETNAMES=YES;

DATAROW=2;

GUESSINGROWS=696430;

RUN;

 

(Add a column called Expired, which contains close_date - deadline.)

PROC SQL;

CREATE TABLE Atropos.TasksEXP AS

SELECT *, (close_date - deadline) AS Expired

FROM ATROPOS.Tasks;

QUIT;

 

(Creating the table and adjusting Expired values. (Expired > 0 = Yes // Expired <= 0 = No))

PROC SQL;

CREATE TABLE ATROPOS.Startup AS

SELECT task_handler, task_state, COUNT(*) AS amount_of_tasks, Expired

FROM ATROPOS.Tasks

WHERE task_state IN ('ACTIVE','DONE','CANCELLED')

AND object_name LIKE '%Intake%'

AND IF (Expired > 0) THEN (Expired = 'Yes') ELSE (Expired = 'No')

GROUP BY task_handler, task_state;

QUIT;

 

I am having issues getting the IF THEN ELSE statement to work. IF remains unrecognized in a black font, whereas THEN and ELSE are recognized in a blue font. I have tried different notations, even without using brackets, but nothing appears to be working. Secondly, I find it difficult to adjust the values in Expired, even if I were to use a data or update statement. Any ideas? I am guessing I am using a very devious method or was I on the right track here?

 

Thanks in advance.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
proc sort data = have;
by handler;
run;

data want;
set have;
by handler;
retain
  active
  done
  cancelled
  expired
  others
;
if first.handler
then do;
  active = 0;
  done = 0;
  cancelled = 0;
  expired = 0;
  others = 0;
end;
select task_state;
  when ('DONE') do;
    done + 1;
    if close_date > deadline then expired + 1;
  end;
  when ('CANCELLED') cancelled + 1;
  when ('ACTIVE') active + 1;
  otherwise others + 1;
end;
if last.handler then output;
keep handler active done cancelled expired others;
run;

You might have to use transpose to convert from wide to long format, if that is necessary. Or you could create a multiple output in the last.handler block.

View solution in original post

16 REPLIES 16
Yves_Boonen
Quartz | Level 8

I just noticed a huge problem too.

 

For every close_date cell that has no value, the Expired cell will be 0 - X (where X is the deadline). This way they will always result in a negative value and thus they will always appear as not expired although they could still be expired.

 

One important note: the original Excel file also contains r_creation_date, which is the creation date of the task. As for the "intake" task, the deadline is always 7 days after the creation date. So maybe another approach is required?

 

For example:

 

If creation_date = 01/03/2016 then deadline = 08/03/2016.

Kurt_Bremser
Super User
proc sort data = have;
by handler;
run;

data want;
set have;
by handler;
retain
  active
  done
  cancelled
  expired
  others
;
if first.handler
then do;
  active = 0;
  done = 0;
  cancelled = 0;
  expired = 0;
  others = 0;
end;
select task_state;
  when ('DONE') do;
    done + 1;
    if close_date > deadline then expired + 1;
  end;
  when ('CANCELLED') cancelled + 1;
  when ('ACTIVE') active + 1;
  otherwise others + 1;
end;
if last.handler then output;
keep handler active done cancelled expired others;
run;

You might have to use transpose to convert from wide to long format, if that is necessary. Or you could create a multiple output in the last.handler block.

Yves_Boonen
Quartz | Level 8

Holy, that is a different way of thinking then I am used to.

 

One question though:

 

What happens if the query processes a task that hasn't been closed yet and thus has no value in close_date? If you look at the first table I showed, you will see that the last three handlers have no close_date because the task is still active. However, in the deadline column you will see 08/03/2016 as deadline, so we know they are expired. (The handler didn't respect the deadline and even though the task is still active, the deadline is expired.)

 

EDIT: Typo's.

Kurt_Bremser
Super User

Just expand the ACTIVE part:

select task_state;
  when ('DONE') do;
    done + 1;
    if close_date > deadline then expired + 1;
  end;
  when ('ACTIVE') do;
    active + 1;
    if date() > deadline then expired + 1;
  end;

Instead of using the date() function for today's date, you could define a "current date" into a macro variable before the data step and use that.

Yves_Boonen
Quartz | Level 8

If you look at the OP, you could say I use standard SQL in SAS or at least try to. What is the method you posted called? Is that the process code from SAS itself?

Kurt_Bremser
Super User

This is the data step language, the "mother tongue" of SAS, so to say. As I'm coming from procedural programming (BASIC, PASCAL, C) and also have some experience with DBASE programming, this suits me more than SQL.

I'm quite sure that one of the SQL experts here could provide a solution with SQL.

 

The data step can do some things rather easily (and make it easy to read) that take complicated thinking in SQL. PROC SQL can also be very resource-consuming in certain scenarios involving joins of large datasets.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

There is no such thing as an IF THEN construct in SQL, you would use a CASE WHEN construct in SQL.  I think you may be over complicating it.  Firstly, check your data coming in from that proc import, I can guarentee you that using proc import (which is guessing procedure) combined with Excel (which is a very poor choice for data transfer) will give you issues.  Are your date fields acutally dates, are they called that, so many things can go wrong.  

 

Assuming you data is accurate, then something like (untested - post test data in form of datastep if you would like tested code):

proc sql;
  create table WANT as
  select  A.TASK_HANDLER,  
          A.TASK_STATE,
          count(*) as AMOUNT_OF_TASKS,
          (select count(*) from ATROPOS.TASKS 
           where TASK_HANDLER=A.TASK_HANDLER and TAST_STATE=A.TASK_STATE and CLOSE_DATE - DEADLINE > 0) as EXPIRED
  from    ATROPOS.TASKS
  group by TASK_HANDLER,
           TASK_STATE;
quit;

 

Yves_Boonen
Quartz | Level 8

I currently have no access to the Atropos database, so the only way is to request a specific unload in the form of an Excel file. Since the Atropos database is managed by another team, chances are very real that I will never get access (not even read access). But that's something I have been looking out for as well. When you use a PROC IMPORT, how can you properly define the format of a column's value? I have been reading up on the FORMAT = DATE8. and stuff, but I hardly ever get it to work. I figured out already that this command only works in SELECT during PROC SQL, but I am having difficulties to get this to work in a datastep.

 

For example

 

PROC IMPORT

OUT = TEST.DATE

DATAFILE = "C:\temp\test.xls"

DBMS = XLS REPLACE;

GETNAMES = YES;

DATAROW = 2;

FORMAT deadline = DATE8. start_date = DATE8. close_date = DATE8. r_creation_date = DATE8.;

RUN;

 

This doesn't work for me. Even if I try using '...', "...", DDMMYYYY8. or DDMMYYYY10., it simply does not accept the FORMAT. Or do I have to define the variables even though I said to GETNAMES? Or does that only work in a datastep?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, you can't do that.  As I said, proc import is a guessing procedure - it looks at your file and tries to guess the best strcuture for the data based on some rows of data.  This is the principal reason why proc import would not be my choice of process.  At worst I would take the Excel file if that is all you can get, and save to CSV, then write a datastep import program where you can specify what each data element is, set formats informats etc.  E.g.

data imported;
  infile "c:\your_csv.csv" dlm=",";
  length var1 $200 var_2 8 var_3 $50;
  format var_1 $200 var_2 3. var_3 $10.;
  informat ...;
  input var_1 $ var_2 var_3 $;
run;
Kurt_Bremser
Super User

FORMAT works in a data step, but PROC IMPORT is a helper procedure that looks at the data, makes more or less educated guesses about the strcuture and then creates and runs a data step from that. Only 4 statements are valid within PROC IMPORT: DLM=, DATAROW=, GUESSINGROWS=, GETNAMES=.

To force formats, one can take the data step that IMPORT creates from the log and modify that.

Yves_Boonen
Quartz | Level 8

I managed to get rid of some errors, but there are some I simply cannot get rid of.

 

The code:

PROC SORT IN=ATROPOS.Tasks OUT=ATROPOS.Tasks;
BY task_handler;
RUN;

DATA ATROPOS.Intake;
SET ATROPOS.Tasks;
BY task_handler;
RETAIN
  Active
  Done
  Cancelled
  Expired
  Others;

IF first.task_handler
THEN DO;
  Active=0;
  Done=0;
  Cancelled=0;
  Expired=0;
  Others=0;
END;

SELECT (task_state);
  WHEN ('DONE') DO;
    Done+1;
    IF close_date > deadline THEN Expired+1;
  END;

  WHEN ('CANCELLED') Cancelled+1;
  END;

  WHEN ('ACTIVE') DO;
    Active+1;
    IF date() > deadline THEN Expired+1;
  END;

  OTHERWISE Others+1;
  END;

IF last.task_handler THEN OUTPUT;
KEEP task_handler Active Done Cancelled Expired Others;
RUN;

 

 

The log:

522  PROC SORT IN=ATROPOS.Tasks OUT=ATROPOS.Tasks;
523  BY task_handler;
524  RUN;

NOTE: Input data set is already sorted; it has been copied to the output data set.
NOTE: There were 15180 observations read from the data set ATROPOS.TASKS.
NOTE: The data set ATROPOS.TASKS has 15180 observations and 10 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


525
526  DATA ATROPOS.Intake;
527  SET ATROPOS.Tasks;
528  BY task_handler;
529  RETAIN
530    Active
531    Done
532    Cancelled
533    Expired
534    Others;
535
536  IF first.task_handler
537  THEN DO;
538    Active=0;
539    Done=0;
540    Cancelled=0;
541    Expired=0;
542    Others=0;
543  END;
544
545  SELECT (task_state);
546    WHEN ('DONE') DO;
547      Done+1;
548      IF close_date > deadline THEN Expired+1;
549    END;
550
551    WHEN ('CANCELLED') Cancelled+1;
552    END;
553
554    WHEN ('ACTIVE') DO;
       ----          -
       161           161
ERROR 161-185: No matching DO/SELECT statement.

555      Active+1;
556      IF date() > deadline THEN Expired+1;
557    END;
558
559    OTHERWISE Others+1;
       ---------
       161
560    END;
       ---
       161
ERROR 161-185: No matching DO/SELECT statement.

561
562  IF last.task_handler THEN OUTPUT;
563  KEEP task_handler Active Done Cancelled Expired Others;
564  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set ATROPOS.INTAKE may be incomplete.  When this step was stopped there were 0
         observations and 6 variables.
WARNING: Data set ATROPOS.INTAKE was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds


 

 

Both "No matching DO/SELECT statement" errors are weird to me. I cross checked with the first "WHEN ('DONE') DO;" statement and the ones that come after it have the exact same syntax. So why are the last two reporting errors and the first isn't? Do I have to repeat the "SELECT (task_state);" for every "WHEN X DO;" statement?

Yves_Boonen
Quartz | Level 8

I forgot the date macro, but I'll work on that now. I am not certain where I have to define that exactly. In between two datasteps in a new and indipendent datastep?

Kurt_Bremser
Super User
DATA ATROPOS.Intake;
SET ATROPOS.Tasks;
BY task_handler;
RETAIN
  Active
  Done
  Cancelled
  Expired
  Others;

IF first.task_handler
THEN DO;
  Active=0;
  Done=0;
  Cancelled=0;
  Expired=0;
  Others=0;
END;

SELECT (task_state);
  WHEN ('DONE') DO;
    Done+1;
    IF close_date > deadline THEN Expired+1;
  END;

  WHEN ('CANCELLED') Cancelled+1;
  *END; * this END has no preceding DO!;

  WHEN ('ACTIVE') DO;
    Active+1;
    IF date() > deadline THEN Expired+1;
  END;

  OTHERWISE Others+1;
END; * put this END in column 1, as it closes the SELECT;

IF last.task_handler THEN OUTPUT;
KEEP task_handler Active Done Cancelled Expired Others;
RUN;

See my inline comments.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

This part:

550
551    WHEN ('CANCELLED') Cancelled+1;
552    END;

Ends the select statement as the when has no do, thus the end; ends the select.  The other errors follow on from this problem, i.e. fix this, and the other should go away.  I.e. remove this END:

 

As another point, its not a good idea to code all in caps, or mixed case, and consistent indenting will highlight the incorrect part in a fair few examples:

proc sort in=atropos.tasks out=atropos.tasks;
  by task_handler;
run;

data atropos.intake;
  set atropos.tasks;
  by task_handler;
  retain active done cancelled expired others;
  if first.task_handler then do;
    active=0;
    done=0;
    cancelled=0;
    expired=0;
    others=0;
  end;
  select (task_state);
    when ('DONE') do;
      done+1;
      if close_date > deadline then expired+1;
    end;
    when ('CANCELLED') cancelled+1;
    when ('ACTIVE') do;
      active+1;
      if date() > deadline then expired+1;
    end;
    otherwise others+1;
  end;
  if last.task_handler then output;
  keep task_handler active done cancelled expired others;
run;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2869 views
  • 9 likes
  • 4 in conversation