Help using Base SAS procedures

Count in PROC SQL.

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Count in PROC SQL.

I just started working as an analyst within a bank. Since they work in databases on different platforms and since the leading analyst has a huge workload, I am tasked with exploring SAS and its possibilities.

 

One of these databases is a java webbased database build in a program called QIT. This database contains every dossier that has been opened for our customers. This database contains information such as:

 

- kind_of_document   (AVE, DOK, FAK, TRIGGER,...)

- document_handler   (Name of the employee of the bank who is handling this dossier.)

- r_creation_date   (The date on which the dossier has been created.)

- document_state   (CANCELED, HOLD, TODO, TREATED)

 

The exercise is to create a table in SAS that checks all AVE related dossiers with a document_state of TODO or HOLD, sorted by document_handler. (See results below.) Since I have no access to the QIT database yet, they have given me an Excel file containing said information for the month February. At the end of my exercise, I can easily double-check my results through a similar query in QIT. I noticed that my SAS query yields 45 rows as a result, whereas my QIT query yields 67 rows as a result. I have to be doing something wrong.

 

 

QIT query:

select document_handler, document_state, count (*) as amount_of_documents, min (r_creation_Date) as oldest_document

from atp_document

where document_state in ('TODO', 'HOLD')

and kind_of_document like '%AVE%'

and r_creation_date < date ('01/03/2016')

and r_creation_date > date ('31/01/2016')

group by document_handler, document_state

 

 

SAS query:

PROC SQL;

CREATE TABLE Atropos.Open_Doc AS

SELECT document_handler, document_state, COUNT(*) AS number_of_documents, MIN(r_creation_date) AS oldest_document FORMAT=DATE7.

FROM Atropos.Documents (=imported Excel file containing the information for the month February)

WHERE document_state IN ('TODO','HOLD')

AND kind_of_document LIKE '%AVE%'

GROUP BY document_handler, document_state;

QUIT;

 

 

Result:

Comparing both tables with each other tells me a few things.

 

- I am missing 22 document_handlers in my SAS table

- The number_of_documents isn't always correct for every document_handler (for example, my query yields 2 documents for handler A and in QIT my query yields 4 documents for handler A

 

 

This is the result from QIT and this is CORRECT! (First 4 rows only.)

 

document_handlerdocument_stateamount_of_documentsoldest_document
Handler AHOLD1.023/02/2016 10:14:31
Handler BHOLD4.019/02/2016 16:39:50
Handler CTODO2.024/02/2016 11:12:46
Handler CHOLD4.0

2/02/2016 15:43:21

 

 

This is the result from SAS and this is WRONG! (First 4 rows only.)

 

document_handlerdocument_stateamount_of_documentsoldest_document
Handler BHOLD2.019/02/2016
Handler CTODO1.029/02/2016
Handler DHOLD2.019/11/2016
Handler EHOLD2.006/02/2016

 

I think something is wrong with the COUNT command, but I cannot seem to put my finger on it. Any help is greatly appreciated.


Accepted Solutions
Solution
‎04-05-2016 06:59 AM
Super User
Posts: 6,936

Re: Count in PROC SQL.

When you are missing group values, your data can't be correct. Just check the Excel if the missing document_handler values are there at all.

If a missing document_handler is there, look at the other criteria.

Since Excel is NOT suited for data transfer, you may have introduced some glitches during import.

I'd much prefer to get test data in a reasonable data transfer format like CSV.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: Count in PROC SQL.

I don't see as anything has gone "wrong" as such.  SQL is passed through an SQL compiler, it is processing your commands slightly different, as there is probably other assumptions in the other SQL compiler.  If you write your SQL code such that it is explicit in its execution order then you should get the same result:

proc sql;
  create table ATROPOS.OPEN_DOC as
  select  DOCUMENT_HANDLER,
          DOCUMENT_STATE,
          count(*) as NUMBER_OF_DOCUMENTS,
          min(R_CREATION_DATE) as OLDEST_DOCUMENT format=date7.
  from    (select * from ATROPOS.DOCUMENTS where DOCUMENT_STATE in ("TODO","HOLD") and KIND_OF_DOCUMENT like '%AVE%')
  group by DOCUMENT_HANDLER,
           DOCUMENT_STATE;
quit;

You will see from the above that I have moved the whole where clause bit into a sub-query.  This means that that gets run first, and the results of that step get fed into the outer SQL which in turn generates your aggregates.  Otherwise your up to the whim of the compiler as to when the where gets executed, as with a group by statement you don't use where, but having.  So it is likely taking unexpected data, grouping and then where clausing rather than the other way round.

Solution
‎04-05-2016 06:59 AM
Super User
Posts: 6,936

Re: Count in PROC SQL.

When you are missing group values, your data can't be correct. Just check the Excel if the missing document_handler values are there at all.

If a missing document_handler is there, look at the other criteria.

Since Excel is NOT suited for data transfer, you may have introduced some glitches during import.

I'd much prefer to get test data in a reasonable data transfer format like CSV.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: Count in PROC SQL.

As expected they proivded me with an updated Excel file and forgot to mention this. I ended up comparing apples with pears and you guys noticed that much quicker than me.

 

The query was indeed correct. Thanks for the feedback!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 415 views
  • 1 like
  • 3 in conversation