Solved: Re: Count in PROC SQL.

Yves_Boonen · Posted 04-04-2016 09:21 AM

I just started working as an analyst within a bank. Since they work in databases on different platforms and since the leading analyst has a huge workload, I am tasked with exploring SAS and its possibilities.

One of these databases is a java webbased database build in a program called QIT. This database contains every dossier that has been opened for our customers. This database contains information such as:

- kind_of_document (AVE, DOK, FAK, TRIGGER,...)

- document_handler (Name of the employee of the bank who is handling this dossier.)

- r_creation_date (The date on which the dossier has been created.)

- document_state (CANCELED, HOLD, TODO, TREATED)

The exercise is to create a table in SAS that checks all AVE related dossiers with a document_state of TODO or HOLD, sorted by document_handler. (See results below.) Since I have no access to the QIT database yet, they have given me an Excel file containing said information for the month February. At the end of my exercise, I can easily double-check my results through a similar query in QIT. I noticed that my SAS query yields 45 rows as a result, whereas my QIT query yields 67 rows as a result. I have to be doing something wrong.

QIT query:

select document_handler, document_state, count (*) as amount_of_documents, min (r_creation_Date) as oldest_document

from atp_document

where document_state in ('TODO', 'HOLD')

and kind_of_document like '%AVE%'

and r_creation_date < date ('01/03/2016')

and r_creation_date > date ('31/01/2016')

group by document_handler, document_state

SAS query:

PROC SQL;

CREATE TABLE Atropos.Open_Doc AS

SELECT document_handler, document_state, COUNT(*) AS number_of_documents, MIN(r_creation_date) AS oldest_document FORMAT=DATE7.

FROM Atropos.Documents (=imported Excel file containing the information for the month February)

WHERE document_state IN ('TODO','HOLD')

AND kind_of_document LIKE '%AVE%'

GROUP BY document_handler, document_state;

QUIT;

Result:

Comparing both tables with each other tells me a few things.

- I am missing 22 document_handlers in my SAS table

- The number_of_documents isn't always correct for every document_handler (for example, my query yields 2 documents for handler A and in QIT my query yields 4 documents for handler A

This is the result from QIT and this is CORRECT! (First 4 rows only.)

document_handler	document_state	amount_of_documents	oldest_document
Handler A	HOLD	1.0	23/02/2016 10:14:31
Handler B	HOLD	4.0	19/02/2016 16:39:50
Handler C	TODO	2.0	24/02/2016 11:12:46
Handler C	HOLD	4.0	2/02/2016 15:43:21

This is the result from SAS and this is WRONG! (First 4 rows only.)

document_handler	document_state	amount_of_documents	oldest_document
Handler B	HOLD	2.0	19/02/2016
Handler C	TODO	1.0	29/02/2016
Handler D	HOLD	2.0	19/11/2016
Handler E	HOLD	2.0	06/02/2016

I think something is wrong with the COUNT command, but I cannot seem to put my finger on it. Any help is greatly appreciated.

Kurt_Bremser · Posted 04-04-2016 09:43 AM

When you are missing group values, your data can't be correct. Just check the Excel if the missing document_handler values are there at all.

If a missing document_handler is there, look at the other criteria.

Since Excel is NOT suited for data transfer, you may have introduced some glitches during import.

I'd much prefer to get test data in a reasonable data transfer format like CSV.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

RW9 · Posted 04-04-2016 09:34 AM

I don't see as anything has gone "wrong" as such. SQL is passed through an SQL compiler, it is processing your commands slightly different, as there is probably other assumptions in the other SQL compiler. If you write your SQL code such that it is explicit in its execution order then you should get the same result:

proc sql;
  create table ATROPOS.OPEN_DOC as
  select  DOCUMENT_HANDLER,
          DOCUMENT_STATE,
          count(*) as NUMBER_OF_DOCUMENTS,
          min(R_CREATION_DATE) as OLDEST_DOCUMENT format=date7.
  from    (select * from ATROPOS.DOCUMENTS where DOCUMENT_STATE in ("TODO","HOLD") and KIND_OF_DOCUMENT like '%AVE%')
  group by DOCUMENT_HANDLER,
           DOCUMENT_STATE;
quit;

You will see from the above that I have moved the whole where clause bit into a sub-query. This means that that gets run first, and the results of that step get fed into the outer SQL which in turn generates your aggregates. Otherwise your up to the whim of the compiler as to when the where gets executed, as with a group by statement you don't use where, but having. So it is likely taking unexpected data, grouping and then where clausing rather than the other way round.

Kurt_Bremser · Posted 04-04-2016 09:43 AM

When you are missing group values, your data can't be correct. Just check the Excel if the missing document_handler values are there at all.

If a missing document_handler is there, look at the other criteria.

Since Excel is NOT suited for data transfer, you may have introduced some glitches during import.

I'd much prefer to get test data in a reasonable data transfer format like CSV.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Yves_Boonen · Posted 04-05-2016 04:55 AM

As expected they proivded me with an updated Excel file and forgot to mention this. I ended up comparing apples with pears and you guys noticed that much quicker than me.

The query was indeed correct. Thanks for the feedback!

Count in PROC SQL.

Re: Count in PROC SQL.

Re: Count in PROC SQL.

Re: Count in PROC SQL.

Re: Count in PROC SQL.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away