BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yves_Boonen
Quartz | Level 8

I just started working as an analyst within a bank. Since they work in databases on different platforms and since the leading analyst has a huge workload, I am tasked with exploring SAS and its possibilities.

 

One of these databases is a java webbased database build in a program called QIT. This database contains every dossier that has been opened for our customers. This database contains information such as:

 

- kind_of_document   (AVE, DOK, FAK, TRIGGER,...)

- document_handler   (Name of the employee of the bank who is handling this dossier.)

- r_creation_date   (The date on which the dossier has been created.)

- document_state   (CANCELED, HOLD, TODO, TREATED)

 

The exercise is to create a table in SAS that checks all AVE related dossiers with a document_state of TODO or HOLD, sorted by document_handler. (See results below.) Since I have no access to the QIT database yet, they have given me an Excel file containing said information for the month February. At the end of my exercise, I can easily double-check my results through a similar query in QIT. I noticed that my SAS query yields 45 rows as a result, whereas my QIT query yields 67 rows as a result. I have to be doing something wrong.

 

 

QIT query:

select document_handler, document_state, count (*) as amount_of_documents, min (r_creation_Date) as oldest_document

from atp_document

where document_state in ('TODO', 'HOLD')

and kind_of_document like '%AVE%'

and r_creation_date < date ('01/03/2016')

and r_creation_date > date ('31/01/2016')

group by document_handler, document_state

 

 

SAS query:

PROC SQL;

CREATE TABLE Atropos.Open_Doc AS

SELECT document_handler, document_state, COUNT(*) AS number_of_documents, MIN(r_creation_date) AS oldest_document FORMAT=DATE7.

FROM Atropos.Documents (=imported Excel file containing the information for the month February)

WHERE document_state IN ('TODO','HOLD')

AND kind_of_document LIKE '%AVE%'

GROUP BY document_handler, document_state;

QUIT;

 

 

Result:

Comparing both tables with each other tells me a few things.

 

- I am missing 22 document_handlers in my SAS table

- The number_of_documents isn't always correct for every document_handler (for example, my query yields 2 documents for handler A and in QIT my query yields 4 documents for handler A

 

 

This is the result from QIT and this is CORRECT! (First 4 rows only.)

 

document_handlerdocument_stateamount_of_documentsoldest_document
Handler AHOLD1.023/02/2016 10:14:31
Handler BHOLD4.019/02/2016 16:39:50
Handler CTODO2.024/02/2016 11:12:46
Handler CHOLD4.0

2/02/2016 15:43:21

 

 

This is the result from SAS and this is WRONG! (First 4 rows only.)

 

document_handlerdocument_stateamount_of_documentsoldest_document
Handler BHOLD2.019/02/2016
Handler CTODO1.029/02/2016
Handler DHOLD2.019/11/2016
Handler EHOLD2.006/02/2016

 

I think something is wrong with the COUNT command, but I cannot seem to put my finger on it. Any help is greatly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

When you are missing group values, your data can't be correct. Just check the Excel if the missing document_handler values are there at all.

If a missing document_handler is there, look at the other criteria.

Since Excel is NOT suited for data transfer, you may have introduced some glitches during import.

I'd much prefer to get test data in a reasonable data transfer format like CSV.

View solution in original post

3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I don't see as anything has gone "wrong" as such.  SQL is passed through an SQL compiler, it is processing your commands slightly different, as there is probably other assumptions in the other SQL compiler.  If you write your SQL code such that it is explicit in its execution order then you should get the same result:

proc sql;
  create table ATROPOS.OPEN_DOC as
  select  DOCUMENT_HANDLER,
          DOCUMENT_STATE,
          count(*) as NUMBER_OF_DOCUMENTS,
          min(R_CREATION_DATE) as OLDEST_DOCUMENT format=date7.
  from    (select * from ATROPOS.DOCUMENTS where DOCUMENT_STATE in ("TODO","HOLD") and KIND_OF_DOCUMENT like '%AVE%')
  group by DOCUMENT_HANDLER,
           DOCUMENT_STATE;
quit;

You will see from the above that I have moved the whole where clause bit into a sub-query.  This means that that gets run first, and the results of that step get fed into the outer SQL which in turn generates your aggregates.  Otherwise your up to the whim of the compiler as to when the where gets executed, as with a group by statement you don't use where, but having.  So it is likely taking unexpected data, grouping and then where clausing rather than the other way round.

Kurt_Bremser
Super User

When you are missing group values, your data can't be correct. Just check the Excel if the missing document_handler values are there at all.

If a missing document_handler is there, look at the other criteria.

Since Excel is NOT suited for data transfer, you may have introduced some glitches during import.

I'd much prefer to get test data in a reasonable data transfer format like CSV.

Yves_Boonen
Quartz | Level 8

As expected they proivded me with an updated Excel file and forgot to mention this. I ended up comparing apples with pears and you guys noticed that much quicker than me.

 

The query was indeed correct. Thanks for the feedback!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1368 views
  • 1 like
  • 3 in conversation