DATA Step, Macro, Functions and more

Group by / Having statement in SQL

Reply
Occasional Contributor
Posts: 16

Group by / Having statement in SQL

Hello,

 

I am trying to use the Group by / Having statement  to filter a data set. This is what I am trying to do:


1. PROD_DESCRIPTION is not empty

2. ACCT_NAME contains "CARD" and have the lastest update_time in this variable

 

Somehow, it only generated the list with PROD_DESCRIPTION is not empty, and the second condition did not get caught, but I am sure there are some that satisifies the condition.

 

my code looks like following, please help. Thank you.

 

PROC SQL;

CREATE TABLE test2 AS

SELECT

t1.*

FROM test1 t1

group by ID

having (PROD_DESCRIPTION ne ""

  or (ACCT_NAME contains "CARD" and UPDATE_TIME = MAX(UPDATE_TIME)))

 

;

QUIT;

Super User
Posts: 10,500

Re: Group by / Having statement in SQL

It helps to provide some example data that shows some of the records you expect to see in the result but don't get. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Also have you tried

having not missing(PROD_DESCRIPTION)

  or (ACCT_NAME contains "CARD" and UPDATE_TIME = MAX(UPDATE_TIME))

Occasional Contributor
Posts: 16

Re: Group by / Having statement in SQL

Thank you for ur reply. the code you presented in your reply did not help. it had the same result as my original code.

 

Lets have an example,

 

ID       PROD_DESCRIPTION       ACCT_Name      Update_Time
1         pen                                      pen                      Jul2014
1         cup                                      cup                      May2014
1                                                    card                     Jun2015
1                                                    card                     Jan2016


From my code, the result only shows the first two row, but I want to get 1st, 2nd and 4th row.

Hope it helps.

Thank you in advance.

Super User
Posts: 10,500

Re: Group by / Having statement in SQL

Do you see the difference between "CARD" and card in your example data? Text comparisons are case sensitive

Occasional Contributor
Posts: 16

Re: Group by / Having statement in SQL

Thank you.

 

sorry it was a typo. In my data, it was upper case as 'CARD', but it did not help.

 

but thank you for point it out.

Super User
Posts: 10,500

Re: Group by / Having statement in SQL

Without data to test on it can be difficult to determine where actual results are. Use of CONTAINS seems to be problematic and may be padding the variable to length such that "CARD" is not equal to "CARD    " or similar, especially if using code developed in another SQL as a pattern.

 

I suggest also trying

 

having not missing(PROD_DESCRIPTION)

  or (index(ACCT_NAME, "CARD")>0 and UPDATE_TIME = MAX(UPDATE_TIME))

 

or use STRIP(acct_name) in the contains clause.

 

Also confirm that your UPDATE_TIME is a SAS time or Datetime variable. If the values are character then the MAX function likely doesn't like it though an error may not result.

 

You can provide data for us to use based on your dataset. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Or a dummy dataset with the same variables and values that duplicate the behavior

 

Occasional Contributor
Posts: 16

Re: Group by / Having statement in SQL

Thank you again for help me.

 

the "UPDATE_TIME" is in Date/Time format.

 

One thing I need to point out is, keep the condition  having not missing(PROD_DESCRIPTION),

1. if I just add one condition which is ACCT_NAME contains "CARD", it worked.

2. if I just add one condition which is UPDATE_TIME = MAX(UPDATE_TIME), it worked too.

but when I add both of them together, it did not work.

 

I am so confused.

Occasional Contributor
Posts: 16

Re: Group by / Having statement in SQL

because of what I do, I could not share the data set. its little touchy...sorry. but thank you for recommending.
Super User
Posts: 10,500

Re: Group by / Having statement in SQL

dummy data that has the same data characteristics, lengths, types and formats of variables is sufficient as long as it duplicates the behavior you observe and can be manipulated to generate the desired results. Such as looking for CARD vs card.

Occasional Contributor
Posts: 16

Re: Group by / Having statement in SQL

Yes, that can be done.  let me work on this over the weekend, if I still had problem, I would consult you. Thank you again, have a nice weekend.

Respected Advisor
Posts: 4,649

Re: Group by / Having statement in SQL

I get the expected result with:

 

data test1;
input ID PROD_DESCRIPTION $ ACCT_Name :$8. t :$7.;
Update_Time = input(cats("01",t), date9.);
format Update_Time yymmdd10.;
datalines;
1         PEN    PEN    JUL2014
1         CUP    CUP    MAY2014
1         .      CARD   JUN2015
1         .      CARD   JAN2016
;

PROC SQL;
CREATE TABLE test2 AS
SELECT
    t1.*
FROM test1 as t1
group by ID
having PROD_DESCRIPTION is not missing or
  ACCT_NAME contains "CARD" and UPDATE_TIME = MAX(UPDATE_TIME);
select * from test2;
QUIT;
PG
Occasional Contributor
Posts: 16

Re: Group by / Having statement in SQL

Thank you, it worked. but it gives me in the "Result" tab, but in the "Output Data" tab. By the way, I am using SAS Enterprise Guide 6.1.

 

I need this to be under "Output Data" tab, as i need to do further process.

 

Not too sure if your SAS version. Hope you can help further. Thank you in advance.

 

 

Respected Advisor
Posts: 4,649

Re: Group by / Having statement in SQL

You should be getting both version of the output. I am not a user of EG interface, but my guess is: CREATE TABLE TEST2 AS creates the TEST2 dataset in the WORK library (Output data) and SELECT * FROM TEST2; prints the contents of TEST2 in the Results. That last statement is just there for your convenience, it is not required. 

 

Further processing should use the TEST2 dataset.

PG
Ask a Question
Discussion stats
  • 12 replies
  • 227 views
  • 0 likes
  • 3 in conversation