Solved: Re: PROC SQL issue where MIN function is working but when changed to M...

tugtrog · Posted 10-25-2018 04:51 PM

I am trying to write a query that will select MIN and MAX timestamp on a single day. I am able to get a vast majority of my data to pull, but there are issues on a few of the rows where I can pull the MIN date but when I change it to MAX I get a page break as the results. There are multiple timestamps per day, but if not it still should select the same timestamp for MIN and MAX.

My code is extremely long or I would post it.

I am using a having statement to select the min and max similar to this:

proc sql;

SELECT distinct t2.no,
					T1._ID,
					t2.id, 
					t3.First_Name
					t3.last_name
					t3.TITLE, 
					t3.02_LOCATION
					t3.Level_01, 
					t3.Level_02, 
					T1.M_TS
					T1.O_TS
					t3.employee
					T1.O_A
				FROM TEST.DAY T2
					INNER JOIN B.LE T1 ON (T2.ID = T1._ID and (PUT(DATEPART(T2.ADate),date9.) = PUT(DATEPART(T1.M_TS),date9.)))
					INNER JOIN B.E T3 ON (T1.O_A = T3.EMPLOYEE AND (t3.O_TS > '15Dec9999:0:0:0'dt))
					WHERE T2.NO = '11111-11111'
						GROUP BY T1._ID
							HAVING M_TS = MAX(T1.M_TS) and 
								O_TS = MIN(T1.O_TS) AND
								FIRST_NAME = scan(t3.first_name,1,' ') AND 
								LAST_NAME = SCAN(T3.LAST_NAME,1,' ')
;
run;

I change nothing in the code but the MAX to MIN and the code works. Does anyone have an idea of what could cause this issue?

novinosrin · Posted 10-25-2018 05:39 PM

MH 2 cents thought:

If you are at an organization that uses SAS at enterprise level on a commercial level, I would rather prefer to deal a lot of this in piecemeal preferably taking advantage with a datastep approach unless something is preferred by egoistic imbecile bosses who think they have achieved a great deal after an explicit sql connect pass through writing tons of variable names with the potential leading to typos and plus the risk of "remerge" that may cause so much duplication and other unwarranted intricacies causing inaccuracy and integrity issues.

Taking advantage of the beautiful PDV block with exquisite block of addresses that holds the operands reading the descriptor portion giving us the breathing space to play with variables of interest for create, insert, update, delete alone is priceless while the other variables can just sleep and still be neatly written to the output buffer.

Of course, I love proc sql and have/will use when it's not unwieldy and that involves cleaner column processing involving less lexicon. Sure Proc Sql also has to have a memory address block and even the great guru Paul Dorfman has mentioned the APP functions can be put to effect in proc sql which might of my personal interest at some point in the future. Nevertheless, long select clauses with many nested in it will certainly cause someone a heartache for sure. HTH

View solution in original post

novinosrin · Posted 10-25-2018 04:52 PM

Full code, log and error info plz

tugtrog · Posted 10-25-2018 05:05 PM

I posted the code, but I am not getting an error I am just having nulls returned in the columns when I put it in a created table, and page brake when looking at results.

novinosrin · Posted 10-25-2018 05:13 PM

Got it.

HAVING M_TS = MAX(T1.M_TS) and 
								O_TS = MIN(T1.O_TS) AND
								FIRST_NAME = scan(t3.first_name,1,' ') AND 
								LAST_NAME = SCAN(T3.LAST_NAME,1,' ')
;

Nothing wrong with the above syntactically. From a business logic or data point of view can't tell. However, simple conclusion is that no records satisfy all the conditions specified in "having" filter.

tugtrog · Posted 10-25-2018 05:17 PM

Thanks a lot for your response and time. I think it might be an issue in the data, but I figured I would check here to see if it was something else I was missing.

novinosrin · Posted 10-25-2018 05:39 PM

MH 2 cents thought:

If you are at an organization that uses SAS at enterprise level on a commercial level, I would rather prefer to deal a lot of this in piecemeal preferably taking advantage with a datastep approach unless something is preferred by egoistic imbecile bosses who think they have achieved a great deal after an explicit sql connect pass through writing tons of variable names with the potential leading to typos and plus the risk of "remerge" that may cause so much duplication and other unwarranted intricacies causing inaccuracy and integrity issues.

Taking advantage of the beautiful PDV block with exquisite block of addresses that holds the operands reading the descriptor portion giving us the breathing space to play with variables of interest for create, insert, update, delete alone is priceless while the other variables can just sleep and still be neatly written to the output buffer.

Of course, I love proc sql and have/will use when it's not unwieldy and that involves cleaner column processing involving less lexicon. Sure Proc Sql also has to have a memory address block and even the great guru Paul Dorfman has mentioned the APP functions can be put to effect in proc sql which might of my personal interest at some point in the future. Nevertheless, long select clauses with many nested in it will certainly cause someone a heartache for sure. HTH

kiranv_ · Posted 10-25-2018 07:09 PM

@novinosrin I love your answers and learn from you, i agree that Data step gives more flexibility in terms of Data processing but

1. Explicit sql has significant impact in run times and they are very important when they are lots of records.

2. Other than Proc sql , no other SQL implementation(as far as I know) allow remerge but multiple passes

novinosrin · Posted 10-25-2018 07:11 PM

@kiranv_ --"1. Explicit sql has significant impact in run times and they are very important when they are lots of records."

Not necessarily. I will prove that to you in time. Unfortunately not at 6:10 chicago time while on a hurry to catch red line CTA to get back home.

I agree to 2.

Finally Thank you for the kind note.

kiranv_ · Posted 10-25-2018 07:58 PM

sure. I worked on code to process close to 200 millions records with close to 3000 lines of SAS code to Teradata sql, difference was from 20 min to 18 hours. major time issue is because of I/0

novinosrin · Posted 10-25-2018 09:22 PM

That depends upon the quality of SAS user and not the product/language plus taking into account he objective/need.and optimum approaches.

Reeza · Posted 10-25-2018 09:33 PM

@novinosrin wrote:
That depends upon the quality of SAS user and not the product/language plus taking into account he objective/need.and optimum approaches.

Usually it has to do with running on a server versus running on a desktop is my guess.

It's a good workaround when you can't afford a SAS server, but need the power of a server. You are limited by the type of work though, because it's mostly standard data manipulation. Until SQL introduced PIVOT (2008ish) and some of the more statistical techniques it was more of an issue.

novinosrin · Posted 10-25-2018 10:39 PM

Thank you @Reeza. Well noted, The discussion might next slip into performance of proc sql vs datastep(libname) at engine level as the latter indeed has a higher overhead measuring direct extracts/in general column processing as opposed to row, fewer joins/set operators/indexes (more flexible in proc sql than datastep or even proc datasets for obvious reasons. Reference; complete guide to indexes - by author Micheal raithel) utmost.

Datastep is indeed powerful for various tasks that can be accomplished. However, saying that how extensive, variety/complexity it can get pretty much seems unlimited especially to the readers eye on this very forum when demonstrated by some whom we look up to as role models.( don't wanna keep repeating the ones I adore lol)

The matter of typing and maintaining convenience, control and the kind of objective./business requirement beyond the set of functionality described in paragraph one by proc sql and the control of handling variables to the tune of a SAS user be row level processing or column level processing(I must emphasize the required proficiency hash, array(multi) and dow etc) is what makes SAS the classy gem, a kind of its own and yet one can be obliterate to the descriptors(meta portion) i.e not having to worry about the variables that's that not required for processing which doesn't hinder.

On the other hand in proc sql, if fewer select clauses with small number of vars is fine, however long clauses with a huge list of variables, and joins made on multiple variables(with several AND operations) , aliases, ambiguous references and many more unnecessary nits to take care of. And on top of the that, Cartesian is another sweetheart to beware.

So to conclude, besides the server and other things, my view is that different objectives may warrant efficient approaches that would test the individual super user to his/her merits in coming up with the most optimum quality solution.

Best Regards!

PS IMHO, if SAS didn't provide the scope of competitiveness to pursue further far and wide, the charm and joy of coding wouldn't exist. 🙂 Lastly, I use sql when that happens to be the most laziest option to type(code) yet comparable to other options.

tugtrog · Posted 10-26-2018 10:57 AM

This really helped, thank you!

novinosrin · Posted 10-26-2018 01:06 PM

Hey @tugtrog Sorry for the bother and I hope your day is going well.

if you have a moment, take a look at the below new thread posted by a new member 🙂

https://communities.sas.com/t5/New-SAS-User/Merging-Data-in-Proc-SQL/m-p/507812

What a deja vu 🙂 So that means our discussion has actually made sense 🙂

tugtrog · Posted 10-26-2018 01:58 PM

Hey @novinosrin, you are not bothering me at all. I appreciate you sharing this with me, I am actually working on using a data step and PDV right now. Our discussion really did help me, and opened my mind to a whole new idea on how to quickly execute some massive queries on massive data sets that I don't know if I would have been exposed to otherwise.

Thanks again! Have a great weekend .

PS feel free to share anything with me!

PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestamp

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Re: PROC SQL issue where MIN function is working but when changed to MAX returns nothing on timestam

Ready to join fellow brilliant minds for the SAS Hackathon?