Help using Base SAS procedures

PROC SQL UPDATE taking forever to run

Reply
Occasional Contributor
Posts: 15

PROC SQL UPDATE taking forever to run

I am trying to update a master table shell with some demographics. The code I am using worked on a dataset of 32 million records, but now I am having to deal with a dataset that has almost 100 million records. I had to stop the code after 21 days of it running. If anyone has any suggestions about how I can improve the processing time, I am all ears. Someone suggested removing the select statements and replacing with inner joins, but I can't seem to get that working or put in the proper order. Here is the original code I was using. It's nothing fancy.

 

PROC SQL;
	UPDATE T.MASTER_PATIENT_RX A
		SET
			PERSONID=(SELECT PERSONID FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),	
			PT_LASTNAME=(SELECT (UPCASE(LastName)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_FIRSTNAME=(SELECT (UPCASE(FirstName)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_MIDDLENAME=(SELECT (UPCASE(MiddleName)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_DOB=(SELECT DOB FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_AGEYR=(
				SELECT 
					(
					CASE
						WHEN DOB NE . AND A.RX_DTFILLED NE . THEN FLOOR((INTCK('MONTH',DOB,RX_DTFILLED)-(DAY(RX_DTFILLED)<DAY(DOB)))/12)
						ELSE .
					END
					) 
				FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_AGEMO=(
				SELECT 
					(
					CASE
						WHEN PT_AGEYR EQ 0 AND DOB NE . AND A.RX_DTFILLED NE . THEN FLOOR((INTCK('MONTH',DOB,RX_DTFILLED)-(DAY(RX_DTFILLED)<DAY(DOB))))
						ELSE .
					END
					) 
				FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_GENDER=(SELECT PT_GENDER FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_GENDERID=(SELECT GenderID FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_SPECIES=(
				SELECT
					(
					CASE
						WHEN SpeciesCode=1 THEN 'HUMAN'
						WHEN SpeciesCode=2 THEN 'VETERINARY'
						ELSE 'UNKNOWN'
					END
					)
				FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_SPECIESID=(SELECT SpeciesCode FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_ADDR1=(SELECT (UPCASE(Street)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_ADDR2=(SELECT (UPCASE(Street2)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_CITY=(SELECT (UPCASE(City)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_STATE=(SELECT (UPCASE((SUBSTR(State,1,2)))) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_ZIP=(SELECT (SUBSTR(Zip,1,5)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
			PT_COUNTYFIPS=(SELECT CountyFIPS FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD)
		WHERE PATIENTID IN (SELECT PatientiD FROM ZW.MASTER_PATIENT)
		;
QUIT;
Super User
Posts: 17,828

Re: PROC SQL UPDATE taking forever to run

Is your data on a server? Can you use Pass Through SQL? If so, a native SQL query in that language may be faster.
Occasional Contributor
Posts: 15

Re: PROC SQL UPDATE taking forever to run

The data is on an MS SQL Server, but the folks that own the data will not let us have access to it. I've asked for that, but was denied. Instead they are giving us copies of the tables in SAS.

Super User
Posts: 17,828

Re: PROC SQL UPDATE taking forever to run

Ok, Process changes: I would suggest they give you read access to the server. You can read tables but not change them. If this is a regular process I would also suggest you ask if they can do it for you.

It looks like you're only joining to one table so you may want to consider a left join, but that would create a new table. Another option is a bunch of formats, but I'm not sure that would be any more efficient in the long run.

Super User
Posts: 10,500

Re: PROC SQL UPDATE taking forever to run

[ Edited ]

If this were my project, working on a local copy of data and  there is only one patientid in the master record set, I would be very tempted to 1) rename variables in the transaction data set to match the master data, 2) do the transformations in the transaction data before merging and 3) use as data set update, which does require sorting the data by the same variables used in a by statement to align things..

 

I have a sneaking suspicion that having 19 select -where clauses (especially since they are essentially the same clause) can't be very efficient.

 

Have you done anything like look at system memory and/or cpu usage while this is running?

 

Respected Advisor
Posts: 3,124

Re: PROC SQL UPDATE taking forever to run

Echo with @ballardw, SQL runs faster when it is simple, complex code confuses its compiler thus weaken its optimizer. Break it down to 19+ separated procedures, you will be surprised how fast it goes.

Super User
Posts: 9,681

Re: PROC SQL UPDATE taking forever to run

For big table. SQL is not efficient .  Consider about using Hash Table ?

Ask a Question
Discussion stats
  • 6 replies
  • 401 views
  • 1 like
  • 5 in conversation