I am trying to update a master table shell with some demographics. The code I am using worked on a dataset of 32 million records, but now I am having to deal with a dataset that has almost 100 million records. I had to stop the code after 21 days of it running. If anyone has any suggestions about how I can improve the processing time, I am all ears. Someone suggested removing the select statements and replacing with inner joins, but I can't seem to get that working or put in the proper order. Here is the original code I was using. It's nothing fancy.
PROC SQL;
UPDATE T.MASTER_PATIENT_RX A
SET
PERSONID=(SELECT PERSONID FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_LASTNAME=(SELECT (UPCASE(LastName)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_FIRSTNAME=(SELECT (UPCASE(FirstName)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_MIDDLENAME=(SELECT (UPCASE(MiddleName)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_DOB=(SELECT DOB FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_AGEYR=(
SELECT
(
CASE
WHEN DOB NE . AND A.RX_DTFILLED NE . THEN FLOOR((INTCK('MONTH',DOB,RX_DTFILLED)-(DAY(RX_DTFILLED)<DAY(DOB)))/12)
ELSE .
END
)
FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_AGEMO=(
SELECT
(
CASE
WHEN PT_AGEYR EQ 0 AND DOB NE . AND A.RX_DTFILLED NE . THEN FLOOR((INTCK('MONTH',DOB,RX_DTFILLED)-(DAY(RX_DTFILLED)<DAY(DOB))))
ELSE .
END
)
FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_GENDER=(SELECT PT_GENDER FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_GENDERID=(SELECT GenderID FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_SPECIES=(
SELECT
(
CASE
WHEN SpeciesCode=1 THEN 'HUMAN'
WHEN SpeciesCode=2 THEN 'VETERINARY'
ELSE 'UNKNOWN'
END
)
FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_SPECIESID=(SELECT SpeciesCode FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_ADDR1=(SELECT (UPCASE(Street)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_ADDR2=(SELECT (UPCASE(Street2)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_CITY=(SELECT (UPCASE(City)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_STATE=(SELECT (UPCASE((SUBSTR(State,1,2)))) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_ZIP=(SELECT (SUBSTR(Zip,1,5)) FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD),
PT_COUNTYFIPS=(SELECT CountyFIPS FROM ZW.MASTER_PATIENT AS B WHERE A.PATIENTID=B.PatientiD)
WHERE PATIENTID IN (SELECT PatientiD FROM ZW.MASTER_PATIENT)
;
QUIT;
The data is on an MS SQL Server, but the folks that own the data will not let us have access to it. I've asked for that, but was denied. Instead they are giving us copies of the tables in SAS.
If this were my project, working on a local copy of data and there is only one patientid in the master record set, I would be very tempted to 1) rename variables in the transaction data set to match the master data, 2) do the transformations in the transaction data before merging and 3) use as data set update, which does require sorting the data by the same variables used in a by statement to align things..
I have a sneaking suspicion that having 19 select -where clauses (especially since they are essentially the same clause) can't be very efficient.
Have you done anything like look at system memory and/or cpu usage while this is running?
Echo with @ballardw, SQL runs faster when it is simple, complex code confuses its compiler thus weaken its optimizer. Break it down to 19+ separated procedures, you will be surprised how fast it goes.
For big table. SQL is not efficient . Consider about using Hash Table ?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.