Reading in a large dataset using Proc SQL

NR13 — Wed, 26 Feb 2020 14:54:36 GMT

Hi,

I'm looking for a way to use PROC SQL to filter a large dataset as it is being read in. Ideally, I'd like to read in only the appointment numbers that match appointment numbers in another dataset. Right now the dataset takes over an hour to read in, so I'm trying to speed that up. In the past I've used something like this:

Proc SQL;
Create Table New as
Select *
From All_Scores as A
,DI_SCR (where=(DI=&DIid.)) as B
,OrgCodes as C
Where A.APPT_ID=B.APPT_ID
AND B.DI_ORG_ID=C.ORG_ID;

Quit;

The "where=" data set option in the FROM clause significantly cut down the processing time. Is there a way to do something similar, so that only the rows of B whose APPT_ID also appears in A are read in?

Thanks in advance!

Kurt_Bremser — Wed, 26 Feb 2020 15:05:19 GMT

How many obs are in DI_SCR and OrgCodes?

The optimal solution might be a data step with hash objects used for the lookup.
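A minimal sketch of the hash-object lookup, assuming the filtered DI_SCR rows and OrgCodes are small enough to fit in memory, and that APPT_ID is unique within DI_SCR(where=(DI=&DIid.)) and ORG_ID is unique within OrgCodes. Only the table and key names come from the question; the toy tables, the SCORE column, and the value of DIid are made up so the example runs on its own.

```sas
/* Hypothetical toy tables: replace these three steps with the real data. */
data All_Scores; input APPT_ID SCORE; datalines;
1 10
2 20
3 30
5 50
;
data DI_SCR; input APPT_ID DI DI_ORG_ID; datalines;
1 5 100
3 5 200
4 5 100
5 5 100
3 6 100
;
data OrgCodes; input ORG_ID; datalines;
100
300
;
%let DIid = 5;

/* Hash-lookup version of the three-way join. A hash without MULTIDATA
   keeps only the first row per key, hence the uniqueness assumption. */
data New;
  if 0 then set DI_SCR OrgCodes;   /* add the B and C columns to the PDV at compile time */
  if _n_ = 1 then do;
    /* only the DI_SCR rows that pass the WHERE= filter are loaded */
    declare hash b(dataset: "DI_SCR(where=(DI=&DIid.))");
    b.defineKey('APPT_ID');
    b.defineData(all: 'yes');
    b.defineDone();
    declare hash c(dataset: 'OrgCodes');
    c.defineKey('ORG_ID');
    c.defineData(all: 'yes');
    c.defineDone();
  end;
  set All_Scores;                  /* the large table is read once, sequentially */
  if b.find() = 0 then do;         /* APPT_ID found in the filtered DI_SCR */
    if c.find(key: DI_ORG_ID) = 0 then output;   /* DI_ORG_ID found in OrgCodes */
  end;
run;
```

With the toy data, New keeps APPT_ID 1 and 5: APPT_ID 2 has no DI_SCR row, and APPT_ID 3 maps to org 200, which is not in OrgCodes. The trade-off is memory: each row of the big table costs two in-memory lookups instead of a sort or a join pass, but both lookup tables must fit in RAM.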

ChrisNZ — Wed, 26 Feb 2020 20:16:32 GMT

Are these SAS data sets?

Are the tables sorted? Or indexed? Can they easily be kept sorted and/or indexed, or are they refreshed too often?
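If the tables can be kept indexed, a sketch of creating simple indexes on DI_SCR follows. It assumes DI_SCR lives in WORK (the toy rows are made up). Whether PROC SQL actually uses an index for the join is the optimizer's choice, and a DATA step that recreates the table drops its indexes, which is why how often the tables are refreshed matters.

```sas
/* Hypothetical toy version of DI_SCR; replace with the real table. */
data DI_SCR; input APPT_ID DI DI_ORG_ID; datalines;
1 5 100
3 5 200
3 6 100
;

proc sql;
  /* for a simple index, the index name must match the column name */
  create index APPT_ID on DI_SCR(APPT_ID);   /* join key */
  create index DI on DI_SCR(DI);             /* may serve WHERE=(DI=...) when it selects few rows */
quit;
```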

How many records in each?

You use select *. Do you actually need all columns from all tables?

Please add the _method option after proc sql so we can see how SAS executes the joins.
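For reference, _METHOD goes on the PROC SQL statement itself. This sketch runs the query from the question against tiny made-up tables (values and DIid are hypothetical); with the real data, only the PROC SQL line changes.

```sas
/* Hypothetical toy tables so the query runs on its own. */
data All_Scores; input APPT_ID SCORE; datalines;
1 10
2 20
;
data DI_SCR; input APPT_ID DI DI_ORG_ID; datalines;
1 5 100
2 6 100
;
data OrgCodes; input ORG_ID; datalines;
100
;
%let DIid = 5;

proc sql _method;   /* writes the chosen execution plan to the log */
  create table New as
  select *
  from All_Scores as A
      ,DI_SCR(where=(DI=&DIid.)) as B
      ,OrgCodes as C
  where A.APPT_ID = B.APPT_ID
    and B.DI_ORG_ID = C.ORG_ID;
quit;
```

In the log, look for codes such as sqxjhsh (hash join), sqxjm (sort-merge join, usually with sqxsort steps beneath it), sqxjndx (index join), and sqxjsl (step-loop join, typically the slowest). Because of select *, SAS also warns that APPT_ID already exists on WORK.NEW.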