Hi,
I'm looking for a way to filter a large dataset that is being read in, using proc sql. I'd ideally like to only read in appointment numbers that match appointment numbers in another dataset. As it stands now, the dataset takes over an hour to read in, so I'm trying to speed it up. In the past I've used something like this:
Proc SQL;
Create Table New as
Select *
From All_Scores as A
,DI_SCR (where=(DI=&DIid.)) as B
,OrgCodes as C
Where A.APPT_ID=B.APPT_ID
AND B.DI_ORG_ID=C.ORG_ID;
Quit;
The "where=" in the from statement significantly cut down the processing time. Is there a way to do something similar, that only reads in the APPT_IDs from B, if the APPT_ID is also in A?
Thanks in advance!
How many obs are in DI_SCR and OrgCodes?
The optimal solution might be a data step with hash objects used for the lookup.
Are these SAS data sets?
Are the tables sorted? Or indexed? Can they easily be kept sorted and/or indexed, or are they refreshed too often?
How many records in each?
You use select *. Do you actually need all columns from all tables?
Please add the _method option after proc sql so we can see how SAS executes the joins.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.