BookmarkSubscribeRSS Feed
Marrrissa
Calcite | Level 5
Version 7.1 Looking for some help, my only co-worker left a few weeks ago and I have to have this data pulled for billing our vendors. Admittedly, I probably don’t know as much as I should for this analyst role but it was forced on me after we lost someone and I had been learning from my only team member previously.

Using SQL in SAS. I’m trying to pull the last 4 years of usage history from a dataset containing 28 million rows. The problem is I can’t get it to run in less than 12 hours before I get kicked off the VPN. Last month the table had 27 million and it ran just fine but this month no luck. There is a lot more to this code but this is the part that takes the longest.

%datacon;
%let billmo = 10;
%let billyr = 2021;

Proc sql;
Create table prior4yr
Select *
From usage_history
Where year(datepart(first_login)) > sum((&billyr.,-4) or (year(datepart(first_login)) = sum(&billyr., -4) and month(datepart(first_login)) > &billmo.);
Quit;

Is there a way to rewrite this so it doesn’t have to look through all the 28 million rows in usage_history? If it’s helpful at all, I pull this data every month so I do have an older copy of my prior4yr table saved and I really only need to pull in data from the last 2 months to add to prior4yr.
6 REPLIES 6
SASKiwi
PROC Star

It looks like you are reading this data from an external database. Do you know what it is? You can most likely speed this up by converting it to an SQL Passthru query that uses database-specific SQL. Also your WHERE statement should be just a DATETIME selection and not include functions. Something like this:

%let Bill_Date = %sysfunc(intnx(MONTH, %sysfunc(today()), -1, BEGINNING), date9.);
%put Bill_Date = &Bill_Date;

proc sql;
  create table want as
  select *
  from have
  where first_login > "&Bill_Date.:00:00:00"dt;
quit;
Reeza
Super User
IME a basic query like this should run in under 20 minutes on a server or 1 hour on a desktop.
Something else is the issue...did the VPN change? Did your coworker have some extra privileges or access to more resources on the server as a super user or something?
AlanC
Barite | Level 11

A lot of times, these large, exponential increases in time are due to network latency. The dataset is on a NAS drive or SAN, and you are trying to pull the data to your local machine across VPN. Do as suggested and force the processing to the server or some place local to the dataset. Ping the server where the dataset is located and you will probably find your culprit. 28M recs is not that much but the time indicates a lot of back and forth over a network. 

https://github.com/savian-net
Ksharp
Super User
As SASKiwi said,
if you read data from external database, also try option ' readbuff=10000 ' of libname statement.
Sajid01
Meteorite | Level 14

You are connecting to a remote database through vpn.
Try to minimise data transfer. Therefore the query must be executed as sql pass through as suggested by @SASKiwi 

Kurt_Bremser
Super User

From your code, it looks as if your dataset is in WORK (single level name). Is your VPN purely on your connection to a SAS server, or do you actually run SAS on your PC and connect to the data through the VPN?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 628 views
  • 1 like
  • 7 in conversation