raheleh22
Obsidian | Level 7

Hi, 

I have a basic laptop, and working with my huge dataset (millions of rows and 100 variables) has turned into a nightmare. I am therefore planning to ask my advisor for either an upgrade to my laptop or a desktop. I do not have IT knowledge, and I need to know which laptop/desktop characteristics (IT-related characteristics, including CPU, ...) are recommended so that SAS works properly and faster when analyzing huge datasets. I appreciate any advice on this matter.

 

Reeza
Super User
Size of dataset? How much RAM/HD do you have? What type of procs are you running that are problematic? Data steps typically work on HD, regression procs need RAM.

I've worked with datasets with 30 million rows, 200+ variables on a desktop with 8GB RAM and 2TB...but it took 20 minutes to process a data step.
raheleh22
Obsidian | Level 7

The dataset itself is almost 3 GB (it has >100 million rows). Recently I used "guessingrows" in my proc and it took 1 hour to run. I will need to run regression procs later, but for now I am more in the descriptive process and data transformation, and it takes too long.

Reeza
Super User
3GB isn't that big actually and should be handled fairly well in SAS. However, reading in a 3GB dataset via PROC IMPORT is a bad idea. You should run it for the first few thousand rows, get the code and then run it as a data step to import your data more efficiently.
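
One way to do that, as a rough sketch: limit PROC IMPORT to the first few thousand records and then copy the DATA step it writes to the log. The file path and dataset name here are made up, and the file is assumed to be a CSV with a header row.

```sas
/* Hypothetical path and dataset name; adjust to your own file. */
options obs=5000;               /* stop reading input after record 5000 */

proc import datafile="C:\data\bigfile.csv"
            out=work.import_sample
            dbms=csv
            replace;
    guessingrows=5000;          /* scan only these rows when guessing types */
run;

options obs=max;                /* restore the default so later steps read everything */
```

For delimited files, PROC IMPORT writes the DATA step it generated to the log. Copy that code into your program, correct any informats and lengths it guessed wrong, and run the edited DATA step on the full file.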

ballardw
Super User

If the issue is with "guessingrows" then you need to stop doing that with large data sets (older SAS releases capped it at 32,767 rows anyway; SAS 9.4 accepts larger values, including GUESSINGROWS=MAX). Any 100-million-row data set that arrives without documentation, "forcing" the use of Proc Import (basically the only use of guessingrows), is garbage to begin with. Use the documentation to write a proper data step program.

 

Note that reading 100 million rows of data in one hour is roughly 28,000 rows per second. It takes time to read/process big files. Period.
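
If you want to redo that arithmetic for your own row counts and run times, a throwaway DATA step is enough. The figures are just the ones from this thread.

```sas
/* Rows per second for 100 million rows read in one hour. */
data _null_;
    rows    = 100e6;            /* row count from this thread */
    seconds = 60 * 60;          /* one hour */
    rate    = rows / seconds;   /* about 27,778 rows per second */
    put rate= comma12.;
run;
```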

A faster disk may be of more benefit than memory or CPU upgrades with large files.

 

FreelanceReinh
Jade | Level 19

Hi @raheleh22,

 

Obviously, I don't know your project, but chances are that your current hardware is fully sufficient for a much larger part of your work than you might think.

 

As others have already suggested, I would start with writing a DATA step reading a few thousand records of the raw data (using the OBS= option of the INFILE statement). Then, after a very short run time and a careful check of the log, you will have created a fairly small SAS dataset which you can explore with PROC PRINT, PROC FREQ, PROC MEANS, etc. with technical data correctness being the first objective. Are all variables populated as expected? Check for missing values, truncated character values, incorrect (in)formats and so on and correct the reading step as necessary. A single run of the "pre-final" DATA step on the full raw data file will show if unexpected issues occur further down in the raw data. 
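
A minimal sketch of such a reading step follows. The file path, the four variables and their informats are purely hypothetical, and the file is assumed to be comma-delimited with a header row; your data description determines the real INFORMAT and INPUT statements.

```sas
/* Hypothetical layout: comma-delimited file, header in record 1. */
data work.sample;
    infile "C:\data\bigfile.csv"
           dsd dlm=','          /* comma-delimited, honor quoted values */
           firstobs=2           /* skip the header record */
           obs=5001             /* last record to read: 5,000 data rows */
           truncover lrecl=32767;
    informat id 12. visit_date yymmdd10. site $20. value best32.;
    format visit_date yymmdd10.;
    input id visit_date site value;
run;

/* First checks for technical correctness. */
proc contents data=work.sample; run;
proc means data=work.sample n nmiss min max; run;
proc freq data=work.sample; tables site / missing; run;
```

Removing OBS=5001 from the INFILE statement turns this into the run on the full raw data file.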

 

Then continue the investigation of the small dataset to become familiar with the data structure. What are the key variables? How many observations are there per key variable combination? Is the dataset already properly sorted? Are there redundant variables? ...

 

Based on that knowledge together with the data description you can decide if the existing dataset is a sufficient basis for writing the DATA and PROC steps for the "descriptive process and data transformation" that you mentioned and the regression or if you need to read more records from the raw data (not necessarily the first n records, possibly only selected variables).

 

But even a sufficiently extended subset of the data is most likely small enough that run time will not be an issue until you have developed almost production-ready programs for plausibility checks, descriptive tables and graphs as well as inferential analyses.

 

At this advanced stage you can (step by step) increase the number of records involved and see how the run times of the various DATA and PROC steps change. Thanks to the extensive preparations you will hardly ever need to run a step many times because of mistakes.

 

So, probably most of your work can be done on a relatively small subset of the data (e.g., a suitable random sample), and until the few (overnight) runs of the programs on the full data, your hardware limitations will not be an obstacle.
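
For example, a simple random sample of roughly 1% can be drawn with a plain DATA step. The library PROJ, the dataset names, the seed and the sampling rate are all placeholders, and the full data is assumed to be stored already as a SAS dataset.

```sas
/* Hypothetical names: PROJ.BIGDATA is the full dataset, already stored. */
data work.sample1pct;
    if _n_ = 1 then call streaminit(12345);   /* fixed seed: reproducible sample */
    set proj.bigdata;
    if rand('uniform') < 0.01;                /* keep each row with probability 0.01 */
run;
```

Each row is kept independently, so the sample size is only approximately 1% of the rows. If you need an exact sample size or a stratified sample, PROC SURVEYSELECT (part of SAS/STAT) is the usual tool.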

 

Good luck!

Quentin
Super User

What does "nightmare" mean? Does it mean painfully slow, or were you actually getting errors? If you were getting errors, what errors did you get, and what were you doing when you got them?

 

Also, since you mention having an advisor I assume that means you are at a university?  You may want to ask if the university has a SAS server environment that you could use for this work.

raheleh22
Obsidian | Level 7

Actually, the code runs with no errors, but it is slow and takes time. Yes, it is a university environment, but they work with other software rather than SAS, so they do not have a SAS server.

Reeza
Super User
PROC IMPORT is always slow because it has to scan the data and guess at types. Using a data step is orders of magnitude faster. Try it and see.
Reeza
Super User
Though to be fair, you're only doing this once typically and then saving it to a drive so it's not a huge time saver, just something to know.

Also, if you have an SSD it's faster than a typical drive.
ballardw
Super User

@Reeza wrote:
Though to be fair, you're only doing this once typically and then saving it to a drive so it's not a huge time saver, just something to know.

Also, if you have an SSD it's faster than a typical drive.

One would hope so. But how many examples do we find on this forum of beginners re-importing files for each session because they don't import the data to a permanent library for reuse?
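
For anyone unsure what that looks like, here is a sketch. The folder and dataset names are placeholders, and WORK.BIGDATA stands for whatever dataset your import step created.

```sas
/* Hypothetical folder; it must already exist. */
libname proj "C:\sasdata\project";

/* Store the imported data once, under a two-level (permanent) name. */
data proj.bigdata;
    set work.bigdata;           /* dataset created by your import step */
run;

/* In any later session: reassign the library and use the stored data. */
libname proj "C:\sasdata\project";
proc means data=proj.bigdata; run;
```

Datasets in WORK are deleted when the session ends. Anything written to PROJ stays on disk, so the slow import only has to run once.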
