makset
Obsidian | Level 7

I have 16 data files, about 1 TB in total.
I want to access the data in them as quickly as possible, and I am asking for advice and ideas.
My workstation:

Windows 8.1 Pro
SAS 9.2
2x Xeon E5-2630 v3
64 GB RAM
NVMe Samsung mzvpv256
NVMe wds100t3xoc-00sj, 1 TB

Best regards and thank you in advance for your help

17 REPLIES
Reeza
Super User
SAS processes data row by row, so many things work quite well on a desktop regardless of data size; they just take time.

Are these SAS files or text files? And most importantly - what's your question? What do you need help with?
makset
Obsidian | Level 7

@Reeza wrote:
Are these SAS files or text files?

SAS files.


@Reeza wrote:
what's your question? What do you need help with?

The question is not about a specific problem.
I have, for example, 16 cores and a fairly fast NVMe drive, so maybe I could divide each file into 16 equal parts and process them in parallel?
Maybe there is another way?
Maybe a hash table? I'm learning about them now.

Best regards
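
A minimal sketch of the split-into-parts idea above, assuming a native SAS data set and hypothetical names (library `BIG`, data set `BIG.FILE01`); none of these names come from the thread. It only shows how one slice can be read with the `FIRSTOBS=` and `OBS=` data set options. It also assumes the data set has many more rows than parts, so that every slice is non-empty.

```sas
/* Hypothetical names: library BIG and data set BIG.FILE01 are placeholders. */
%let nparts = 16;

data _null_;
  /* NOBS= is filled in at compile time, so no rows are actually read */
  if 0 then set big.file01 nobs=n;
  call symputx('nobs', n);
  call symputx('chunk', ceil(n / &nparts));
  stop;
run;

%macro slice(part);
  %local first last;
  %let first = %eval((&part - 1) * &chunk + 1);
  %let last  = %eval(&part * &chunk);
  /* Clamp the final slice to the last row of the data set */
  %if &last > &nobs %then %let last = &nobs;

  data work.part&part;
    set big.file01 (firstobs=&first obs=&last);
    /* per-row processing goes here */
  run;
%mend slice;

%slice(1)
```

Calling `%slice` in a loop inside one session still runs the slices one after another. To run them at the same time, each slice would need its own SAS session, and whether that helps depends on whether the job is limited by CPU or by the drive.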

Tom
Super User

It all depends on what you are trying to do. In general, it may be best to summarize the data into a much smaller data set and then run your analysis on the summary. Whether that is possible, and how to do it, depends on the analysis you are doing.
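
As an illustration of the summarize-first approach, here is a small sketch using `PROC SUMMARY`. The data set `BIG.FILE01`, the class variables `REGION` and `YEAR`, and the analysis variable `AMOUNT` are hypothetical placeholders, since the thread does not describe the data.

```sas
/* Hypothetical names: BIG.FILE01, REGION, YEAR and AMOUNT are placeholders. */
proc summary data=big.file01 nway;
  class region year;
  var amount;
  /* One output row per REGION*YEAR combination */
  output out=work.summary01 (drop=_type_ rename=(_freq_=n_rows))
         sum=amount_sum mean=amount_mean;
run;
```

If the 16 files are summarized separately and combined afterwards, it is safer to keep the sums and row counts and recompute the means from them, because averaging the per-file means gives the wrong answer when group sizes differ.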

Reeza
Super User
A vague, general question gets a vague, general answer.
makset
Obsidian | Level 7

Write whatever comes to mind on this topic, or point me to places where I can find something you consider valuable and that I may have missed.
- I have used hash tables.
- I looked into SASFILE, but in my opinion it has big limitations.

Patrick
Opal | Level 21

@makset 

Your question is way too vague and generic to propose anything other than reading. If it's about dealing with large data sets, you might want to Google for SAS topics on performance; there's a lot about this out there.

Spending time reading SAS® 9.4 Language Reference: Concepts, Sixth Edition is also likely to be valuable for you, as it explains how SAS actually works and how it processes data.

About SAS hashes: they get loaded into memory with fully expanded column lengths, so they are likely not suitable for your huge tables unless you actually have the required memory available.
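
To get a rough feel for that memory requirement, one option is to multiply the row count by the combined length of the variables that would go into the hash. The sketch below does this from the dictionary tables. `BIG.FILE01` and the variables `ID` and `AMOUNT` are hypothetical, and the result is only a lower bound because it ignores the per-item overhead of the hash object itself.

```sas
/* Hypothetical names: BIG.FILE01, ID and AMOUNT are placeholders. */
proc sql;
  select t.nobs                              as n_rows,
         sum(c.length)                       as bytes_per_row,
         t.nobs * sum(c.length) / 1073741824 as approx_gb format=8.1
    from dictionary.tables  as t,
         dictionary.columns as c
   where t.libname = 'BIG' and t.memname = 'FILE01'
     and c.libname = t.libname and c.memname = t.memname
     and upcase(c.name) in ('ID', 'AMOUNT')
   group by t.nobs;
quit;
```

If `approx_gb` is anywhere near the 64 GB of RAM mentioned in the question, loading the whole table into a hash is unlikely to work; keeping only the key and the few variables actually needed reduces the footprint.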

SASKiwi
PROC Star

I doubt there is hash functionality in SAS 9.2.

I recommend you get your SAS software up-to-date. I would expect SAS 9.4 to be more efficient and offer more capabilities than 9.2.

makset
Obsidian | Level 7

What in SAS 9.4 (TS1M6) is so much better than in 9.2?

SASKiwi
PROC Star

I suggest you refer to the SAS documentation for a complete list of improvements (documentation.sas.com). There are hundreds if not thousands of them.

I'd expect SAS 9.4 to be significantly faster than 9.2 as well, although that may depend on the type of processing you are doing.

BTW I just checked and you are in luck. DATA Step HASH was implemented in SAS 9.1 so you will have it in 9.2: https://support.sas.com/kb/11/391.html
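
For reference, a minimal DATA step hash lookup might look like the sketch below. The tables `WORK.LOOKUP` (with variables `ID` and `LABEL`) and `BIG.FILE01` (keyed by `ID`) are hypothetical placeholders, and the sketch assumes the small table is the one loaded into memory while the large one is read sequentially.

```sas
/* Hypothetical names: WORK.LOOKUP (ID, LABEL) and BIG.FILE01 are placeholders. */
data work.matched;
  if _n_ = 1 then do;
    /* Define ID and LABEL in the program data vector without reading rows */
    if 0 then set work.lookup (keep=id label);
    declare hash h (dataset: 'work.lookup');
    h.defineKey('id');
    h.defineData('label');
    h.defineDone();
  end;

  set big.file01;

  /* FIND() returns 0 when the key exists and copies LABEL into the current row */
  if h.find() = 0 then output;
run;
```

Only the variables named in `defineKey` and `defineData` are held in the hash, so the memory cost depends on the size of the lookup table, not on the large table.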

SASKiwi
PROC Star

@mkeintz - Yes, see my follow up post.

mkeintz
PROC Star

@SASKiwi

Hash functionality has been there longer than you might think.

According to The SAS Hash Object in Action:

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

I would start by upgrading to the latest SAS version. Both NVMe drives you mention are way too small to hold all the data sets you want to process, so where is the data stored?

makset
Obsidian | Level 7

Migration to SAS 9.4 (TS1M6) is not possible yet. I don't have time, maybe in October. I was also thinking of rewriting everything in C++, but so far I don't have much experience with C++.
The system is on the NVMe Samsung mzvpv256.
All the data is stored on the NVMe wds100t3xoc-00sj 1TB, and I have one more.

SASKiwi
PROC Star

IMO, it sounds like you don't have enough storage, regardless of what language you use to process your data files. If your 16 files completely fill your disk drive, you have no room to do anything else.

