I have 16 data files totaling about 1 TB.
I want to access the data in them as quickly as possible, and I would appreciate any advice or thoughts.
My workstation:
Windows 8.1 Pro
SAS 9.2
2x Xeon E5-2630 v3
64 GB RAM
NVMe Samsung mzvpv256
NVMe wds100t3xoc-00sj 1TB
Best regards, and thank you in advance for your help.
@Reeza wrote:
Are these SAS files or text files?
SAS files.
@Reeza wrote:
What's your question? What do you need help with?
The question is not about a specific problem.
I have, for example, 16 cores and a fairly fast NVMe drive, so maybe I could divide each file into 16 equal parts and process them in parallel?
Or maybe there is another way?
Maybe a hash table? I'm learning about those now.
Best regards
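For illustration, here is a minimal sketch of the slicing idea using the FIRSTOBS= and OBS= data set options. The macro name `slice` and its parameters are made up for this example, and `sashelp.class` with 4 parts stands in for one of the real files with 16 parts. Note that this loop runs the slices one after another; truly parallel execution would need separate SAS sessions (for example via SAS/CONNECT, if licensed), and all slices would still compete for the same drive.

```sas
/* Sketch only: macro name and parameters are hypothetical. */
%macro slice(ds=sashelp.class, parts=4);
  %local dsid nobs rc size i first last;

  /* Read the physical observation count from the header (no full scan). */
  %let dsid = %sysfunc(open(&ds));
  %let nobs = %sysfunc(attrn(&dsid, nobs));
  %let rc   = %sysfunc(close(&dsid));

  /* Ceiling division: every slice has SIZE rows except possibly the last. */
  %let size = %sysevalf(&nobs / &parts, ceil);

  %do i = 1 %to &parts;
    %let first = %eval((&i - 1) * &size + 1);
    %let last  = %sysfunc(min(%eval(&i * &size), &nobs));
    %if &first <= &last %then %do;
      data _null_;
        set &ds (firstobs=&first obs=&last) end=done;
        /* per-slice processing would go here */
        if done then put "slice &i: rows &first to &last read";
      run;
    %end;
  %end;
%mend slice;

%slice()
```

The macro arithmetic assumes the row count fits in the default 12-character formatting that %SYSFUNC applies; for extremely long tables the boundaries would be better computed in a DATA step.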
It all depends on what you are trying to do. In general, it might be best to summarize the data into a much smaller data set and then do your analysis from the summary. But whether that is possible, and how to do it, depends on what analysis you are doing.
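As a rough sketch of what "summarize first" can look like, assuming the analysis only needs per-group totals and means: the example uses `sashelp.class` with `sex` and `height` as placeholders for a real file, its grouping variable, and its measure. The output names `work.summary01`, `n_rows`, `height_sum`, and `height_mean` are invented for illustration.

```sas
/* Placeholder input: swap in the real library.member and variables. */
proc summary data=sashelp.class nway;
  class sex;                       /* grouping variable(s) */
  var height;                      /* numeric variable(s) to aggregate */
  output out=work.summary01 (drop=_type_ rename=(_freq_=n_rows))
         sum=height_sum mean=height_mean;
run;

/* Later analysis reads the small summary instead of the 1 TB source. */
proc print data=work.summary01;
run;
```

One pass per file produces a table with one row per group, which can then be combined across all 16 files at very little cost.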
Please write whatever comes to mind on this topic, or point me to resources you consider valuable and that I might have missed.
- I have used the hash table.
- I got interested in SASFILE, but in my opinion it has big limitations.
Your question is way too vague and generic for me to propose anything other than reading. If it's about dealing with large data sets, then you might want to Google for SAS topics on performance; there is a lot about this out there.
Also spending time reading SAS® 9.4 Language Reference: Concepts, Sixth Edition is likely valuable for you as it explains how SAS actually works and how it processes data.
About SAS hashes: they are loaded into memory with fully expanded column lengths, so they are likely not suitable for your huge tables unless you actually have the required memory available.
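A small sketch of how one might check this on a sample before committing to a hash: the table `work.lookup` and its variables `id` and `label` are made up, and the ITEM_SIZE attribute is assumed to be available in your release (I believe it is documented from 9.2 on, but please verify). The reported size covers the key and data portion per item, not the hash object's own overhead, so treat the result as a lower bound.

```sas
/* Hypothetical lookup table: a short value stored in a wide $200 column. */
data work.lookup;
  length id 8 label $200;
  id = 1; label = 'a'; output;
  id = 2; label = 'b'; output;
run;

data _null_;
  if 0 then set work.lookup;            /* defines ID and LABEL in the PDV */
  declare hash h(dataset: 'work.lookup');
  h.defineKey('id');
  h.defineData('label');
  h.defineDone();

  items  = h.num_items;                 /* rows loaded into the hash */
  bytes  = h.item_size;                 /* bytes per item, full column width */
  est_mb = items * bytes / 1024**2;     /* rough memory estimate in MB */
  put items= bytes= est_mb= 12.6;
  stop;
run;
```

Multiplying the per-item size by the real row count gives a quick feel for whether the table could fit in 64 GB of RAM.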
I doubt there is hash functionality in SAS 9.2.
I recommend you get your SAS software up-to-date. I would expect SAS 9.4 to be more efficient and offer more capabilities than 9.2.
What in SAS 9.4 (TS1M6) is so much better than in 9.2?
I suggest you refer to the SAS documentation for a complete list of improvements (documentation.sas.com). There are hundreds if not thousands of them.
I'd expect SAS 9.4 to be significantly faster than 9.2 as well, although that may depend on the type of processing you are doing.
BTW I just checked and you are in luck. DATA Step HASH was implemented in SAS 9.1 so you will have it in 9.2: https://support.sas.com/kb/11/391.html
@mkeintz - Yes, see my follow-up post.
Hash functionality has been there longer than you might think.
According to The SAS Hash Object in Action:
In SAS® Version 9.1, the hash table - the very first object introduced via the DATA Step Component Interface in Version 9.0 - has finally become robust and syntactically stable
I would start by upgrading to the latest SAS version. Both NVMe drives you mention are way too small to hold all the data sets you want to process, so where is the data stored?
Migration to SAS 9.4 (TS1M6) is not possible yet; I don't have time, maybe in October. Anyway, I was thinking of rewriting everything in C++, but so far I don't have much experience with C++.
The system is on the NVMe Samsung mzvpv256.
All the data is stored on the NVMe wds100t3xoc-00sj 1TB, and I have one more.
IMO, it sounds like you don't have enough storage, regardless of what language you use to process your data files. If your 16 files completely fill your disk drive, you have no room to do anything else.