@Season wrote:
So the selection process still takes place after the entire importation is done, right?
A text file is linear. There is no way to read it without actually reading it.
Thank you! I consulted DeepSeek on resolving this issue in R, and it suggested a "flowing decompression" method for dealing with the problem. In short, a batch of observations is decompressed, imported, selected, and stored. When one cycle finishes, a second batch is decompressed while the decompressed data from the first batch is deleted, and so on. The stored observations, which are what we ultimately want but are scattered across multiple small datasets for the time being, are then stacked to form a single large one. Can SAS do something like this?
Why would you want to? SAS does NOT load the whole dataset into memory to work with it, the way base R does with its variables (objects, as R calls them). So no tricks to reduce memory use are typically needed when working in SAS.
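For illustration, here is a minimal sketch of how a SAS DATA step can stream a gzipped CSV and keep only the rows of interest while reading, without ever decompressing the whole file to disk. It assumes SAS 9.4M5 or later (for the GZIP option of the ZIP access method) and uses a hypothetical file path, column names, and filter condition:
filename gzcsv zip "/data/bigfile.csv.gz" gzip;   /* hypothetical path; GZIP option needs SAS 9.4M5+ */
data work.subset;
    infile gzcsv dsd dlm=',' firstobs=2 truncover; /* skip the header line */
    input id age score;                            /* hypothetical columns */
    if age >= 65;                                  /* keep only the observations you need */
run;
Because the DATA step reads one record at a time and the subsetting IF discards unwanted rows before they are written out, only the kept observations ever reach disk, which is effectively what the R "flowing decompression" scheme accomplishes in batches.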
Because the entire dataset is too large to load. I understand that the importation process might not need the whole file to be loaded into memory in SAS, but the problem is that the resulting imported dataset is also too large to hold in memory.
SAS stores datasets on disk, not in memory, so large amounts of memory are not needed to work with datasets, especially one that has only 40 variables. The only place you will run into memory issues is an analysis that creates matrices too large to fit in memory, for example using a CLASS variable with millions of distinct levels.
Saving such a large dataset on disk might be an issue, however. The SAS dataset structure is not that efficient, but using the COMPRESS=YES option can make datasets take up somewhat less disk space.
Thank you for your patient explanation! Could you please tell me where to specify the COMPRESS=YES option?
You set the system option using the OPTIONS statement.
options compress=yes;
You set it at the LIBREF level using the COMPRESS= option of the LIBNAME statement.
libname mylib 'myfolder_name' compress=yes;
You can set it at the DATASET level using the COMPRESS= dataset option.
data mylib.myds(compress=yes);
infile .....
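Putting the pieces together, a fuller sketch might look like the following (the gzipped CSV path and column names are made up for illustration; the COMPRESS=YES dataset option only affects how the output dataset is stored on disk):
filename gzcsv zip "/data/bigfile.csv.gz" gzip;   /* hypothetical path; needs SAS 9.4M5+ */
libname mylib 'myfolder_name';
data mylib.myds(compress=yes);
    infile gzcsv dsd dlm=',' firstobs=2 truncover;
    input id name :$32. value;                    /* made-up columns */
run;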
Is it possible, then, to specify the starting and ending row of the .csv.gz file and let SAS read in the designated subset of data only?
Yes. You have already seen how to set the starting observation number (really the starting LINE number) in the example INFILE statements posted above. To tell SAS where to stop, use the OBS= option of the INFILE statement.
So to read the first 100 lines of actual data you would use FIRSTOBS=2 and OBS=101 (skipping the header line).
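As a sketch using the same hypothetical file and column names as above, reading only data lines 2 through 101:
filename gzcsv zip "/data/bigfile.csv.gz" gzip;
data work.first100;
    infile gzcsv dsd dlm=',' firstobs=2 obs=101 truncover; /* lines 2-101: the first 100 data lines after the header */
    input id name :$32. value;
run;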