03-30-2015 11:23 AM
I'm attempting to output observations to a SAS dataset on my company's server in SAS Enterprise Guide from within a data _null_ step.
Here's the basic idea using a test dataset:
data _null_;
  file 'E:\SAS Temporary Files\_TD21860_VO-DCA-VSAS01_\Prc2\report.sas7bdat';
  input y x1 x2;
  put y x1 x2;
  datalines;
1 2 3
10 20 30
100 200 300
;
run;
The file statement includes the address of the WORK folder on the company server.
The code successfully creates a file named "report"; however, when I attempt to open the file I receive the following error message, even though I specified the SAS data set extension ".sas7bdat" in the FILE statement:
The open data operation failed. The following error occurred.
[Error] File WORK.REPORT.DATA is not a SAS data set.
Neither can I successfully run:
proc print data=report;
What am I doing wrong?
Thanks in advance
03-30-2015 11:34 AM
You created a TEXT file, not a SAS data set. PUT generates character output, and FILE generally only creates TEXT files (unless you are doing extra work to mimic a specific file format).
If you want a SAS data set, then it should be referenced with a library and named either:
1) on the DATA statement (data mylib.report, with the library pointing to the location; though it looks like you MIGHT have been attempting to write to the default WORK library, which wouldn't need a libref), or
2) in the output option of a given procedure, often OUT= or OUTPUT, though ODS adds some options.
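A minimal sketch of option 1, using the folder path from the original post (the libref mylib is just an example name):

```sas
/* Point a libref at the target folder; SAS supplies the .sas7bdat extension */
libname mylib 'E:\SAS Temporary Files\_TD21860_VO-DCA-VSAS01_\Prc2';

/* Name the dataset on the DATA statement instead of using FILE/PUT */
data mylib.report;
   input y x1 x2;
   datalines;
1 2 3
10 20 30
100 200 300
;
run;
```

Note that you name the member report, not report.sas7bdat; the engine handles the physical file name.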
03-30-2015 01:30 PM
Did you mean something like this:
data 'E:\SAS Temporary Files\_TD21860_VO-DCA-VSAS01_\Prc2\report.sas7bdat';
03-30-2015 01:38 PM
That may work . . . but I'm trying to keep memory overhead to a minimum, especially when working with big datasets. Hence the use of the data _null_ statement instead of simply doing what you suggested.
Or are my worries unjustified?
03-30-2015 01:43 PM
Maybe you are looking for Proc Copy or Proc Datasets to move the dataset around. A data _null_ step in any form processes every record in the dataset, so your overhead may well go down significantly with a procedure designed to move entire datasets.
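For example, a sketch of copying a finished WORK dataset to a permanent library (this assumes a libref mylib already points at the target folder):

```sas
/* Copy WORK.REPORT to the MYLIB library without writing a data step */
proc copy in=work out=mylib;
   select report;
run;
```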
03-30-2015 02:03 PM
Well, I'm actually doing a lot of data processing inside the data _null_ step by reading my data into an array and then keeping just the final output, so I'm not sure a Proc Copy or Proc Datasets would work inside the data _null_. (The example I offered in my question may be a bit misleading, since it's grossly simplified.)
I'll try Tom's idea and see how it works. I suppose the other option is to just output the final results to the SAS log with a put statement, then copy and paste into Excel or whatever.
03-30-2015 02:21 PM
Could it be that what you need is a DROP statement to prevent your array elements from being saved in your dataset? Your comment about saving memory isn't in line with what you're trying to do.
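For instance, a sketch of dropping the working array variables so only the final result columns are written (dataset and variable names here are hypothetical):

```sas
data report;
   set work.bigdata;
   array vals{3} v1-v3;       /* working variables used during processing */
   result = sum(of vals{*});  /* example computation */
   drop v1-v3;                /* keep only the final output columns */
run;
```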
03-30-2015 02:42 PM
Putting to the log, especially if you are talking about large datasets, is likely even more inefficient, with the added possibility of exceeding the number of lines allowed in the log, which then adds yet another layer of complexity to the project.
I have a hard time seeing why DATA lib.datasetname is unacceptable as it will execute in basically the same time as data _null_.
libname mylib "E:\SAS Temporary Files\_TD21860_VO-DCA-VSAS01_\Prc2";
Though I would be VERY hesitant to write anything I wanted later to a folder subordinate to a SAS temporary folder as those will get deleted at the end of the session.
You may not be aware that you can specify when an observation is written to the dataset. So you could retain values across iterations of the data step and then OUTPUT when the desired summary has been completed for a group of records.
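A sketch of that pattern, accumulating a running total and writing one observation per BY group (dataset and variable names are hypothetical, and the input is assumed sorted by group):

```sas
data mylib.summary;
   set work.detail;
   by group;                  /* input assumed sorted by group */
   total + x;                 /* sum statement: total is implicitly retained */
   if last.group then do;
      output;                 /* write one summary observation per group */
      total = 0;              /* reset for the next group */
   end;
   keep group total;
run;
```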
03-30-2015 03:10 PM
Yeah, I think I'll scratch the "data _null_;" idea and just use a regular "data report (keep=...);" statement to output the end results of my program's computations to a separate dataset.
That works fine; I just want SAS to avoid creating a duplicate dataset in memory and then dropping the excess columns after reading the keep= option in my data statement.
03-31-2015 02:50 AM
When you use the keep= or drop= options on the set statement of the data step, you already reduce the size of the PDV and the required memory.
If you don't have gazillions of variables in your input data set (or create them in the data step), the memory consumption of a data step is negligible.
1000 numerical variables will "eat" 8K and the space needed for the metadata (name to location table), which is peanuts compared to what the SAS system itself needs to simply run.
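For example, applying keep= on the set statement so unneeded variables never enter the PDV (dataset and variable names here are hypothetical):

```sas
data report;
   /* keep= on SET drops the other variables before they enter the PDV */
   set work.bigdata(keep=y x1 x2);
   /* ...computations on y, x1, x2... */
run;
```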
03-31-2015 12:12 PM
Will do, thanks Tom. So far my program is running fine with the keep statement. Thank you, all, for the suggestions.
At the moment my worries about working with really big data that consumes most of the memory on my machine are mostly theoretical, but I'd like to write my code with that contingency in mind for efficiency's sake.
04-01-2015 02:24 AM
Big data is just disk space. In a data step, SAS holds only the memory needed for one record at a time, unless you make heavy use of functions like lag(). So the memory consumption of a data step depends mainly on the record size of the dataset(s).
This is different from software like R, which loads a complete dataset into memory and treats it more like a spreadsheet.
For dealing with big data, the data step is (IMO) the #1 solution, efficiency-wise. Once it works for one record, it works for any number of records. The only things that increase are disk space and execution time.
03-31-2015 11:58 AM
I don't see any reason why you should consume excessive memory, but if you see any indications that you are, please post back. It would indicate that you are doing something unusual, and there might be a less memory-intensive way to do it.
03-31-2015 09:44 AM
A Proc Printto, writing your PUT values to a text file, might be a better option when you intend to use information that is normally "put" to your log.
PROC PRINTTO LOG='Your\folder\and file\location\data null output file name.TXT' NEW;
RUN;