Martin_Bryant
Quartz | Level 8

I'm a SAS newbie and I don't seem to be able to work out the correct syntax for this. I have a dataset (fileList) with one field (projFile) which holds a filename. I wish to open the file and read the contents into a second field that will be created. The file is zipped (it's a SAS-EG project file) and so I'm told that I should use Filename statement with the zip option to read the file. However, no matter how I reference projFile it doesn't like it.

 

data fileList;
set fileList;
    filename inzip zip "&projFile" member="project.xml";
    infile inzip;
    input fileContent $char2000.;
    output;
run;

I may also have the input statement wrong, but until I can get past this issue, I don't know. Thanks.

 

9 REPLIES 9
Kurt_Bremser
Super User

The first thing you positively MUST not do is this:

 

data fileList;
set fileList;

because you immediately destroy your filelist dataset.

 

You will not be able to do this in one data step; you have to run a separate data step for each project file.

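%macro import_one(projfile,outds);

/* point the fileref at the ZIP archive (the .egp project file) */
filename inzip zip "&projFile";

data &outds;
length projFile $256;
/* read the member project.xml; TRUNCOVER keeps short lines from flowing into the next record */
infile inzip(project.xml) truncover;
input fileContent $char2000.;
/* record which project file each line came from */
projFile = "&projFile";
run;

filename inzip clear;

%mend;

/* generate one macro call per observation in the file list */
data _null_;
set filelist;
call execute(cats('%nrstr(%import_one(',projfile,",",dataset_name,"))"));
run;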

This is, of course, untested.

Martin_Bryant
Quartz | Level 8
Excellent, thanks. I will try that now.
Kurt_Bremser
Super User

Note that, for my code to work, dataset FILELIST needs to have a second variable for the output dataset. This is necessary because you can't use the ZIP file name for a dataset. But you might be able to extract a valid dataset name on the fly from the ZIP file name.

Martin_Bryant
Quartz | Level 8

Maybe I need to restate my problem. I have the following code which works well.

 

filename inzip zip "/user/Martin/myproject.egp" member="project.xml";
data _null_;   
   infile inzip;
   input;
   put _infile_ ;
run;

Now, rather than display one file content in the log, I wish to create a dataset that holds two variables, the fileName and fileContent, and has one observation for each file I can find.

I already have code that generates a dataset (fileList) that has one variable, projFile (full path and filename of the .egp files), and one observation for each file.

So I wish to put the above code into a loop that uses projFile as the filename, and outputs to a new dataset. Does that help? I'm trying to write my own code but I clearly still have a lot to learn about SAS-EG code, especially macros. Thanks.

Kurt_Bremser
Super User

To read the XML data from one of my project files more or less correctly, I had to do this:

filename inzip zip "/path/project.egp" member="project.xml" encoding="utf-16" termstr=LF;

data content;   
infile inzip truncover firstobs=2;
input line $256.;
run;

The first line gets scrambled because of the BOM, so I used firstobs=2; but I still got a WARNING for a character in observation 0 that could not be transcoded.

After this, the dataset contains an observation for every line of XML text in the project.xml file in the project file.

 

Next, we need to decide how to proceed.

 

What is settled is this:

  • since project.xml is a member in a ZIP archive, we need to run a separate FILENAME statement and DATA step for each project
  • we will have multiple observations for every single project file; to condense them into one observation and lose as little data as possible, we need a LENGTH of 32767 for the variable.
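The two points above could be sketched roughly as follows. This is untested and rests on several assumptions: the macro name read_project, the datasets one_project and project_contents, and the $256 lengths are invented for illustration, and the FILENAME options are simply the ones that worked for the single project file shown earlier. Paths containing commas or parentheses would break the macro call.

```sas
/* Sketch only: read project.xml from one .egp file and keep ONE observation per project */
%macro read_project(projfile);

  filename inzip zip "&projfile" member="project.xml"
           encoding="utf-16" termstr=LF;

  data one_project;
    length projFile $256 fileContent $32767 line $256;
    retain fileContent ' ' full 0;
    keep projFile fileContent;
    infile inzip truncover firstobs=2 end=done;
    input line $256.;
    /* append the line only while it still fits; CATX would blank the
       result if it exceeded 32767 bytes, so stop once the buffer is full */
    if not full then do;
      if lengthn(fileContent) + lengthn(line) + 1 <= 32767
        then fileContent = catx(' ', fileContent, line);
      else full = 1;
    end;
    if done then do;
      projFile = "&projfile";
      output;
    end;
  run;

  /* collect the single observation in a growing repository dataset */
  proc append base=project_contents data=one_project force;
  run;

  filename inzip clear;

%mend read_project;

/* one macro call per project file in fileList */
data _null_;
  set fileList;
  call execute(cats('%nrstr(%read_project(', projFile, '))'));
run;
```

Anything beyond 32767 bytes is dropped, so for a project.xml of any real size it may be better to keep one observation per line instead and add projFile as a key.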

What do you intend to do with the resulting dataset?

Martin_Bryant
Quartz | Level 8
Once I have my full dataset listing the projects and their contents, then this will be used as a searchable repository of code. We will, for example, be able to enter a dataset name and find out which code refers to it. Then if we need to change the structure of said dataset, we can ensure that nothing else messes up.
Kurt_Bremser
Super User

Much luck with that.

 

A short project with just 4 nodes and a few datasets from one of the nodes has a project.xml of 130 K.

From that you need to determine the code names in the labels and find with which subdirectory in the ZIP they are related. Next, read the code.sas from that subdirectory and then parse the code for dataset names, but reliable code parsing is a problem in itself. How (e.g.) do you detect a dataset name in the context of code?
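To see which subdirectories of the archive actually hold a code.sas, the members of the ZIP can be listed with the directory functions. A minimal sketch, assuming the placeholder path /path/project.egp and a hypothetical output dataset named members:

```sas
/* Sketch only: list the members of one .egp archive and keep the code.sas entries */
filename inzip zip "/path/project.egp";

data members;
  length memname $200;
  keep memname;
  fid = dopen("inzip");            /* open the archive as a directory */
  if fid = 0 then stop;            /* archive could not be opened */
  do i = 1 to dnum(fid);
    memname = dread(fid, i);       /* member name including its subdirectory */
    if upcase(scan(memname, -1, '/')) = 'CODE.SAS' then output;
  end;
  rc = dclose(fid);
run;

filename inzip clear;
```

Each memname found this way can then be read like project.xml, for example with infile inzip(<memname>) while the fileref is still assigned. Matching the subdirectory names to the task labels in project.xml remains a separate step.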

 

You should check the SCAPROC procedure before diving into all that work.
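For reference, a minimal sketch of how SCAPROC is wrapped around a piece of code. The output path is a placeholder and the DATA step is only there to give the procedure something to record:

```sas
/* start recording; ATTR adds variable attributes to the record */
proc scaproc;
  record '/path/scaproc_record.txt' attr;
run;

/* the code to be analyzed */
data test;
  set sashelp.class;
run;

/* stop recording and write the record file */
proc scaproc;
  write;
run;
```

The record file lists the input and output datasets of each step as comments, so the dataset names do not have to be parsed out of the code text.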

 

Martin_Bryant
Quartz | Level 8
Thanks, I'm getting there with this. SCAPROC sounds useful, but there is no way I can get everyone to use it. So I think I'm stuck with this method.
AlanC
Barite | Level 11

Martin,

 

Parse the EG xml. Get the SAS programs. Create a loop to execute each one using a code prepend of the following:

 

options mfile mprint obs=0 noerrorbyabend errors=0 source source2 nonotes;

filename mprint 'c:\temp\[INSERT_NAME].txt';

 

Then, parse the [INSERT_NAME].txt file for the datasets.

 

Parsing SAS code is not trivial (aka years of work). The above gives you a means of resolving the macro code: that is more than half the battle. 

 

Here is an example piece of code to illustrate:

 

options mfile mprint obs=0 noerrorbyabend errors=0 source source2 nonotes;

filename mprint 'c:\temp\TEST_SAS_MPRINT.txt';

 

data _null_; file mprint; run;

 

%macro getData(ds=);

   data test;

      set &ds.;

   run;

%mend;

 

%getData(ds=sashelp.class);

 

options mprint;
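The MPRINT file written by the example above could then be scanned with a regular expression. The following is a rough heuristic only, not a SAS parser: the dataset name refs, the variable names, and the keyword list (set, merge, from, join) are assumptions for illustration, and only the first match on each line is kept.

```sas
/* Sketch only: pull candidate input dataset names out of the resolved code */
data refs;
  infile 'c:\temp\TEST_SAS_MPRINT.txt' truncover;
  input line $char1000.;
  length dsname $41;
  retain re;
  /* keyword, whitespace, then a one- or two-level SAS name */
  if _n_ = 1 then
    re = prxparse('/\b(?:set|merge|from|join)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)/i');
  if prxmatch(re, line) then do;
    dsname = prxposn(re, 1, line);
    output;
  end;
  keep line dsname;
run;
```

This will miss statements that span several lines and will report false hits, for example the word "from" inside a comment or string, so the result needs manual review.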

 

https://github.com/savian-net
