Jumboshrimps
Obsidian | Level 7

My boss wants my macro, which gunzips data sets in hundreds of folders, to be dropped into the existing SAS code base of over 80 programs.

This macro will be in the SAS_utils folder, so all 80 programs can call it.

It is a very short (61 lines) macro that does the following:

  • Identify current temp SAS WORK directory
  • gunzip data set residing on server to temp SAS WORK Directory
  • Have calling SAS program read in dataset from temp SAS WORK directory.

Calling program's change is below:

 

Instead of

Data want;
set &lib.&dataset;
run;

 

it has to be:

Data want;
set %macro(&lib.&dataset);
run;

 

OK fine.

 

My macro gunzips to the temporary SAS WORK directory on a Linux server (SAS 9.4).

The temp SAS WORK directory is a very gnarly string - "/data/go/to/your/happy/place/tempsas/SAS_work0802000044B5_major_corp_server.com"

 

The temporary WORK directory path is obtained with the snippet below:

 

%put %sysfunc(getoption(work));
%let tmpSASwrk = %sysfunc(getoption(work));

THIS IS A FIXED REQUIREMENT AND CANNOT BE CHANGED.

 

Once the .gz dataset  is unpacked and read in from the temporary SAS WORK directory,

when the SAS session ends, the dataset is deleted, so no

cleanup is necessary. (These files are very large) 

Some of these SAS programs can run over 24 hours.

 

Error occurs when Linux SAS runs the statement below:

 

x "gunzip -c /data/share/user/bigdatset.sas7dat.gz >

/data/go/to/your/happy/place/tempsas/SAS_work0802000044B5_major_corp_server.com/

bigdatset.sas7dat";

 

Error:

ERROR: File WORK.X.DATA does not exist.

 

I'm assuming due to the SET statement, SAS is interpreting "x gunzip -c yada yada" as a literal "x"

referring to a dataset named "x".  (Then again, I could be wrong).

 

This does not happen when I run the macro by itself  from within the calling SAS program

(without the SET statement) i.e."%macro(&lib.&dataset);" -

macro runs fine,

the .gz file is unpacked in the tempsas directory,

and is ready to be used in a data step.

Problem is, that is not acceptable.  Macro has to be run within the existing code base so a global

replacement can be done on "SET &lib.&dataset;"   to 

"SET %macro(&lib.&dataset);" 

Bash script that calls SAS program will export some global variables

to the SAS program so the program knows where the SAS_util folder can be found and will call 

the macro from there.

 

Is it possible to run a macro within the SET statement that exits out of Linux SAS 9.4,

calls gunzip (then chmod, then change group permissions),

and returns back to the calling program

AND reads in the dataset that has been unpacked into the temp SAS directory?

 

Thank  you.

PaigeMiller
Diamond | Level 26

Show us the macro and the calling code ... all of it, not just selected snippets.

Error occurs when Linux SAS runs the statement below:

 

x "gunzip -c /data/share/user/bigdatset.sas7dat.gz >

/data/go/to/your/happy/place/tempsas/SAS_work0802000044B5_major_corp_server.com/

bigdatset.sas7dat";

 

If this is what appears after the SET command when the macro resolves, then this can't possibly work, because SAS thinks the command is SET X "gunzip ...", and it is trying to SET data set X, which it can't find. When the macro resolves, it must produce valid, working SAS code, and it appears that when this macro resolves in a SET statement, you get non-working code.

 

 

--
Paige Miller
Jumboshrimps
Obsidian | Level 7

Paige,

 

Thank you for your reply.

I've attached the entire macro - 61 lines (with some minor edits, such as changing IDs).

 

This macro resolves as is, just not from the set statement.

 


Again, thanx,

 

 


/*********************************************************************************************************

File name: c14_uzip_v3.sas

Macro name : c14_uzip_v3

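Purpose: For a given .gz file, unzip it and store the result in the SAS WORK directory.

Parameters:

zipflpth: path to the directory holding the .gz file
zpflname: name of the zipped data set, without the .sas7bdat.gz extension
tmpSASwrk: path of the temporary SAS WORK directory

*********************************************************************************************************/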

 

options mprint mlogic symbolgen; 

 


%macro c14_uzip_v3(zipflpth,zpflname,tmpSASwrk);


/* Path and data set name are read from the environment variables nmepath and nmeds */
%let zipflpth = %sysget(nmepath);
%put nmepath >> &zipflpth;
%let zpflname = %sysget(nmeds);
%put nmeds >> &zpflname;

%put &zipflpth./&zpflname;

/* Physical path of the current session's temporary WORK directory */
%let tmpSASwrk = %sysfunc(getoption(work));
%put &tmpSASwrk;

%let filref=&zipflpth.;

%if %sysfunc(fileexist("&filref./&zpflname..sas7bdat.gz")) %then %do;

/* Unpack into WORK; the target keeps the .sas7bdat extension so the chmod below finds it */
x "gunzip -c &zipflpth./&zpflname..sas7bdat.gz > &tmpSASwrk./&zpflname..sas7bdat";

x "chmod 770 &tmpSASwrk./&zpflname..sas7bdat";


%end;


%mend c14_uzip_v3;

Tom
Super User

@Jumboshrimps You need to re-read what Paige wrote.

 

The macro processor evaluates the macro calls and macro variable references and passes the resulting text on to the real SAS compiler to execute, just as if that text was what was in the source code to begin with.

 

The first two characters that your macro generates are X and space.  So when you call it in the MIDDLE of a SET statement the X looks to SAS like the name of the dataset you want to read.

 

When you call the macro at a statement boundary it generates two statements. Something like this:

x "gunzip -c /x/y/abc.sas7bdat.gz > /z/w//abc.sas7bdat";
x "chmod 770 /z/w//abc.sas7bdat";

If you call it in the middle of a SET statement your statements will look like this instead:

set x "gunzip -c /x/y/abc.sas7bdat.gz > /z/w//abc.sas7bdat";
x "chmod 770 /z/w//abc.sas7bdat";
;

So instead of two X statements you have a SET, an X, and a null statement (bare semicolon).

Reeza
Super User
What kind of data sets are you reading here? SAS 9.4M5 added support for GZIP to the filename ZIP functions. Not sure how that works if it's SAS datasets though.
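For a plain text file, the GZIP option on FILENAME ZIP could be used roughly as sketched below. This is only an illustration: the path and the fileref name gzin are made up, it assumes SAS 9.4M5 or later, and it does not address whether a compressed .sas7bdat can be read this way.

```sas
/* Hypothetical path: point this at a real gzip-compressed text file */
filename gzin zip "/data/share/user/example.csv.gz" gzip;

data _null_;
  infile gzin;
  input;
  put _infile_;              /* echo each decompressed line to the log */
  if _n_ >= 5 then stop;     /* only look at the first five lines */
run;

filename gzin clear;
```

Nothing is unpacked to disk here; the lines are decompressed as the data step reads them.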
Reeza
Super User
Did you know that you can pass the full path to a SAS data set instead of using the libname or member name? You'll likely want to do that here I suspect.

set 'c:\users\demo\class.sas7bdat';

This is valid code.
Jumboshrimps
Obsidian | Level 7

Thank you for  your reply.

 

Not in Linux it's not.

 

 

Kurt_Bremser
Super User

@Jumboshrimps wrote:

Thank you for  your reply.

 

Not in Linux it's not.

 

 


Don't be silly. Of course it's possible to use a physical name on a UNIX system:

37         data '$HOME/test/class.sas7bdat';
38         set sashelp.class;
39         run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set $HOME/test/class.sas7bdat has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
Patrick
Opal | Level 21

@Jumboshrimps 

To use the macro call in a Set statement you need to write a function style macro. The main thing about such macros: You can only use macro language in it so that once the macro executes nothing is left over for the SAS compiler other than valid SAS syntax for a SET statement.

Here is a white paper with details. It even has an example for a macro call in a Set statement.
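As a rough sketch of the idea (not taken from the paper), a function-style macro might look like this. The macro name pick_ds and the data set work.class_local are hypothetical, chosen only for illustration.

```sas
/* Hypothetical function-style macro: it contains only macro statements,
   so the only text handed to the SAS compiler is a data set name. */
%macro pick_ds(primary, fallback);
  %if %sysfunc(exist(&primary)) %then &primary;
  %else &fallback;
%mend pick_ds;

data want;
  /* The macro call resolves to work.class_local if that data set exists,
     otherwise to sashelp.class. The semicolon comes from this line. */
  set %pick_ds(work.class_local, sashelp.class);
run;
```

Note that the semicolons after &primary and &fallback end the %THEN and %ELSE clauses; they are not part of the generated text, which is why the SET statement still needs its own.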

 

ScottBass
Rhodochrosite | Level 12

Is this close to what you need?  Obviously adjust for Linux.  I don't see why you're using a %macro for the set statement instead of just macro variables?

 

options mprint; 

* create test data ;
libname foo "\\my\UNC\path\Temp";

proc datasets lib=foo mt=(data catalog) kill nowarn nolist;
quit;

* check the results ;
title "FOO Before";
proc datasets lib=foo nowarn;
quit;

proc copy in=sashelp out=foo;
   select class cars shoes;
run;

* check the results ;
title "FOO After";
proc datasets lib=foo nowarn;
quit;

* zip them - I'm on Windows and we use 7-zip ;
%macro zip(path,data);
   %let exe=\\my\UNC\path\7-Zip\7z.exe;
   %let zip=&path\&data..7z;
   %let data=&path\&data..sas7bdat;
   data _null_;
      infile "&exe a ""&zip"" ""&data"" " pipe;
      input;
      put _infile_;
   run;
%mend;
%let path=%sysfunc(pathname(foo));
%zip(&path,class)
%zip(&path,cars)
%zip(&path,shoes)

* delete the datasets - 7-zip does not do so ;
proc datasets lib=foo mt=(data catalog) kill nowarn nolist;
quit;

* check the results ;
data _null_;
   infile "dir /b &path" pipe;
   input;
   put _infile_;
run;

**** OK, that was just to get some test data... *** ;

proc datasets lib=work mt=(data catalog) kill nowarn nolist;
quit;

* before ;
title "WORK Before";
proc datasets lib=work nowarn;
quit;

%macro unzip(path,data);
   %let exe=\\my\UNC\path\7-Zip\7z.exe;
   %let zip=&path\&data..7z;
   %let out=%sysfunc(pathname(work));
   data _null_;
      infile "&exe e ""&zip"" -o""&out"" -aoa && del ""&zip"" " pipe;
      input;
      put _infile_;
   run;
%mend;
%unzip(&path,class)
%unzip(&path,cars)
%unzip(&path,shoes)

* after ;
title "WORK After";
proc datasets lib=work nowarn;
quit;

title;

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ScottBass
Rhodochrosite | Level 12

Is it possible to run a macro within the SET statement that exits out of Linux SAS 9.4,

calls gunzip (then chmod, then change group permissions),

and returns back to the calling program

AND reads in the dataset that has been unpacked into the temp SAS directory?

 

 

No.  Well maybe dosubl()?  Anyway, even if that worked, I'd redesign the approach.

 

Unpack BEFORE the data step, THEN read the data.

 

Don't let the tail (an easy search and replace) wag the dog (crappy or impossible design). 

 

The fact that you have 80 programs that can be addressed by a simple search and replace on the set statement may indicate that they should have been modularized a while ago (for example macro-ized).  It stinks of copy and paste code that is now too hard to maintain.

 

Are your programs consistent enough that a multi-line search and replace, say via sed or awk, would work?

 

For example, replace:

 

 

data want;
   set &lib.&dataset;

 

 

with:

 

 

%unzip_macro(&path,&lib,&dataset);

data want;
   set &lib.&dataset;
...

 

 

Or, are your programs SO similar that you could combine them into one, or at least much fewer than 80, via a well designed macro or set of macros?


Kurt_Bremser
Super User

Hi Scott, it is possible:

/* create data for testing */

libname test '$HOME/test';

data test.class;
set sashelp.class;
run;

filename oscmd pipe "gzip $HOME/test/class.sas7bdat";

data _null_;
infile oscmd;
input;
put _infile_;
run;

/* the macro with %sysexec */

%macro gzip_on_the_fly(filename,libname);
%local targetpath dsfilename;
%let targetpath=%sysfunc(pathname(&libname.));
%let dsfilename=%scan(%scan(&filename.,-1,/),1,.).sas7bdat;
%sysexec gzip -dc &filename. > &targetpath./&dsfilename.;
&libname..%scan(&dsfilename.,1,.)
%mend;

/* now use it */
data test.class1;
set
  %gzip_on_the_fly($HOME/test/class.sas7bdat.gz,work)
;
run;

The "problem" I see here is that the %sysexec is about as communicative as the X statement (read: not) and does not return any information about the success or non-success of the external command (apart from setting &SYSRC). The filename pipe I used for the compression is much better in that regard.

 

The macro assumes that every input file name ends in .sas7bdat.gz

 

Edit: included libname in the code created by the macro.
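Since &SYSRC is the only feedback %sysexec gives, one possible way to act on it is sketched below. The macro name gunzip_checked is made up, and the sketch assumes that &SYSRC is 0 when the gzip command succeeds and that stopping the submitted code is the wanted reaction to a failure.

```sas
/* Hypothetical variant of the macro above with a return-code check */
%macro gunzip_checked(filename);
%local targetpath dsfilename;
%let targetpath=%sysfunc(pathname(WORK));
%let dsfilename=%scan(%scan(&filename.,-1,/),1,.).sas7bdat;
%sysexec gzip -dc &filename. > &targetpath./&dsfilename.;
/* %put and %abort are macro statements, so they add no text to the SET statement */
%if &sysrc. ne 0 %then %do;
  %put ERROR: gzip returned code &sysrc. for &filename.;
  %abort cancel;
%end;
WORK.%scan(&dsfilename.,1,.)
%mend gunzip_checked;

data test.class1;
set
  %gunzip_checked($HOME/test/class.sas7bdat.gz)
;
run;
```

The %abort cancel is used instead of a plain %return because a macro that returns nothing would leave an empty SET statement behind, which reads the most recently created data set rather than failing.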

Quentin
Super User
Lovely little macro function, Kurt. Really nice example of the power of the macro language.
Kurt_Bremser
Super User

@Quentin wrote:
Lovely little macro function, Kurt. Really nice example of the power of the macro language.

And there's nothing really fancy in it. No complicated lists and loops, no indirect addressing, no quoting/unquoting, just a few macro functions and the %sysexec.

 

And if you know that you'll always use WORK, it simplifies to this:

%macro gzip_on_the_fly(filename);
%local targetpath dsfilename;
%let targetpath=%sysfunc(pathname(WORK));
%let dsfilename=%scan(%scan(&filename.,-1,/),1,.).sas7bdat;
%sysexec gzip -dc &filename. > &targetpath./&dsfilename.;
WORK.%scan(&dsfilename.,1,.)
%mend;
