BookmarkSubscribeRSS Feed
FreelanceReinh
Jade | Level 19

Week after week many people post questions on the community boards without providing sample data in the form of a data step. Why don't they use Mark Jordan's (@SASJedi's) convenient %data2datastep macro, which is presented in @Reeza's well-received library article from March 2016?

 

Indeed, a post using %data2datastep ("as is") would contain the keyword datalines4. But it seems that merely 96 discussions among all threads from the last 12 months contain this keyword.

 

Yesterday, a user who had been pointed to the library article wrote: "I don't understand that reading at all!"

 

Let me try to put myself in the position of a fictitious user with only very basic SAS knowledge and a non-IT background, reading the article.

 

  1. The first paragraph mentions "SASHELP datasets". Someone who is learning SAS on the job might not have come across these datasets. Maybe a link to more information would be helpful (cf. this discussion from March 2016)?

  2. The second paragraph is about the "SAS macro", "SAS University Edition" and "your autoexec" -- three terms the beginner might not be familiar with -- and the "macro call" follows. For someone who has never "called" a SAS macro (or who has been discouraged from using macros as a beginner) this could sound intimidating.
    Suggestion: Would it help to avoid the term "macro" (until later in the article) and limit the explanations to plain instructions like "click there, copy that, ..."?

  3. The 6-step instructions start with a download link, don't they? No, it's a link to another article (Mark Jordan's blog post). Only at the end of Mark's post (not at the bottom of that page) there are download links. Two of them! "Original" and "latest update".
    Suggestion: Put one link directly into Reeza's article without even mentioning "macros", "versions" or "updates". (See item 8 below, though.)

  4. Now I've downloaded a zip file. Not sure why it has to be a zip file as it contains a single .sas file of only 3 or 5 KB.
    Suggestion: Simply provide a link to a .sas file (if at all, see item 😎 -- no need for "extraction".

  5. The next two and a half steps are intended to users of SAS UE. Other users may feel lost.
    Suggestion: If necessary after the below simplifications, SAS could write similar instructions for other common scenarios, most notably a standard SAS session. (I don't know if SAS EG, Viya, Studio, etc. would require different instructions.)

  6. While it is convenient in some situations to use the Autoexec feature, it is not necessary here.
    Suggestion: Simplify the process as much as possible (with an impatient first-time user in mind). Autoexec can be suggested in a footnote.

  7. It can happen (especially after implementing item 4 above) that people open the downloaded .sas file -- and are overwhelmed by 90+ lines of macro code. (This happened to user Tauqeer, who commented on Mark's blog post.)
    To make this less likely, we could hide the macro code and just supply a short download program which could look something like this (draft code):
    filename _d2d url 'http://.../data2datastep.sas';
    filename _prg temp;
    
    data _null_;
    infile _d2d;
    file _prg;
    input;
    put _infile_;
    run;
    
    %inc _prg;
    
    %data2datastep(!HELP)
  8. The above program being so short, we could do entirely without a download link (thus avoiding a psychological hurdle for some people), put these few lines into a code box and say: "Run this program."

  9. Macro %data2datastep should not issue warning messages under normal circumstances. Currently, the harmless "WARNING: The quoted string currently being processed has become more than 262 characters ..." can occur if the user's dataset has many variables.
    Suggestion: To suppress the warning, set system option NOQUOTELENMAX and at the end restore it to what it was before (option value saved using getoption at the start).

  10. Currently, %data2datastep creates a text file in a user-specified or default location. So, the user is supposed to know this location, navigate to it and open the file manually.
    Suggestion: Let %data2datastep open this file automatically. In a standard SAS Display Manager session this could be done with a DM command:
    dm 'filepad "&file"';
    But this might not work in SAS UE (?).

  11. Suggestions to simplify the macro call (i.e. create a "light" version of %data2datastep with fewer parameters):
    a) The user's text file could always be created with a fixed name in a new subfolder of the WORK directory.
    b) The existing macro parameter dsn already accepts both one-level and two-level dataset names, thus making parameter lib dispensable.

  12. With only two macro parameters left (after implementing item 11) -- dsn and obs --, usage instructions for %data2datastep could be fairly short. (Cf. those given by SAS Jedi himself in his reply to the above-mentioned user Tauqeer.)

  13. Without parameter outlib (from version 2 of %data2datastep) there is a risk of inexperienced users overwriting a permanent dataset with the first few observations of it.
    Suggestion: The output dataset could always be named have -- the user can still edit the name manually.

 

To be clear, this is not meant as a criticism. I was just wondering why %data2datastep is so underused.

42 REPLIES 42
BeverlyBrown
Community Manager

Hi @FreelanceReinh, your post is timely. I'm literally working on plans for a new SAS user forum where content like this would be *very* well received. We want the space to be exceptionally welcoming and helpful for folks trying to get up to speed on SAS, whether they're students or someone encountering SAS for the first time in the context of their job. I'd appreciate ideas from you and fellow members on ways to make it awesome.

Register now for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.

novinosrin
Tourmaline | Level 20

@FreelanceReinh wrote " I was just wondering why %data2datastep is so underused"

 

Feedback from my fellow DePaul university College of computing  class and lab mates, they don't even know such a thing existed. That never came up as a pop up or an alert for newbies. 

 

 

FreelanceReinh
Jade | Level 19

@novinosrin: Sure, this can be another reason. However, before writing my post I had checked the "Getting Started" page and found a link to Reeza's library article in the very first entry of that page, titled "How to get fast, helpful answers" (in item 3 of "3 steps to compose a great question"). The latter is also linked at the bottom of the "New Message" screen (under the checklist "Before pressing POST ..."). So, the information is only one or two clicks away. But of course, new users of the Communities can arrive on a variety of routes. Maybe they could be guided a little more.

art297
Opal | Level 21

@FreelanceReinh: I agree with almost everything you said. 

 

Three additional possible problems: (1) the macro doesn't use named parameters (which would make it a lot easier [IMHO] to use; (2) there isn't an example usage line; and (3) one parameter isn't defined in the local parameters list thus produces a warning.

 

I think I've corrected them in the following (which also includes the NOQUOTELENMAX parameter adjustment:

%macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
%local varlist fmtlist LBLLIST inputlist msgtype NQLM;

%let NQLM=%sysfunc(getoption(NOQUOTELENMAX));

%if %superq(obs)= %then %let obs=MAX;

%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
   %let msgtype=ERROR;
   %put &msgtype: You must specify a data set name;
   %put;
   %goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
   data _null_;
      call symput ('LS',getoption('LS','startupvalue'));
   run;
   options ls=100;
   options noquotelenmax;
   %put &msgtype: &SYSMACRONAME macro help document:;
   %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
   %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
   %put &msgtype- dsn:    Name of the dataset to be converted. Required.;
   %put &msgtype- lib:    LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
   %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
   %put &msgtype- file:   Fully qualified filename for the DATA step code produced. (Optional);
   %put &msgtype-         Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
   %put &msgtype- obs:    Max observations to include the created dataset.;
   %put &msgtype-         (Optional) Default is MAX (all observations);
   %put &msgtype- fmt:    Format the numeric variables in the output dataset like the original data set? ;
   %put &msgtype-         (YES|NO - Optional) Default is YES;
   %put &msgtype- lbl:    Reproduce column labels in the output dataset? ;
   %put &msgtype-         (YES|NO - Optional) Default is YES;
   %put;
   %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
   %put NOTE-   Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
   %put NOTE-   Data set label is automatically re-created.;
   %put NOTE-   Only numeric column formats can be re-created, character column formats are ingnored.;
   %put NOTE-   Use !HELP to print these notes.;
   options ls=&ls;
   %return;
%end; 
%if %superq(fmt)= %then %let fmt=YES;
%let fmt=%qupcase(&fmt);
%if %superq(lbl)= %then %let lbl=YES;
%let lbl=%qupcase(&lbl);

%if %superq(lib)= %then %do;
    %let lib=%qscan(%superq(dsn),1,.);
    %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
    %else %let dsn=%qscan(&dsn,2,.);
%end;
%if %superq(outlib)= %then %let outlib=WORK;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));

%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
   %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
   %let msgtype=NOTE;
   %GoTo syntax;
%end;

%if %superq(file)= %then %do;
   %let file=create_&outlib._&dsn._data.sas;
   %if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;

%if %symexist(USERDIR) %then %do;
   %if %qscan(%superq(file),-1,/\)=%superq(file) %then
      %let file=&userdir/&file;
%end;

proc sql noprint;
select Name
      into :varlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
select case type
          when 'num' then 
             case 
                when missing(format) then cats(Name,':32.')
                else cats(Name,':',format)
             end 
          else cats(Name,':$',length,'.')
       end
      into :inputlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
%if %qsubstr(%superq(lbl),1,1)=Y %then %do;
select strip(catx('=',Name,put(label,$quote.)))
   into : lbllist separated by ' '
   from dictionary.columns 
   where libname="&lib"
     and memname="&dsn"
     and label is not null 
;
%end;
%else %let lbllist=;
select memlabel 
   into :memlabel trimmed
   from dictionary.tables
   where libname="&lib"
     and memname="&dsn"
;
%if %qsubstr(%superq(fmt),1,1)=Y %then %do;
select strip(catx(' ',Name,format))
      into :fmtlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
     and format is not null 
     and format not like '$%'
;
%end;
%else %let fmtlist=;
quit;

%put _local_;

data _null_;
   file "&file" dsd;
   if _n_ =1 then do;
   %if %superq(memlabel)= %then %do;
      put "data &outlib..&dsn;";
   %end;
   %else %do;
      put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
   %end;
      put @3 "infile datalines dsd truncover;";
      put @3 "input %superq(inputlist);";
   %if not (%superq(fmtlist)=) %then %do;
      put @3 "format %superq(fmtlist);";
   %end;
   %if not (%superq(lbllist)=) %then %do;
      put @3 "label %superq(lbllist);";
   %end;
      put "datalines4;";
   end;
   set &lib..&dsn(obs=&obs) end=last; 
   put &varlist @;
   if last then do;
      put;
      put ';;;;';
   end;
   else put;
run;
options &NQLM.;
%mend;

/*
example usage:

%data2datastep(dsn=sashelp.class, file=/folders/myfolders/makeclass.sas)
*/

I, too, would like the resulting code to open up a window, but that particular set of dm commands doesn't work on SAS UE.

 

Art, CEO, AnalystFinder.com

 

 

 

Tom
Super User Tom
Super User

What type of code should the macro generate?

How much work should go into making the macro replicate exactly the input dataset versus making a macro that is easy to use and understand?

 

PS.  Just because the parameters are defined as positional does NOT mean that you can't call them using their names.  One of the nicest usability features of SAS macros. Positional parameters should be used when it is "obvious" what the argument (or at least the main argument) to the macro is. In this case that would be the name of the dataset to replicate.

 

%macro ds2post
/----------------------------------------------------------------------------
Generate data step to post content of dataset on SAS Communities
----------------------------------------------------------------------------*/
(data  /* Name of dataset to post */
....
);
....
%mend ds2post ;

 

Tom
Super User Tom
Super User

Here is what I mean. 

Say we had a simple macro like this.

** UPDATED **

%macro ds2post
/*----------------------------------------------------------------------------
Generate data step to post content of dataset on SAS Communities
----------------------------------------------------------------------------*/
(data     /* Name of dataset to post *REQ*  */
,target=  /* Name to use in generated data step. (default is memname of &DATA) */
,obs=20   /* Number of observations to generate */
,file=log /* Fileref or quoted physical name of file to hold generated code */
,format=  /* Optional format list to use when generating data lines */
);
%local _error;
%*---------------------------------------------------------------------------
Check user parameters.
----------------------------------------------------------------------------;
%let _error=0;
%if "%upcase(%qsubstr(&data,1,2))" = "-H" %then %let _error=1;
%else %if %length(&data) %then %do;
  %if not (%sysfunc(exist(%qscan(&data,1,())))
        or %sysfunc(exist(%qscan(&data,1,()),view))) %then %do;
    %let _error = 1;
    %put ERROR: "&data" is not a valid value for the DATA parameter.;
    %put ERROR: Unable to find the dataset. ;
  %end;
%end;
%else %do;
  %let _error = 1;
  %put ERROR: The DATA parameter is required.;
%end;
%if not %length(&target) %then %let target=%qscan(%qscan(&data,1,()),-1,.);
%if not %length(&obs) %then %let obs=20;
%else %let obs=%upcase(&obs);
%if "&obs" ne "MAX" %then %if %sysfunc(verify(&obs,0123456789)) %then %do;
  %let _error = 1;
  %put ERROR: "&obs" is not a valid value for the OBS parameter.;
  %put ERROR: Valid values are MAX or non-negative integer. ;
%end;
%if not %length(&file) %then %let file=log;

%if (&_error) %then %do;
*----------------------------------------------------------------------------;
* When there are parameter issues then write instructions to the log. ;
*----------------------------------------------------------------------------;
data _null_;
  put
  '%DS2POST'
//'SAS macro to copy data into a SAS Data Step in a '
  'form which you can post to on-line forums.'
//'Syntax:'
/ ' %ds2post(data=,target=,obs=,format=,file=)'
//' data   = Name of SAS dataset (or view) that you want to output.'
//' target = Name to use for generated dataset.'
  ' Default is to use name of the input.'
//' obs    = Number of observations to output. Use MAX to copy complete dataset.'
  ' Default is 20.'
//' file   = Fileref or quoted physical filename for code.'
  ' Default is the SAS log.'
//' format = Optional list of <var_list> <format spec> pairs to use when writing'
  ' data lines.' ' Setting format=_all_ will clear all formats so raw data'
  ' values are written.'
//'Note that this macro will NOT work well for really long data lines.'
 /'If your data has really long variables or a lot of variables then consider'
  ' splitting your dataset in order to post it.'
  ;
run;
%end;
%else %do;
*----------------------------------------------------------------------------;
* Get contents information and sort by VARNUM ;
*----------------------------------------------------------------------------;
proc contents noprint data=&data
  out=_contents_(keep=name varnum type length format: inform: memlabel label)
;
run;
proc sort data=_contents_ ;
  by varnum;
run;
*----------------------------------------------------------------------------;
* Generate top of data step ;
*----------------------------------------------------------------------------;
filename _code_ temp;
data _null_;
  length firstvar name $60 string $300 ;
  retain firstvar ;
  file _code_ column=cc ;
  set _contents_ end=eof ;
  if _n_=1 then do;
    put "data &target" @;
    if not missing(memlabel) then do;
      string=quote(trim(memlabel),"'");
      put '(label=' string ')' @;
    end;
    put ';';
  end;
  name=nliteral(name) ;
  if _n_=1 then firstvar=name;
  string=cats(ifc(type=2,'$',' '),length);
  put '  attrib ' name 'length=' string @;
  if formatl or not missing(format) then do;
     string=cats(format,ifn(formatl,formatl,.),'.',ifn(formatd,formatd,.));
     put 'format=' string @ ;
  end;
  if informl or not missing(informat) then do;
     string=cats(informat,ifn(informl,informl,.),'.',ifn(informd,informd,.));
     if cc+9+length(string)>80 then put / @4 @ ;
     put 'informat=' string @ ;
  end;
  if not missing(label) then do;
     string=quote(trim(label),"'");
     if cc+7>80 then put / @4 'label=' string @ ;
     else if cc+7+length(string)>80 then put 'label=' / @4 string @ ;
     else  put 'label=' string @;
  end;
  put ';' ;
  if eof then do;
     put "  infile datalines dsd dlm='|' truncover;" ;
     put '  input ' firstvar '-- ' name ';';
     put 'datalines4;' ;
  end;
run;
*----------------------------------------------------------------------------;
* Generate data lines ;
*----------------------------------------------------------------------------;
data _null_;
  file _code_ mod dsd dlm='|';
%if (&obs ne MAX) %then %do;
  if _n_ > &obs then stop;
%end;
  set &data ;
%if %length(&format) %then %do;
  format &format;
%end;
  put (_all_) (+0) ;
run;
data _null_;
  file _code_ mod ;
  put ';;;;';
run;
  %if "%qupcase(&file)" ne "_CODE_" %then %do;
*----------------------------------------------------------------------------;
* Copy generated code to target file name ;
*----------------------------------------------------------------------------;
data _null_ ;
  infile _code_;
  file &file ;
  input;
  put _infile_;
run;
  %end;
%end;
%mend ds2post ;

So with a call like this you could generate the SAS code to create a copy of SASHELP.CLASS to the log.

%ds2post(sashelp.class);

Which you can just copy and paste.

data class(label="Student Data" );
  attrib Name length=$8 ;
  attrib Sex length=$1 ;
  attrib Age length=8 ;
  attrib Height length=8 ;
  attrib Weight length=8 ;
  infile datalines dsd dlm='|' truncover;
  input Name -- Weight ;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
Judy|F|14|64.3|90
Louise|F|12|56.3|77
Mary|F|15|66.5|112
Philip|M|16|72|150
Robert|M|12|64.8|128
Ronald|M|15|67|133
Thomas|M|11|57.5|85
William|M|15|66.5|112
;;;;

 

 

art297
Opal | Level 21

@Tom: I agree!

 

Art, CEO, AnalystFinder.com

 

FreelanceReinh
Jade | Level 19

@art297 and @Tom: Many thanks for taking the time to read my lengthy post and for responding not only with a couple of remarks but even with full-grown updates of the macro! I'm sure in many companies the revision of a (non-trivial) standard macro would be considered more time-consuming.

 

My primary focus was on the library article. I didn't scrutinize %data2datastep. My suggested code changes were just a "by-product" of a few test runs of the macro, which I had never used before.

 

I think ease of use is of paramount importance in this case. So, I'm in favor of a macro that covers the majority of use cases with only one or two parameters (preferably positional parameters). Good idea, @Tom, to write the code to the log by default. I was hesitant to suggest this because of the line size limitation. (Actually, one could temporarily set LS=MAX, i.e. 256.) But with the optional parameter specifying a physical file there is an effective solution for wide datasets.

ballardw
Super User

@Tom wrote:

Here is what I mean. 

Say we had a simple macro like this.

** UPDATED **

<updated macro deleted I just wanted to be sure which topic I am responding to>

 

 


How about instead of sending output to a file or log using file print to send to results?

I don't use EG or Studio so I'm not sure if that would actually work for some of the users but if the output appeared in a place that they may be learning to expect output it might help. Especially for UE the whole concept of where you can put files seems confusing.

Tom
Super User Tom
Super User

Speaking of UE how about adding this functionality as one of the default set of tasks that is included with SAS/Studio?

SASJedi
SAS Super FREQ

I am not a fan of keyword parameters because the end user has to remember the macro variable names and type them correctly for every use. I prefer the simplicity of positional parameters combined with default values (provided in the macro code) and easy-to-use self documentation. It's probably more a matter of style and personal predilection.  I'm not a great 'rememberer', neither am I the world's best typist, so I have always personally found keyword parameters problematic. 

Check out my Jedi SAS Tricks for SAS Users
PaigeMiller
Diamond | Level 26

All of this is great, I hope whatever steps are taken are very successful, and I wish you the best. However, you can mark me down as pessimistic at best.

 

I have been involved in many on-line communities (and am still involved in many on-line communities). Each one has an article instructing beginners that they need to first search, and if you can't find what you are looking for, then provide specific information and state their problem clearly, etc. etc. There have been long discussions about what to do about this, and efforts put forth to make this information easily available to users. I doubt that these efforts have ever made a difference.

 

It is an ongoing frustration for me (and probably for others) when people don't explain their problem clearly, sometimes don't even ask a clear question; they don't state what software they are using or what operating system they are using. In SAS, people have to be told to do things the way we want, because they don't know, they don't care and they won't care, they just want their answer. I doubt that an instruction document or a more welcoming user experience will change this.

 

An amusing story:

In another on-line forum, a user wrote something like this: "My software won't launch. It launched fine yesterday". That was the entire message. The response, which got many "likes", was "Your mouse is broken."

--
Paige Miller
art297
Opal | Level 21

@PaigeMiller: Regarding your anecdote thought you mind find one of my own actual experiences amusing. A couple of years ago I couldn't figure out why I wasn't able to make a full system backup of my machine. My wireless mouse was working perfectly.

After hours of futility, and I can't recall why I did it, but replacing the wireless mouse with a non-wireless one, was all that was needed for the backup to work correctly.

 

Art, CEO, AnalystFinder.com

 

FreelanceReinh
Jade | Level 19

Thanks a lot, @PaigeMiller. I'm not too optimistic either. Still, improvements to the user experience are always a good thing.

 

I understand your frustration with all these imprecise problem descriptions. Apparently, there is a correlation between the ability to describe a problem accurately and the ability to solve this problem.

 

Would it help to add (mandatory) checkboxes or drop-down lists (for SAS version etc.) to the "New Message" field?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Discussion stats
  • 42 replies
  • 3138 views
  • 36 likes
  • 9 in conversation