Week after week many people post questions on the community boards without providing sample data in the form of a data step. Why don't they use Mark Jordan's (@SASJedi's) convenient %data2datastep macro, which is presented in @Reeza's well-received library article from March 2016?
Indeed, a post using %data2datastep ("as is") would contain the keyword datalines4. But it seems that merely 96 discussions among all threads from the last 12 months contain this keyword.
Yesterday, a user who had been pointed to the library article wrote: "I don't understand that reading at all!"
Let me try to put myself in the position of a fictitious user with only very basic SAS knowledge and a non-IT background, reading the article.
filename _d2d url 'http://.../data2datastep.sas';
filename _prg temp;
data _null_;
infile _d2d;
file _prg;
input;
put _infile_;
run;
%inc _prg;
%data2datastep(!HELP)
dm 'filepad "&file"';
But this might not work in SAS UE (?).
To be clear, this is not meant as a criticism. I was just wondering why %data2datastep is so underused.
Hi @FreelanceReinh, your post is timely. I'm literally working on plans for a new SAS user forum where content like this would be *very* well received. We want the space to be exceptionally welcoming and helpful for folks trying to get up to speed on SAS, whether they're students or someone encountering SAS for the first time in the context of their job. I'd appreciate ideas from you and fellow members on ways to make it awesome.
@FreelanceReinh wrote " I was just wondering why %data2datastep is so underused"
Feedback from my fellow DePaul university College of computing class and lab mates, they don't even know such a thing existed. That never came up as a pop up or an alert for newbies.
@novinosrin: Sure, this can be another reason. However, before writing my post I had checked the "Getting Started" page and found a link to Reeza's library article in the very first entry of that page, titled "How to get fast, helpful answers" (in item 3 of "3 steps to compose a great question"). The latter is also linked at the bottom of the "New Message" screen (under the checklist "Before pressing POST ..."). So, the information is only one or two clicks away. But of course, new users of the Communities can arrive on a variety of routes. Maybe they could be guided a little more.
@FreelanceReinh: I agree with almost everything you said.
Three additional possible problems: (1) the macro doesn't use named parameters (which would make it a lot easier [IMHO] to use; (2) there isn't an example usage line; and (3) one parameter isn't defined in the local parameters list thus produces a warning.
I think I've corrected them in the following (which also includes the NOQUOTELENMAX parameter adjustment:
%macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=); %local varlist fmtlist LBLLIST inputlist msgtype NQLM; %let NQLM=%sysfunc(getoption(NOQUOTELENMAX)); %if %superq(obs)= %then %let obs=MAX; %let msgtype=NOTE; %if %superq(dsn)= %then %do; %let msgtype=ERROR; %put &msgtype: You must specify a data set name; %put; %goto syntax; %end; %let dsn=%qupcase(%superq(dsn)); %if %superq(dsn)=!HELP %then %do; %syntax: data _null_; call symput ('LS',getoption('LS','startupvalue')); run; options ls=100; options noquotelenmax; %put &msgtype: &SYSMACRONAME macro help document:; %put &msgtype- Purpose: Converts a data set to a SAS DATA step.; %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>); %put &msgtype- dsn: Name of the dataset to be converted. Required.; %put &msgtype- lib: LIBREF of the original dataset. (Optional - if DSN is not fully qualified); %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK); %put &msgtype- file: Fully qualified filename for the DATA step code produced. (Optional); %put &msgtype- Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.; %put &msgtype- obs: Max observations to include the created dataset.; %put &msgtype- (Optional) Default is MAX (all observations); %put &msgtype- fmt: Format the numeric variables in the output dataset like the original data set? ; %put &msgtype- (YES|NO - Optional) Default is YES; %put &msgtype- lbl: Reproduce column labels in the output dataset? ; %put &msgtype- (YES|NO - Optional) Default is YES; %put; %put NOTE: &SYSMACRONAME cannot be used in-line - it generates code.; %put NOTE- Every FORMAT in the original data must have a corresponding INFORMAT of the same name.; %put NOTE- Data set label is automatically re-created.; %put NOTE- Only numeric column formats can be re-created, character column formats are ingnored.; %put NOTE- Use !HELP to print these notes.; options ls=&ls; %return; %end; %if %superq(fmt)= %then %let fmt=YES; %let fmt=%qupcase(&fmt); %if %superq(lbl)= %then %let lbl=YES; %let lbl=%qupcase(&lbl); %if %superq(lib)= %then %do; %let lib=%qscan(%superq(dsn),1,.); %if %superq(lib) = %superq(dsn) %then %let lib=WORK; %else %let dsn=%qscan(&dsn,2,.); %end; %if %superq(outlib)= %then %let outlib=WORK; %let lib=%qupcase(%superq(lib)); %let dsn=%qupcase(%superq(dsn)); %if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do; %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.; %let msgtype=NOTE; %GoTo syntax; %end; %if %superq(file)= %then %do; %let file=create_&outlib._&dsn._data.sas; %if %symexist(USERDIR) %then %let file=&userdir/&file; %end; %if %symexist(USERDIR) %then %do; %if %qscan(%superq(file),-1,/\)=%superq(file) %then %let file=&userdir/&file; %end; proc sql noprint; select Name into :varlist separated by ' ' from dictionary.columns where libname="&lib" and memname="&dsn" ; select case type when 'num' then case when missing(format) then cats(Name,':32.') else cats(Name,':',format) end else cats(Name,':$',length,'.') end into :inputlist separated by ' ' from dictionary.columns where libname="&lib" and memname="&dsn" ; %if %qsubstr(%superq(lbl),1,1)=Y %then %do; select strip(catx('=',Name,put(label,$quote.))) into : lbllist separated by ' ' from dictionary.columns where libname="&lib" and memname="&dsn" and label is not null ; %end; %else %let lbllist=; select memlabel into :memlabel trimmed from dictionary.tables where libname="&lib" and memname="&dsn" ; %if %qsubstr(%superq(fmt),1,1)=Y %then %do; select strip(catx(' ',Name,format)) into :fmtlist separated by ' ' from dictionary.columns where libname="&lib" and memname="&dsn" and format is not null and format not like '$%' ; %end; %else %let fmtlist=; quit; %put _local_; data _null_; file "&file" dsd; if _n_ =1 then do; %if %superq(memlabel)= %then %do; put "data &outlib..&dsn;"; %end; %else %do; put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));"; %end; put @3 "infile datalines dsd truncover;"; put @3 "input %superq(inputlist);"; %if not (%superq(fmtlist)=) %then %do; put @3 "format %superq(fmtlist);"; %end; %if not (%superq(lbllist)=) %then %do; put @3 "label %superq(lbllist);"; %end; put "datalines4;"; end; set &lib..&dsn(obs=&obs) end=last; put &varlist @; if last then do; put; put ';;;;'; end; else put; run; options &NQLM.; %mend; /* example usage: %data2datastep(dsn=sashelp.class, file=/folders/myfolders/makeclass.sas) */
I, too, would like the resulting code to open up a window, but that particular set of dm commands doesn't work on SAS UE.
Art, CEO, AnalystFinder.com
What type of code should the macro generate?
How much work should go into making the macro replicate exactly the input dataset versus making a macro that is easy to use and understand?
PS. Just because the parameters are defined as positional does NOT mean that you can't call them using their names. One of the nicest usability features of SAS macros. Positional parameters should be used when it is "obvious" what the argument (or at least the main argument) to the macro is. In this case that would be the name of the dataset to replicate.
%macro ds2post
/----------------------------------------------------------------------------
Generate data step to post content of dataset on SAS Communities
----------------------------------------------------------------------------*/
(data /* Name of dataset to post */
....
);
....
%mend ds2post ;
Here is what I mean.
Say we had a simple macro like this.
** UPDATED **
%macro ds2post
/*----------------------------------------------------------------------------
Generate data step to post content of dataset on SAS Communities
----------------------------------------------------------------------------*/
(data /* Name of dataset to post *REQ* */
,target= /* Name to use in generated data step. (default is memname of &DATA) */
,obs=20 /* Number of observations to generate */
,file=log /* Fileref or quoted physical name of file to hold generated code */
,format= /* Optional format list to use when generating data lines */
);
%local _error;
%*---------------------------------------------------------------------------
Check user parameters.
----------------------------------------------------------------------------;
%let _error=0;
%if "%upcase(%qsubstr(&data,1,2))" = "-H" %then %let _error=1;
%else %if %length(&data) %then %do;
%if not (%sysfunc(exist(%qscan(&data,1,())))
or %sysfunc(exist(%qscan(&data,1,()),view))) %then %do;
%let _error = 1;
%put ERROR: "&data" is not a valid value for the DATA parameter.;
%put ERROR: Unable to find the dataset. ;
%end;
%end;
%else %do;
%let _error = 1;
%put ERROR: The DATA parameter is required.;
%end;
%if not %length(&target) %then %let target=%qscan(%qscan(&data,1,()),-1,.);
%if not %length(&obs) %then %let obs=20;
%else %let obs=%upcase(&obs);
%if "&obs" ne "MAX" %then %if %sysfunc(verify(&obs,0123456789)) %then %do;
%let _error = 1;
%put ERROR: "&obs" is not a valid value for the OBS parameter.;
%put ERROR: Valid values are MAX or non-negative integer. ;
%end;
%if not %length(&file) %then %let file=log;
%if (&_error) %then %do;
*----------------------------------------------------------------------------;
* When there are parameter issues then write instructions to the log. ;
*----------------------------------------------------------------------------;
data _null_;
put
'%DS2POST'
//'SAS macro to copy data into a SAS Data Step in a '
'form which you can post to on-line forums.'
//'Syntax:'
/ ' %ds2post(data=,target=,obs=,format=,file=)'
//' data = Name of SAS dataset (or view) that you want to output.'
//' target = Name to use for generated dataset.'
' Default is to use name of the input.'
//' obs = Number of observations to output. Use MAX to copy complete dataset.'
' Default is 20.'
//' file = Fileref or quoted physical filename for code.'
' Default is the SAS log.'
//' format = Optional list of <var_list> <format spec> pairs to use when writing'
' data lines.' ' Setting format=_all_ will clear all formats so raw data'
' values are written.'
//'Note that this macro will NOT work well for really long data lines.'
/'If your data has really long variables or a lot of variables then consider'
' splitting your dataset in order to post it.'
;
run;
%end;
%else %do;
*----------------------------------------------------------------------------;
* Get contents information and sort by VARNUM ;
*----------------------------------------------------------------------------;
proc contents noprint data=&data
out=_contents_(keep=name varnum type length format: inform: memlabel label)
;
run;
proc sort data=_contents_ ;
by varnum;
run;
*----------------------------------------------------------------------------;
* Generate top of data step ;
*----------------------------------------------------------------------------;
filename _code_ temp;
data _null_;
length firstvar name $60 string $300 ;
retain firstvar ;
file _code_ column=cc ;
set _contents_ end=eof ;
if _n_=1 then do;
put "data &target" @;
if not missing(memlabel) then do;
string=quote(trim(memlabel),"'");
put '(label=' string ')' @;
end;
put ';';
end;
name=nliteral(name) ;
if _n_=1 then firstvar=name;
string=cats(ifc(type=2,'$',' '),length);
put ' attrib ' name 'length=' string @;
if formatl or not missing(format) then do;
string=cats(format,ifn(formatl,formatl,.),'.',ifn(formatd,formatd,.));
put 'format=' string @ ;
end;
if informl or not missing(informat) then do;
string=cats(informat,ifn(informl,informl,.),'.',ifn(informd,informd,.));
if cc+9+length(string)>80 then put / @4 @ ;
put 'informat=' string @ ;
end;
if not missing(label) then do;
string=quote(trim(label),"'");
if cc+7>80 then put / @4 'label=' string @ ;
else if cc+7+length(string)>80 then put 'label=' / @4 string @ ;
else put 'label=' string @;
end;
put ';' ;
if eof then do;
put " infile datalines dsd dlm='|' truncover;" ;
put ' input ' firstvar '-- ' name ';';
put 'datalines4;' ;
end;
run;
*----------------------------------------------------------------------------;
* Generate data lines ;
*----------------------------------------------------------------------------;
data _null_;
file _code_ mod dsd dlm='|';
%if (&obs ne MAX) %then %do;
if _n_ > &obs then stop;
%end;
set &data ;
%if %length(&format) %then %do;
format &format;
%end;
put (_all_) (+0) ;
run;
data _null_;
file _code_ mod ;
put ';;;;';
run;
%if "%qupcase(&file)" ne "_CODE_" %then %do;
*----------------------------------------------------------------------------;
* Copy generated code to target file name ;
*----------------------------------------------------------------------------;
data _null_ ;
infile _code_;
file &file ;
input;
put _infile_;
run;
%end;
%end;
%mend ds2post ;
So with a call like this you could generate the SAS code to create a copy of SASHELP.CLASS to the log.
%ds2post(sashelp.class);
Which you can just copy and paste.
data class(label="Student Data" );
attrib Name length=$8 ;
attrib Sex length=$1 ;
attrib Age length=8 ;
attrib Height length=8 ;
attrib Weight length=8 ;
infile datalines dsd dlm='|' truncover;
input Name -- Weight ;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
Judy|F|14|64.3|90
Louise|F|12|56.3|77
Mary|F|15|66.5|112
Philip|M|16|72|150
Robert|M|12|64.8|128
Ronald|M|15|67|133
Thomas|M|11|57.5|85
William|M|15|66.5|112
;;;;
@art297 and @Tom: Many thanks for taking the time to read my lengthy post and for responding not only with a couple of remarks but even with full-grown updates of the macro! I'm sure in many companies the revision of a (non-trivial) standard macro would be considered more time-consuming.
My primary focus was on the library article. I didn't scrutinize %data2datastep. My suggested code changes were just a "by-product" of a few test runs of the macro, which I had never used before.
I think ease of use is of paramount importance in this case. So, I'm in favor of a macro that covers the majority of use cases with only one or two parameters (preferably positional parameters). Good idea, @Tom, to write the code to the log by default. I was hesitant to suggest this because of the line size limitation. (Actually, one could temporarily set LS=MAX, i.e. 256.) But with the optional parameter specifying a physical file there is an effective solution for wide datasets.
@Tom wrote:
Here is what I mean.
Say we had a simple macro like this.
** UPDATED **
<updated macro deleted I just wanted to be sure which topic I am responding to>
How about instead of sending output to a file or log using file print to send to results?
I don't use EG or Studio so I'm not sure if that would actually work for some of the users but if the output appeared in a place that they may be learning to expect output it might help. Especially for UE the whole concept of where you can put files seems confusing.
Speaking of UE how about adding this functionality as one of the default set of tasks that is included with SAS/Studio?
I am not a fan of keyword parameters because the end user has to remember the macro variable names and type them correctly for every use. I prefer the simplicity of positional parameters combined with default values (provided in the macro code) and easy-to-use self documentation. It's probably more a matter of style and personal predilection. I'm not a great 'rememberer', neither am I the world's best typist, so I have always personally found keyword parameters problematic.
All of this is great, I hope whatever steps are taken are very successful, and I wish you the best. However, you can mark me down as pessimistic at best.
I have been involved in many on-line communities (and am still involved in many on-line communities). Each one has an article instructing beginners that they need to first search, and if you can't find what you are looking for, then provide specific information and state their problem clearly, etc. etc. There have been long discussions about what to do about this, and efforts put forth to make this information easily available to users. I doubt that these efforts have ever made a difference.
It is an ongoing frustration for me (and probably for others) when people don't explain their problem clearly, sometimes don't even ask a clear question; they don't state what software they are using or what operating system they are using. In SAS, people have to be told to do things the way we want, because they don't know, they don't care and they won't care, they just want their answer. I doubt that an instruction document or a more welcoming user experience will change this.
An amusing story:
In another on-line forum, a user wrote something like this: "My software won't launch. It launched fine yesterday". That was the entire message. The response, which got many "likes", was "Your mouse is broken."
@PaigeMiller: Regarding your anecdote thought you mind find one of my own actual experiences amusing. A couple of years ago I couldn't figure out why I wasn't able to make a full system backup of my machine. My wireless mouse was working perfectly.
After hours of futility, and I can't recall why I did it, but replacing the wireless mouse with a non-wireless one, was all that was needed for the backup to work correctly.
Art, CEO, AnalystFinder.com
Thanks a lot, @PaigeMiller. I'm not too optimistic either. Still, improvements to the user experience are always a good thing.
I understand your frustration with all these imprecise problem descriptions. Apparently, there is a correlation between the ability to describe a problem accurately and the ability to solve this problem.
Would it help to add (mandatory) checkboxes or drop-down lists (for SAS version etc.) to the "New Message" field?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!