I've been asked to comment a Foundation SAS program extensively. The program has 20 or so data steps and various procs. I developed the program and am familiar with it. The intended audience has limited SAS experience. My thought is to include a standard comment section at the head of each data step or proc, with headings such as
Purpose of data step/proc:
Inputs:
Outputs:
Does anyone have any suggestions on what other information I could include, any alternative approaches, and is there any documentation available? I have been unable to find anything online.
Thanks
Joe
thanks for stating your audience;
that is a very important consideration.
There are several links here:
http://www.sascommunity.org/wiki/Category:Documentation
my minimum block is
/* name:
description:
purpose:
*******/
which has evolved from my leave.no.Q.unanswered form in:
http://www.sascommunity.org/wiki/Writing_for_Reading_Program_Header
for ultimate Goodness in Document see the LaTeX package statrep
http://www.sascommunity.org/wiki/Using_LaTeX_to_document_programs
Ron Fehd ReUse maven
Hope this helps.
/**************************************************************************************************************
/* <<TITLE>> samplecode.SAS */
/* Requested by <<REQUESTOR>> */
/* <<DEPT>> */
/* <<TKT#>> */
/* <<WIT#>> */
/* <<CR#>> */
/* <<DESCRIPTION>>A brief description of the program */
/* */
/* Category: <<REG/NON_REG>> REG */
/* Schedule: <<D,W,M,Q,S,Y,OTHER Specify>> Weekly */
/* Destination: <<LOCATION>> Output files location */
/* Delivery Method: <<SECURE EMAIL, FILE SERVER, FTP, OTHER Specify >> FILE SERVER */
/* */
/* CREATED BY: */
/* REPORTING TEAM */
/* <<PROGRAMMER>> */
/* <<BSA>> */
/* <<DATE>> */
/* */
/* MAINTENANCE (maintain existing entries, add new on separate lines) */
/* MM/DD/YYYY PROGRAMMER ID BSA ID TKT# WIT# CR# */
/**************************************************************************************************************/
thorough
I like that.
YeahBut
ALL CAPS IS HARD TO READ, ESPECIALLY FOR NON-NATIVE SPEAKERS.
therefore I recommend lower case
ron fehd lowcase maven
In the Enhanced Editor or EG, blocks of text can easily be inserted using keyboard abbreviations. A number of these abbreviations have been posted in a sasCommunity article: http://www.sascommunity.org/wiki/Abbreviations/Macros
How did you make the comments so well aligned on the right-hand side? Did you use any keyboard shortcut?
I think describing why a data step or block of procs is used is a good idea. At the head of the program I often have a section of comments that outlines the program in plain English. Then in the body I mark where each element of the outline begins.
I suggest that any code in a data step that is not intuitively obvious have a line or two of comment about why it is there or what it does. For example, I have data sets where coding standards changed over a period of time, so there may be date-dependent manipulations. Or you may be using somewhat obscure code like newvar = (var1=3)*1 + (var2=4)*1 + (var3 = 1)*1; which relies on each comparison evaluating to 0 or 1 to build a variable in one line instead of a bunch of if/then statements.
DO note any code that may not work if the range of values changes! I often place a comment like
/* ### WARNING: Only works if varx is positive */
The ### is something I can search for to find potential land mines in programs that are not used frequently, or in one-offs where it may not be worth the time to code around likely problems.
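A minimal sketch of the two habits described above. The variables var1-var3 and varx come from the post; the data set name work.demo, the derived variable names, and the sample values are made up for illustration. Note that the sum of three comparisons ranges from 0 to 3, so a further comparison is needed if a strict 0/1 flag is wanted.

```sas
data work.demo;
   input var1 var2 var3 varx;

   * Each comparison evaluates to 1 (true) or 0 (false), ;
   * so the sum counts how many of the three conditions hold (0 to 3) ;
   n_matched = (var1=3) + (var2=4) + (var3=1);

   * A further comparison turns the count into a 0/1 flag ;
   any_match = (n_matched > 0);

   /* ### WARNING: Only works if varx is positive */
   log_varx = log(varx);

   datalines;
3 4 1 10
3 0 0 2.5
0 0 0 1
;
run;
```

Searching the program for ### then lands directly on the line that depends on the range of varx.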
And if MACROS are involved maybe comment twice as often.
"If macros are involved, comment twice as much!"
LOL
Yes, I do agree,
and how I do that involves the following:
Macro names are in the form <verb><object>
and parameter names match the keyword of the statement in which they appear
%macro Stack_Facts
(base=
,data=
,...
);
DATA work.&Data;
set library.&Data;
*...;
PROC Append base = &Base
data = &Data;
run;
%*note for macros over one.page ~>= 50 lines
MAKE SURE the mend statement has the name of the macro;
%mend Stack_Facts;
I'll hold my tongue -- today --
on in-crowd.specific acronyms for parameter names.
Ron Fehd Writing for Reading maven
Whatever you decide upon for content, here is a form that I like to use. All comments should be macro language comments with an extra asterisk:
%** Here is the text of the comment **;
%** Here is the second line of the comment **;
If a comment extends beyond one line, use a separate comment statement for each line.
The reason for doing this is that it makes it easy to locate and extract all the comments that a program contains. You can comment out a section of the code using /* and */ without having to worry about whether it will appear to be an important piece of documentation.
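As a rough sketch of why a fixed marker makes comments easy to extract: the steps below write a tiny stand-in program to a temporary file and then list only its %** lines. The fileref pgm and the stand-in program text are assumptions for illustration; in practice the fileref would point at a real program file.

```sas
filename pgm temp;

* Write a tiny stand-in program so the example is self-contained ;
data _null_;
   file pgm;
   put '%** Purpose: stack the monthly facts **;';
   put 'data work.a; set work.b; run;';
   put '/* old code commented out for testing */';
   put '   %** Input: work.b **;';
run;

* List every documentation comment with its line number ;
data _null_;
   infile pgm truncover;
   input line $char256.;
   * Keep lines whose first non-blank text is the documentation marker ;
   if left(line) =: '%**' then put _n_ 5. +1 line;
run;
```

The block comment on the third line is skipped, which is the point of the convention: code that was commented out for testing never shows up as documentation.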
Astounding wrote:
> Whatever you decide upon for content, here is a form that I like to use.
> All comments should be macro language comments with an extra asterisk:
%** Here is the text of the comment **;
%** Here is the second line of the comment **;
and one can make this even more useful by counting the number of asterisks
%* module comments have one asterisk;
%** routine comments have two asterisks;
%*** subroutine comments have three asterisks;
I use a similar mnemonic when writing notes, warnings or errors:
NOTE: ALL CAPS means written by SAS
Note: InitCaps(Note): comments written by a routine
note: lowcase(note): comments written by a subroutine.
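A small sketch of how the two mnemonics might look together. The macro names check_demo and load_demo are hypothetical, and sashelp.class is only used as a convenient sample data set.

```sas
%* module comment: one asterisk;

%macro check_demo(data=);
   %*** subroutine comment: three asterisks;
   %put note: check_demo received &data;
%mend check_demo;

%macro load_demo(data=);
   %** routine comment: two asterisks;
   %put Note: load_demo is reading &data;
   %check_demo(data=&data)
%mend load_demo;

%load_demo(data=sashelp.class)
```

In the log, the capitalization of each message then shows at a glance whether it came from SAS itself, from the routine, or from the subroutine.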
Ron Fehd Style.Guide never runs out of Style maven
1. While the "/* */" combination works fine, it is not a good idea to use it for comments or headers. It interferes when you wish to comment out a section of code for testing purposes.
2. I prefer to use the "****;" method, both for headers and in-code comments.
3. A good header will give the program name and path, the creator and date, the purpose of the program, input and output file(s) and path(s), macros defined/used, and macro variables created in the program and their purpose.
4. It is essential to have a MODIFICATION section of the header to keep track of all the edits made.
5. A "run instructions" section is also a good idea.
Hope this helps.
I do agree that "/* */" is not as distinguishable as "******" for headers, but when I need to comment out a section of code that has comments in it, I find the macro approach comes in handy.
%Macro xx;
/* 1. While the block comment combination works fine, it is not a good idea to use it for comments or */
/* headers. It interferes when you wish to comment out a section of code for testing purposes. */
%Mend xx;
Haikuo
I find that it is best to put brief comments before sections of code. A section could be a single proc or data step, but frequently it is larger, so the comments come a little less often. For example, if you have two proc sorts followed by a merge, then that whole block only needs one comment. The comments should describe the purpose of the code and not what it is doing. For example, there is no need for a comment that says "sort by id" before the code "proc sort; by id;".
As to which of the three types of comments (block - /* */, statement - * ;, or macro %* ; ) to use it depends on the location and purpose of the comment.
I reserve block comments /* */ for program headers (especially when the program headers might include example usage statements that will have embedded semi-colons) or for the rare instance when it is necessary to put a comment in the middle of another statement. This leaves you the option of using block comments for debugging code. It also avoids the issues caused by trying to nest block comments. When I use /* */ to create a block comment I make it one comment even when it extends to multiple lines (see the note below about "boxes"). Again, this makes it easier to include example usage code in the program header.
When placing comments before a block of code I use statement comments. Each comment is a complete line in the code file (starts with * and ends with ; ). If you use statement comments in a macro, then when the MPRINT option is on those comments print in the log and serve as a guide when reviewing the SAS log. Making each comment its own line of code means that they will print in the log as they appear in the code file, keeping the log readable. I also do not place comments at the end of a line after a piece of code. When reading the code I like to see the comment first to help me decide if I even need to examine the actual code in detail. Plus, when your code is in a macro and MPRINT is on, the comment will then print on the line after the code and make the log more confusing than if the comment precedes the code.
I use macro comments (%* ... ; ) in two places. The first is when a macro is only generating macro statements (%let, %if, etc.). Again, this makes the SAS log more readable: if you use statement comments in that situation, they print in the log where there is no generated code. Usually you want the log to reflect the SAS code the macro generated rather than its internal logic. The other place is when the comment is intended for the coder, like in BALLARDW's post above.
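The difference between the two comment styles inside a macro can be seen with a sketch like the one below. The macro name sort_demo and the output data set work.sorted are hypothetical; sashelp.class is the sample data set shipped with SAS.

```sas
options mprint;

%macro sort_demo(data=, by=);
   %* Macro comment: stays in the source and is not echoed by MPRINT ;
   * Statement comment: echoed by MPRINT ahead of the generated code ;
   proc sort data=&data out=work.sorted;
      by &by;
   run;
%mend sort_demo;

%sort_demo(data=sashelp.class, by=age)
```

With MPRINT on, the log lines prefixed MPRINT(SORT_DEMO) include the statement comment followed by the PROC SORT step, while the macro comment does not appear.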
I never attempt to right-align the ends of comments to create pretty boxes. They never stay aligned and can lead to problems when they end up being pushed out to column 160 during editing. I just received a program to review this week that used that style and had that problem. Plus, with "modern" editors that use proportionally spaced fonts, they can never line up anyway.
I DO use comment lines to create horizontal rules above and below blocks of comments, but preferring minimal ink I use *----; rather than *****; These lines serve two purposes. The first is to provide visual separation between the comment and the code or between the blocks of code. The second is to provide a reference line so that I know when my lines of code have gotten too long for human consumption. Ask yourself why newspapers and magazines print their pages in columns. It is because of the limitations of the human field of vision. Make lines of text longer than 70 or 80 characters and they cannot be scanned or read easily. For this purpose I always make these horizontal line comments the same length. They always start in column one. I do NOT indent comment blocks to reflect the indentation level of the code. This just makes it harder to align the comments and cuts down on the number of characters you can put on one line of comment.
All great comments (pun intended!) to the OP.
Adding some thoughts:
* As Tom says, document the business logic rather than the obvious technical logic. IOW, say why you are sorting by a b c, not that the next block of code is (obviously) sorting by a b c.
* Ditch the trailing */ on a multi-line block comment. It adds no value, and is either a pain to keep aligned, or a waste of programmer time if you try to do so. We're programmers, not ASCII artists. IOW,
/****************************
* Description:
* More Stuff
* ***************************/
Not
/****************************/
* Description: */
* More Stuff */
* ***************************/
Same for trailing semi-colons on * ; style comments.
* I tend to put no space between code and trailing semi-colon, and a space between comment and trailing semi-colon, but that is very much a personal coding style. I've seen others that put a space before all semi-colons.
* Sort by keys and todate to retain the most recent record ;
proc sort data=foo;
by key1 key2 todate;
run;
* Retain the last (most recent) record for each key1/key2 combination ;
data bar;
set foo;
by key1 key2 todate;
if last.key2;
run;
* Within macro code, * ; style comments will print in the log; /* */ and %* ; style comments will not. So whether you are trying to document the macro code or the generated code determines which commenting style to use.
* A "trick" I occasionally use, esp. during development, and more esp. if I've inherited code overusing /* */ block comments, is to use an uncalled macro to comment out a large block of code:
/* The previous programmer */
/* used block style comments */
/* everywhere, outside of the */
/* program header */
proc foo;
run;
%macro comment;
/* More block comments here */
data whatever;
run;
/* some final comments */
/* why didn't they just use statement style comments? */
proc bar;
run;
%mend;
I'm not advocating this as a best practice by any means, just saying it's another "commenting code out" approach that's sometimes useful during development.
* Ask 20 programmers (try to pick good ones lol), you'll get 20 program header block styles. Review a few and decide what works for you. Or your company may come up with a standard program header. What I like:
- Name
- Description
- Input (optional - if describing the input adds value)
- Output (optional - if describing the output adds value)
- Original SAS version (I've learned this over time - why didn't they just use NODUPOUT? Oh, cuz it was developed in SAS 9.1)
- Original programmer (does he/she still work here?)
- Original date
- Macros called (kind of a pain to maintain, but a big help for impact analysis if you want to change a macro)
- Modification history
- Esp. for macros, I also like a complete usage example and usage notes. If possible I like the usage examples to be working, standalone code, not pseudocode. Sometimes my macro headers are longer than the macro code!
So, here's an example (sorry for the length):
/*=====================================================================
Program Name : create_format.sas
Purpose : Create a format from an input dataset.
SAS Version : SAS 9.1.3
Input Data : Input Dataset
Output Data : Formats
Macros Called : parmv, kill, nobs
Originally Written by : Scott Bass
Date : 23MAR2007
Program Version # : 1.0
=======================================================================
Modification History :
Programmer : Scott Bass
Date : 30NOV2012
Change/reason : Changed to use SPDEWORK library for better
performance
Program Version # : 1.1
=====================================================================*/
/*----------------------------------------------------------------------
Usage:
%create_format(
DATA=WORK.FOO
,NAME=ABC_FMT
);
Create $ABC_FMT format in WORK.FORMATS catalog using WORK.FOO as input.
WORK.FOO should contain the variables START and LABEL.
=======================================================================
%create_format(
DATA=WORK.FOO
,NAME=DEF_FMT
,TYPE=NUM_FORMAT
,START=START_VALUE
,END=END_VALUE
,LABEL=DESCR
,LIB=LIBRARY
,CAT=FORMATS
)
Create DEF_FMT format in LIBRARY.FORMATS catalog using WORK.FOO as input.
WORK.FOO should contain the variables START_VALUE, END_VALUE, and DESCR.
=======================================================================
%create_format(
DATA=PERM.DATASET
,NAME=GHI_CHR
,TYPE=CHR_INFORMAT
,START=NAME
,LABEL=DESCR
,LIB=LIBRARY
,CAT=MYFORMAT
,OTHER=**UNKNOWN**
,WHERE=DATE gt "01JAN2007"d
)
Create GHI_CHR informat in LIBRARY.MYFORMAT catalog using PERM.DATASET as input.
Input data not matching any of the ranges should be coded to "**UNKNOWN**".
PERM.DATASET should contain the variables NAME and DESCR.
Only include records after 01JAN2007 in the format.
=======================================================================
%create_format(
DATA=WORK.FOO
,NAME=cats("MY_",FMT)
,TYPE=CHR_FORMAT
)
Create multiple formats from input dataset WORK.FOO.
Create the formats in the default catalog (WORK.FORMATS).
The name of the format is contained in the variable FMT.
We want the format names to begin with MY_.
WORK.FOO should contain the variables START and LABEL.
------------------------------------------------------------------------
Notes:
If the input dataset contains data for multiple formats
(i.e. it contains data for the name of the format), then the format
name string should contain a left parenthesis.
For example:
(VAR):
Would just use VAR as is from the input dataset.
The parentheses are just a trigger to the macro to treat this
as a code fragment.
cats("FOO_",VAR):
Would append "FOO_" to the VAR variable in the input dataset.
Otherwise &NAME will be used as a hard-coded format name for the entire
input dataset.
Combining a code fragment for the format name with Other processing
will result in only the last format containing Other processing.
No error checking is done on the input parameters (other than checking
if required parameters are set).
The variables for VALUE and LABEL should be character or automatic type
conversion will result.
Except for START and LABEL, your input dataset should avoid using
variable names listed in the attrib statement below. For example, if
your input dataset contained the variable PREFIX, and you used it for
the VALUE or LABEL parameters, your data would likely be truncated
giving undesired results.
The macro issues a warning if your input dataset contains overlapping
ranges. You should pre-process your input dataset to circumvent this
warning.
----------------------------------------------------------------------*/
%macro create_format
/*---------------------------------------------------------------------
Create a format from an input table
---------------------------------------------------------------------*/
(DATA=
/* Input dataset/view (REQ). */
/* Should be a two-level name. */
,LIB=WORK
/* Format library (REQ). */
/* Library to which the format is written. */
,CAT=FORMATS
/* Format catalog (REQ). */
/* Catalog to which the format is written. */
,NAME=
/* Format name (REQ). */
/* Either a hard coded name, or a code fragment */
/* (that could reference a variable in the */
/* input dataset/view). */
,TYPE=CHR_FORMAT
/* Format type (REQ). */
/* Valid values are CHR_FORMAT, NUM_FORMAT, */
/* CHR_INFORMAT, NUM_INFORMAT. */
,START=START
/* Starting value for the format (REQ). */
/* Variable containing the start value for the format */
,END=
/* Ending value for the format (Opt). */
/* Variable containing the end value for the format */
,LABEL=LABEL
/* Format label (REQ). */
/* Variable containing the label for the format. */
,DEFAULT=
/* Specify the default length of the (in)format. */
/* If blank, PROC FORMAT determines the default */
/* length based on the maximum label length. */
,MIN=
/* Specify a minimum length for the (in)format (Opt). */
,MAX=
/* Specify a maximum length for the (in)format (Opt). */
,FUZZ=
/* Specify a fuzz factor for matching values to a */
/* range (Opt). */
,OTHER=
/* "Other" processing? (Opt). */
/* If non-blank, input data not matching any defined */
/* ranges will be mapped to the "other" label. */
/* If value = _MISSING_, value is translated to the */
/* appropriate label for the format type. */
,WHERE=
/* Where processing? (Opt). */
/* If non-blank, only data matching the where clause */
/* will be included in the format. */
/* Do not include the "where" keyword. */
,DEDUP=Y
/* Dedup the source dataset? (REQ). */
/* Default value is YES. Valid values are: */
/* 0 1 */
/* OFF N NO F FALSE and ON Y YES T TRUE */
/* (case insensitive) are acceptable aliases for */
/* 0 and 1 respectively. */
);
%local macro parmerr other where name type;
%let macro = &sysmacroname;
%* check input parameters ;
%parmv(DATA, _req=1,_words=1,_case=N) /* words=1 allows ds options */
%parmv(LIB, _req=1,_words=0,_case=U)
%parmv(CAT, _req=1,_words=0,_case=U)
%parmv(NAME, _req=1,_words=0,_case=U)
%parmv(TYPE, _req=1,_words=0,_case=U,_val=CHR_FORMAT NUM_FORMAT CHR_INFORMAT NUM_INFORMAT)
%parmv(START, _req=1,_words=0,_case=U)
%parmv(END, _req=0,_words=0,_case=U)
%parmv(LABEL, _req=1,_words=0,_case=U)
%parmv(DEFAULT, _req=0,_words=0,_case=U,_val=POSITIVE)
%parmv(MIN, _req=0,_words=0,_case=U,_val=POSITIVE)
%parmv(MAX, _req=0,_words=0,_case=U,_val=POSITIVE)
%parmv(FUZZ, _req=0,_words=0,_case=U)
%parmv(DEDUP, _req=1,_words=0,_case=U,_val=0 1)
%if (&parmerr) %then %goto quit;
%let WHERE = %unquote(&WHERE);
libname _crtfmt_ spde "%sysfunc(pathname(work))" temp=yes;
%* I know I do not need all these attributes now, but I have left them in here ;
%* in case additional options are added to the macro in the future ;
%let cntlin=_crtfmt_._cntlin_;
data &cntlin;
format FMTNAME TYPE START END LABEL;
/*
attrib
FMTNAME length=$32 label="Format name"
START length=$200 label="Starting value for format"
END length=$200 label="Ending value for format"
LABEL length=$5000 label="Format value label"
MIN length=3 label="Minimum length"
MAX length=3 label="Maximum length"
DEFAULT length=3 label="Default length"
LENGTH length=3 label="Format length"
FUZZ length=8 label="Fuzz value"
PREFIX length=$2 label="Prefix characters"
MULT length=8 label="Multiplier"
FILL length=$1 label="Fill character"
NOEDIT length=3 label="Is picture string noedit?"
TYPE length=$1 label="Type of format"
SEXCL length=$1 label="Start exclusion"
EEXCL length=$1 label="End exclusion"
HLO length=$11 label="Additional information"
DECSEP length=$1 label="Decimal separator"
DIGSEP length=$1 label="Three-digit separator"
DATATYPE length=$8 label="Date/time/datetime?"
LANGUAGE length=$8 label="Language for date strings"
;
if _n_=1 then call missing(of _all_);
*/
set &DATA end=_last_;
%if (%superq(WHERE) ne %str() ) %then %do;
where &WHERE;
%end;
%* if the format name or label contains a left parentheses, assume it is a code fragment ;
%* otherwise it is a hard coded format name or label ;
%if (%index(%superq(NAME),%str(%())) %then
%let NAME = &NAME;
%else
%let NAME = "&NAME";
%if (&TYPE = CHR_FORMAT) %then
%let type = C;
%else
%if (&TYPE = NUM_FORMAT) %then
%let type = N;
%else
%if (&TYPE = CHR_INFORMAT) %then
%let type = J;
%else
%if (&TYPE = NUM_INFORMAT) %then
%let type = I;
%if (%superq(END) eq ) %then %let END = &START;
FMTNAME = &NAME;
TYPE = "&type";
START = &START;
END = &END;
LABEL = &LABEL;
%if (&default ne ) %then %do;
DEFAULT = &default;
%end;
%if (&min ne ) %then %do;
MIN = &min;
%end;
%if (&max ne ) %then %do;
MAX = &max;
%end;
%if (&fuzz ne ) %then %do;
FUZZ = &fuzz;
%end;
output;
%if (%superq(OTHER) ne ) %then %do;
if (_last_) then do;
call missing(START);
call missing(END);
%if (%qupcase(%superq(OTHER)) eq _MISSING_) %then %do;
%if (%sysfunc(indexc(&TYPE,CNJ))) %then %do;
LABEL = " ";
%end;
%else %do;
LABEL = .;
%end;
%end;
%else %do;
%if (%sysfunc(indexc(&TYPE,CNJ))) %then %do;
LABEL = "&OTHER";
%end;
%else %do;
LABEL = &OTHER;
%end;
%end;
HLO = "O";
output;
end;
%end;
/*
keep
FMTNAME
START
END
LABEL
MIN
MAX
DEFAULT
LENGTH
FUZZ
PREFIX
MULT
FILL
NOEDIT
TYPE
SEXCL
EEXCL
HLO
DECSEP
DIGSEP
DATATYPE
LANGUAGE
;
*/
run;
%if (&dedup) %then %do;
%let cntlin=_crtfmt_._cntlin_nodup_;
%* remove duplicate ranges from input dataset ;
proc sort data=_crtfmt_._cntlin_ out=&cntlin dupout=_crtfmt_._cntlin_dupout_ nodupkey;
by fmtname start;
run;
%* print message if duplicate observations were deleted ;
%if (%nobs(_crtfmt_._cntlin_dupout_) gt 0) %then %do;
%* put %str(WAR)NING: Duplicate ranges were detected in the &DATA dataset.;
%put %str(NO)TE: Duplicate ranges were detected in the &DATA dataset.;
%* Uncomment the below line if calling this macro from a DIS job ;
%*rcSet(4);
%end;
%end;
%* create format(s) ;
proc format cntlin=&cntlin lib=&LIB..&CAT;
quit;
%quit:
%mend;
/******* END OF FILE *******/
Thanks to everyone who posted - all your suggestions were very helpful.