mediaeval
Calcite | Level 5

I've been asked to comment a Foundation SAS program extensively. The program has 20 or so data steps and various Procs. I developed the program and am familiar with it. The intended audience has limited SAS experience. My thought is to include a standard comment section at the head of each data step or Proc, with headings such as

Purpose of data step/proc:

Inputs:

Outputs:

Does anyone have any suggestions on what other information I could include, any alternative approaches, and is there any documentation available? I have been unable to find anything online.
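
To make the idea concrete, here is a minimal sketch of such a block at the head of a step. The data set and variable names (work.orders, customer_id, order_date, amount) are made up for illustration and are not taken from the actual program.

```sas
/* Hypothetical sample data so the sketch runs on its own */
data work.orders;
  input customer_id order_date :date9. amount;
  format order_date date9.;
  datalines;
1 01JAN2013 10
1 15FEB2013 25
2 03MAR2013 40
;
run;

/*--------------------------------------------------------------
Purpose : Keep one row per customer, the most recent order.
Inputs  : work.orders (one row per order)
Outputs : work.latest_order (one row per customer)
--------------------------------------------------------------*/
proc sort data=work.orders out=work.orders_sorted;
  by customer_id order_date;
run;

data work.latest_order;
  set work.orders_sorted;
  by customer_id order_date;
  if last.customer_id;
run;
```

The same three headings would be repeated, with different text, above each later step.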

Thanks

Joe

14 REPLIES 14
Ron_MacroMaven
Lapis Lazuli | Level 10

thanks for stating your audience;

that is a very important consideration.

There are several links here:

http://www.sascommunity.org/wiki/Category:Documentation

my minimum block is

/*name:

description:

purpose:
*******/

which has evolved from my leave.no.Q.unanswered form in:

http://www.sascommunity.org/wiki/Writing_for_Reading_Program_Header

for ultimate Goodness in Document see the LaTeX package statrep

http://www.sascommunity.org/wiki/Using_LaTeX_to_document_programs

Ron Fehd  ReUse maven

jshaik
Calcite | Level 5

Hope this helps.

/**************************************************************************************************************/

/* <<TITLE>> samplecode.SAS                                                                         */

/* Requested by <<REQUESTOR>>                                                         */

/*                <<DEPT>>                                                                      */

/*                <<TKT#>>                            */

/*                <<WIT#>>                                                                                    */

/*                <<CR#>>                                                                                     */

/* <<DESCRIPTION>>A brief description of the program                                 */

/*                                                                         */

/* Category: <<REG/NON_REG>>  REG                                                                             */

/* Schedule: <<D,W,M,Q,S,Y,OTHER Specify>> Weekly                                                             */

/* Destination: <<LOCATION>>   Output files location                                                          */

/* Delivery Method: <<SECURE EMAIL, FILE SERVER, FTP, OTHER Specify >> FILE SERVER                            */

/*                                                                                                            */

/* CREATED BY:                                                                                                */

/*  REPORTING TEAM                                                                                            */

/*  <<PROGRAMMER>>                                                                           */

/*  <<BSA>>                                                                                         */

/*  <<DATE>>                                                                                                  */

/*                                                                                                            */

/*  MAINTENANCE (maintain existing entries, add new on separate lines)                                        */

/*  MM/DD/YYYY PROGRAMMER ID   BSA ID         TKT#            WIT#         CR#                      */

/**************************************************************************************************************/


Ron_MacroMaven
Lapis Lazuli | Level 10

thorough

I like that.

YeahBut

ALL CAPS IS HARD TO READ, ESPECIALLY FOR NON-NATIVE SPEAKERS.

therefore I recommend lower case

ron fehd lowcase maven

ArtC
Rhodochrosite | Level 12

In the Enhanced Editor or EG, blocks of text can easily be inserted using keyboard abbreviations.  A number of these abbreviations have been posted in a sasCommunity article: http://www.sascommunity.org/wiki/Abbreviations/Macros

xyxu
Quartz | Level 8

How did you make the comments so well aligned on the right-hand side? Did you use any keyboard shortcut?

ballardw
Super User

I think describing why a data step or block of procs is used is a good idea. At the head of the program I often have a section of comments that outlines the program in brief English. Then in the body I note where each element of the outline begins.

I suggest that any not-intuitively-obvious code in a data step have a line or two of comment about why it is there or what it does. For example, I have data sets where coding standards changed over a period of time, so there may be date-dependent manipulations. The same goes for somewhat obscure code like newvar = (var1=3)*1 + (var2=4)*1 + (var3 = 1)*1; which builds a variable from 0/1 comparison results in one line (here, a count of how many conditions hold) instead of a bunch of if/then statements.

DO note any code that may not work if the range of values changes!!!! I often place a comment like

/* ### WARNING: Only works if varx is positive */  The ### is something I can search for to find potential land mines in programs that are not used frequently, or in one-offs where it may not be worth the time to code around likely problems.

And if MACROS are involved maybe comment twice as often.
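
A small sketch of both habits in one step, for anyone who wants to try it. The data set work.survey and its values are invented for the example; only the commenting pattern comes from the post above.

```sas
/* Hypothetical sample data so the sketch runs on its own */
data work.survey;
  input var1 var2 var3 varx;
  datalines;
3 4 1 10
3 0 1 2.5
0 0 0 1
;
run;

data work.scored;
  set work.survey;

  /* Each comparison evaluates to 1 (true) or 0 (false), so the */
  /* sum counts how many of the three conditions hold (0 to 3). */
  n_hits = (var1 = 3) + (var2 = 4) + (var3 = 1);

  /* ### WARNING: log() only works if varx is positive */
  log_x = log(varx);
run;
```

Searching the program for ### then lists every place where an assumption about the data is baked in.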

Ron_MacroMaven
Lapis Lazuli | Level 10

"If macros are involved, comment twice as much!"

LOL

Yes, I do agree,

and how I do that involves the following:

Names of macros are in the form <verb><object>

and parameters match the keyword within which they appear

%macro Stack_Facts

(base=

,data=

,...

);

DATA work.&Data;

set library.&Data;

*...;

PROC Append base = &Base

            data = &Data;

run;

%*note for macros over one.page ~>= 50 lines

MAKE SURE the mend statement has the name of the macro;

%mend Stack_Facts;

I'll hold my tongue -- today --

on in-crowd.specific acronyms for parameter names.

Ron Fehd  Writing for Reading maven

Astounding
PROC Star

Whatever you decide upon for content, here is a form that I like to use.  All comments should be macro language comments with an extra asterisk:

%** Here is the text of the comment                                                  **;

%** Here is the second line of the comment                                       **;

If a comment extends beyond one line, use a separate comment statement for each line.

The reason for doing this is that it makes it easy to locate and extract all the comments that a program contains.  You can comment out a section of code using /* and */ without having to worry about whether it will appear to be an important piece of documentation.
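
As a rough sketch of the "locate and extract" step, a data step can read the program as text and keep only the lines that start with the documentation marker. The file path below is a placeholder, and the 256-character line length is an assumption; adjust both to suit.

```sas
/* Placeholder path: point this at the program to scan */
filename pgm "/path/to/my_program.sas";

data work.comments;
  infile pgm truncover lrecl=32767;
  input line $char256.;
  lineno = _n_;
  /* Keep only lines whose first non-blank characters are the marker */
  if left(line) =: '%**';
run;

proc print data=work.comments noobs;
  var lineno line;
run;
```

Code that has been commented out with /* and */ is skipped automatically, since those lines do not start with the marker.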

Ron_MacroMaven
Lapis Lazuli | Level 10

Astounding wrote:

> Whatever you decide upon for content, here is a form that I like to use.

> All comments should be macro language comments with an extra asterisk:

%** Here is the text of the comment                                                  **;

%** Here is the second line of the comment                                       **;

and one can make even more useful by counting the number of asterisks

%* module comments have one asterisk;

%** routine comments have two asterisks;

%*** subroutine comments have three asterisks;

I use a similar mnemonic when writing notes, warnings or errors:

NOTE: ALL CAPS means written by SAS

Note: InitCaps(Note): comments written by a routine

note: lowcase(note): comments written by a subroutine.

Ron Fehd  Style.Guide never runs out of Style maven
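
One possible reading of the two mnemonics above, as a sketch. The macro names load_facts and check_rows are hypothetical, and which level counts as "routine" versus "subroutine" is my assumption.

```sas
%macro check_rows(data=);
  %*** subroutine comment: three asterisks ;
  %put note: check_rows looked at &data;
%mend check_rows;

%macro load_facts(data=);
  %** routine comment: two asterisks ;
  %put Note: load_facts starting for &data;
  %check_rows(data=&data)
%mend load_facts;

%load_facts(data=sashelp.class)
```

In the log, the case of the first word then shows at a glance whether a message came from SAS itself, from the calling macro, or from a macro it called.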

JerryH
Calcite | Level 5

1.  While the "/* */" combination works fine, it is not a good idea to use it for comments or headers.  It interferes when you wish to comment out a section of code for testing purposes.

2.  I prefer to use the "****;" method, both for headers and in-code comments.

3.  A good header will give the program name and path, the creator and date, the purpose of the program, input and output file(s) and path(s), macros defined/used, and macro variables created in the program and their purpose.

4.  It is essential to have a MODIFICATION section of the header to keep track of all the edits made.

5.  A "run instructions" section is also a good idea.

Hope this helps.
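
A minimal sketch of a header covering the points in the list above, written with statement comments only. Every name, path, and date in it is invented for illustration. Note that a statement comment ends at the first semicolon, so the text itself must not contain semicolons or unmatched quotes.

```sas
*--------------------------------------------------------------;
* Program     : monthly_summary.sas ;
* Path        : /projects/reports/programs ;
* Created by  : J. Programmer, 28MAR2013 ;
* Purpose     : Summarise monthly sales by region ;
* Inputs      : sales.transactions ;
* Outputs     : work.monthly_summary ;
* Macros used : none ;
* Macro vars  : report_month, the month being summarised (YYYYMM) ;
* Run notes   : Set report_month below, then submit the whole file ;
*--------------------------------------------------------------;
* MODIFICATIONS ;
* 15APR2013 JP  Added region to the output ;
*--------------------------------------------------------------;

%let report_month = 201303;
```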

Haikuo
Onyx | Level 15

I do agree that "/* */" is not as distinguishable as "******" for headers, but when I try to comment out a section of code that has comments in it, I find the macro form comes in handy.

%Macro xx;

/*1. While the "/* */" combination works fine, it is not a good idea to use it for comments or*/

/*headers. It interferes when you wish to comment out a section of code for testing purposes*/

%Mend xx;

Haikuo

Tom
Super User

I find that it is best to put brief comments before sections of code.  A section could be a single proc or data step, but frequently it is larger than that. For example, if you have two proc sorts followed by a merge, then that whole block only needs one comment. The comments should describe the purpose of the code and not what it is doing.  For example, there is no need for a comment that says "sort by id" before the code "proc sort; by id;".

As to which of the three types of comments (block - /*  */, statement - * ;, or macro - %* ; ) to use, it depends on the location and purpose of the comment.

I reserve block comments /* */ for program headers (especially when the program headers might include example usage statements that will have embedded semi-colons) or for the rare instance when it is necessary to put a comment in the middle of another statement.  This leaves you the option of using block comments for debugging code.  It also avoids issues caused by trying to nest block comments.  When I use /* */ to create a block comment I make it one comment even when it extends to multiple lines (see note below about "boxes").  Again this makes it easier to include example usage code in the program header.

When placing comments before a block of code I use statement comments.  Each comment is a complete line in the code file (starts with * and ends with ;  ). If you use statement comments in a macro, then when the MPRINT option is on those comments print in the log and serve as a guide when reviewing the SAS log.  Making each comment its own line of code means that they will print in the LOG as they appear in the code file, keeping the LOG readable.  I also do not place comments at the end of a line after a piece of code. When reading the code I like to see the comment first to help me decide if I even need to examine the actual code in detail.  Plus, when your code is in a macro and MPRINT is on, the comment will then print on the line after the code and make the log more confusing than if the comment precedes the code.

I use macro comments (%* ... ;  ) in two places. The first is when a macro is only generating macro statements (%let, %if, etc.).  Again this makes the SAS log more readable.  If you use statement comments in that situation, they print in the log where no code is generated.  Usually you want the log to reflect the SAS code the macro generated rather than its internal logic.  The other place is when the comment is intended for the coder, like in BALLARDW's post above.
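
A short sketch to see the difference in the log. The macro name demo_comments is made up, and sashelp.class is used only because it ships with SAS.

```sas
options mprint;

%macro demo_comments(data=sashelp.class);
  %* Macro comment for the coder, not echoed by MPRINT ;
  %local sortvar;
  %let sortvar = name;

  * Statement comment, echoed by MPRINT ahead of the generated step ;
  proc sort data=&data out=work.sorted;
    by &sortvar;
  run;
%mend demo_comments;

%demo_comments()
```

With MPRINT on, the log should show the statement comment followed by the generated PROC SORT, while the macro comment above the %let stays out of the MPRINT lines.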

I never attempt to right align the ends of comments to create pretty boxes.  They never stay aligned and can lead to problems when they end up being pushed out to column 160 during editing.  I just received a program to review this week that used that style and had that problem.  Plus, with "modern" editors that use proportionally spaced fonts they can never line up anyway.

I DO use comment lines to create horizontal rules above and below blocks of comments, but preferring minimal ink I use *----; rather than *****;  These lines serve two purposes. The first is to provide visual separation between the comment and the code or between the blocks of code.  The second is to provide a reference line so that I know when my lines of code have gotten too long for human consumption.  Ask yourself why newspapers and magazines print their pages in columns.  It is because of the limitations of the human field of vision.  Make lines of text longer than 70 or 80 characters and they cannot be scanned or read easily.  For this purpose I always make these horizontal line comments the same length.  They always start in column one.  I do NOT indent comment blocks to reflect the indentation level of the code.  That just makes it harder to align the comments and cuts down on the number of characters you can put on one line of comment.
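
For illustration, a sketch of that layout. The 72-character rule length is my choice for the example, and the step itself is invented.

```sas
*----------------------------------------------------------------------;
* Rank students by height so the report can list the tallest first ;
*----------------------------------------------------------------------;
proc sort data=sashelp.class out=work.by_height;
  by descending height;
run;
```

Any line of code that runs past the end of the rule is a candidate for splitting.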

ScottBass
Rhodochrosite | Level 12

All great comments (pun intended!) to the OP.

Adding some thoughts:

* As Tom says, document the business logic rather than the obvious technical logic.  IOW, say why you are sorting by a b c, not that the next block of code is (obviously) sorting by a b c.

* Ditch the trailing */ on each line of a multi-line block comment.  It adds no value, and it is a pain to keep aligned and a waste of programmer time if you try to do so.  We're programmers, not ASCII artists.  IOW,

/****************************

* Description:

* More Stuff

* ***************************/

Not

/****************************/

/* Description:             */

/* More Stuff               */

/****************************/

Same for trailing semi-colons on * ; style comments.

* I tend to put no space between code and trailing semi-colon, and a space between comment and trailing semi-colon, but that is very much a personal coding style.  I've seen others that put a space before all semi-colons.

* Sort by keys and todate to retain the most recent record ;

proc sort data=foo;

  by key1 key2 todate;

run;

* Retain the last record ;

data bar;

  set foo;

  by key1 key2 todate;

  if last.key2;

run;

* Within macro code, * ; style comments will print in the log (when MPRINT is on); /* */ and %* ; style comments will not.  So whether you are trying to document the macro code or the generated code determines which commenting style to use.

* A "trick" I occasionally use, esp. during development, and more esp. if I've inherited code overusing /* */ block comments, is to use an uncalled macro to comment out a large block of code:

/* The previous programmer */

/* used block style comments */

/* everywhere, outside of the */

/* program header */

proc foo;

run;

%macro comment;

/* More block comments here */

data whatever;

run;

/* some final comments */

/* why didn't they just use statement style comments? */

proc bar;

run;

%mend;

I'm not advocating this as a best practice by any means, just saying it's another "commenting code out" approach that's sometimes useful during development.

* Ask 20 programmers (try to pick good ones lol), you'll get 20 program header block styles.  Review a few and decide what works for you.  Or your company may come up with a standard program header.  What I like:

- Name

- Description

- Input (optional - if describing the input adds value)

- Output (optional - if describing the output adds value)

- Original SAS version (I've learned this over time - why didn't they just use NODUPOUT?  Oh, cuz it was developed in SAS 9.1)

- Original programmer (does he/she still work here?)

- Original date

- Macros called (kind of a pain to maintain, but a big help for impact analysis if you want to change a macro)

- Modification history

- Esp. for macros, I also like a complete usage example and usage notes.  If possible I like the usage examples to be working, standalone code, not pseudocode.  Sometimes my macro headers are longer than the macro code!

So, here's an example (sorry for the length):

/*=====================================================================

Program Name            : create_format.sas

Purpose                 : Create a format from an input dataset.

SAS Version             : SAS 9.1.3

Input Data              : Input Dataset

Output Data             : Formats

Macros Called           : parmv, kill, nobs

Originally Written by   : Scott Bass

Date                    : 23MAR2007

Program Version #       : 1.0

=======================================================================

Modification History    :

Programmer              : Scott Bass

Date                    : 30NOV2012

Change/reason           : Changed to use SPDEWORK library for better

                          performance

Program Version #       : 1.1

=====================================================================*/

/*----------------------------------------------------------------------

Usage:

%create_format(

    DATA=WORK.FOO

   ,NAME=ABC_FMT

);

Create $ABC_FMT format in WORK.FORMATS catalog using WORK.FOO as input.

WORK.FOO should contain the variables START and LABEL.

=======================================================================

%create_format(

    DATA=WORK.FOO

   ,NAME=DEF_FMT

   ,TYPE=NUM_FORMAT

   ,START=START_VALUE

   ,END=END_VALUE

   ,LABEL=DESCR

   ,LIB=LIBRARY

   ,CAT=FORMATS

)

Create DEF_FMT format in LIBRARY.FORMATS catalog using WORK.FOO as input.

WORK.FOO should contain the variables START_VALUE, END_VALUE, and DESCR.

=======================================================================

%create_format(

    DATA=PERM.DATASET

   ,NAME=GHI_CHR

   ,TYPE=CHR_INFORMAT

   ,START=NAME

   ,LABEL=DESCR

   ,LIB=LIBRARY

   ,CAT=MYFORMAT

   ,OTHER=**UNKNOWN**

   ,WHERE=DATE gt "01JAN2007"d

)

Create GHI_CHR informat in LIBRARY.MYFORMAT catalog using PERM.DATASET as input.

Input data not matching any of the ranges should be coded to "**UNKNOWN**".

PERM.DATASET should contain the variables NAME and DESCR.

Only include records after 01JAN2007 in the format.

=======================================================================

%create_format(

    DATA=WORK.FOO

   ,NAME=cats("MY_",FMT)

   ,TYPE=CHR_FORMAT

)

Create multiple formats from input dataset WORK.FOO.

Create the formats in the default catalog (WORK.FORMATS).

The name of the format is contained in the variable FMT.

We want the format names to begin with MY_.

WORK.FOO should contain the variables START and LABEL.

------------------------------------------------------------------------

Notes:

If the input dataset contains data for multiple formats

(i.e. it contains data for the name of the format), then the format

name string should contain a left parenthesis.

For example:

(VAR):

   Would just use VAR as is from the input dataset.

   The parentheses are just a trigger to the macro to treat this

   as a code fragment.

cats("FOO_",VAR):

   Would append "FOO_" to the VAR variable in the input dataset.

Otherwise &NAME will be used as a hard-coded format name for the entire

input dataset.

Combining a code fragment for the format name with Other processing

will result in only the last format containing Other processing.

No error checking is done on the input parameters (other than checking

if required parameters are set).

The variables for VALUE and LABEL should be character or automatic type

conversion will result.

Except for START and LABEL, your input dataset should avoid using

variable names listed in the attrib statement below.  For example, if

your input dataset contained the variable PREFIX, and you used it for

the VALUE or LABEL parameters, your data would likely be truncated

giving undesired results.

The macro issues a warning if your input dataset contains overlapping

ranges.  You should pre-process your input dataset to circumvent this

warning.

----------------------------------------------------------------------*/

%macro create_format

/*---------------------------------------------------------------------

Create a format from an input table

---------------------------------------------------------------------*/

(DATA=

               /* Input dataset/view (REQ).                          */

               /* Should be a two-level name.                        */

,LIB=WORK

               /* Format library (REQ).                              */

               /* Library to which the format is written.            */

,CAT=FORMATS

               /* Format catalog (REQ).                              */

               /* Catalog to which the format is written.            */

,NAME=

               /* Format name (REQ).                                 */

               /* Either a hard coded name, or a code fragment       */

               /* (that could reference a variable in the            */

               /* input dataset/view).                               */

,TYPE=CHR_FORMAT

               /* Format type (REQ).                                 */

               /* Valid values are CHR_FORMAT, NUM_FORMAT,           */

               /* CHR_INFORMAT, NUM_INFORMAT.                        */

,START=START

               /* Starting value for the format (REQ).               */

               /* Variable containing the start value for the format */

,END=

               /* Ending value for the format (Opt).                 */

               /* Variable containing the end value for the format   */

,LABEL=LABEL

               /* Format label (REQ).                                */

               /* Variable containing the label for the format.      */

,DEFAULT=

               /* Specify the default length of the (in)format.      */

               /* If blank, PROC FORMAT determines the default       */

               /* length based on the maximum label length.          */

,MIN=

               /* Specify a minimum length for the (in)format (Opt). */

,MAX=

               /* Specify a maximum length for the (in)format (Opt). */

,FUZZ=

               /* Specify a fuzz factor for matching values to a     */

               /* range (Opt).                                       */

,OTHER=

               /* "Other" processing? (Opt).                         */

               /* If non-blank, input data not matching any defined  */

               /* ranges will be mapped to the "other" label.        */

               /* If value = _MISSING_, value is translated to the   */

               /* appropriate label for the format type.             */

,WHERE=

               /* Where processing? (Opt).                           */

               /* If non-blank, only data matching the where clause  */

               /* will be included in the format.                    */

               /* Do not include the "where" keyword.                */

,DEDUP=Y

               /* Dedup the source dataset? (REQ).                   */

               /* Default value is YES.  Valid values are:           */

               /* 0 1 OFF N NO F FALSE and ON Y YES T TRUE           */

               /* OFF N NO F FALSE and ON Y YES T TRUE               */

               /* (case insensitive) are acceptable aliases for      */

               /* 0 and 1 respectively.                              */

);

%local macro parmerr other where name type;

%let macro = &sysmacroname;

%* check input parameters ;

%parmv(DATA,         _req=1,_words=1,_case=N)  /* words=1 allows ds options */

%parmv(LIB,          _req=1,_words=0,_case=U)

%parmv(CAT,          _req=1,_words=0,_case=U)

%parmv(NAME,         _req=1,_words=0,_case=U)

%parmv(TYPE,         _req=1,_words=0,_case=U,_val=CHR_FORMAT NUM_FORMAT CHR_INFORMAT NUM_INFORMAT)

%parmv(START,        _req=1,_words=0,_case=U)

%parmv(END,          _req=0,_words=0,_case=U)

%parmv(LABEL,        _req=1,_words=0,_case=U)

%parmv(DEFAULT,      _req=0,_words=0,_case=U,_val=POSITIVE)

%parmv(MIN,          _req=0,_words=0,_case=U,_val=POSITIVE)

%parmv(MAX,          _req=0,_words=0,_case=U,_val=POSITIVE)

%parmv(FUZZ,         _req=0,_words=0,_case=U)

%parmv(DEDUP,        _req=1,_words=0,_case=U,_val=0 1)

%if (&parmerr) %then %goto quit;

%let WHERE = %unquote(&WHERE);

libname _crtfmt_ spde "%sysfunc(pathname(work))" temp=yes;

%* I know I do not need all these attributes now, but I have left them in here ;

%* in case additional options are added to the macro in the future ;

%let cntlin=_crtfmt_._cntlin_;

data &cntlin;

   format FMTNAME TYPE START END LABEL;

/*

   attrib

      FMTNAME     length=$32     label="Format name"

      START       length=$200    label="Starting value for format"

      END         length=$200    label="Ending value for format"

      LABEL       length=$5000   label="Format value label"

      MIN         length=3       label="Minimum length"

      MAX         length=3       label="Maximum length"

      DEFAULT     length=3       label="Default length"

      LENGTH      length=3       label="Format length"

      FUZZ        length=8       label="Fuzz value"

      PREFIX      length=$2      label="Prefix characters"

      MULT        length=8       label="Multiplier"

      FILL        length=$1      label="Fill character"

      NOEDIT      length=3       label="Is picture string noedit?"

      TYPE        length=$1      label="Type of format"

      SEXCL       length=$1      label="Start exclusion"

      EEXCL       length=$1      label="End exclusion"

      HLO         length=$11     label="Additional information"

      DECSEP      length=$1      label="Decimal separator"

      DIGSEP      length=$1      label="Three-digit separator"

      DATATYPE    length=$8      label="Date/time/datetime?"

      LANGUAGE    length=$8      label="Language for date strings"

   ;

   if _n_=1 then call missing(of _all_);

*/

   set &DATA end=_last_;

   %if (%superq(WHERE) ne %str() ) %then %do;

      where &WHERE;

   %end;

   %* if the format name or label contains a left parenthesis, assume it is a code fragment ;

   %* otherwise it is a hard coded format name or label ;

   %if (%index(%superq(NAME),%str(%())) %then

      %let NAME = &NAME;

   %else

      %let NAME = "&NAME";

   %if (&TYPE = CHR_FORMAT) %then

      %let type = C;

   %else

   %if (&TYPE = NUM_FORMAT) %then

      %let type = N;

   %else

   %if (&TYPE = CHR_INFORMAT) %then

      %let type = J;

   %else

   %if (&TYPE = NUM_INFORMAT) %then

      %let type = I;

   %if (%superq(END) eq ) %then %let END = &START;

   FMTNAME  = &NAME;

   TYPE     = "&type";

   START    = &START;

   END      = &END;

   LABEL    = &LABEL;

   %if (&default ne ) %then %do;

   DEFAULT  = &default;

   %end;

   %if (&min ne ) %then %do;

   MIN      = &min;

   %end;

   %if (&max ne ) %then %do;

   MAX      = &max;

   %end;

   %if (&fuzz ne ) %then %do;

   FUZZ     = &fuzz;

   %end;

   output;

   %if (%superq(OTHER) ne ) %then %do;

      if (_last_) then do;

         call missing(START);

         call missing(END);

         %if (%qupcase(%superq(OTHER)) eq _MISSING_) %then %do;

            %if (%sysfunc(indexc(&TYPE,CNJ))) %then %do;

               LABEL = " ";

            %end;

            %else %do;

               LABEL = .;

            %end;

         %end;

         %else %do;

            %if (%sysfunc(indexc(&TYPE,CNJ))) %then %do;

               LABEL = "&OTHER";

            %end;

            %else %do;

               LABEL = &OTHER;

            %end;

         %end;

         HLO   = "O";

         output;

      end;

   %end;

   /*

   keep

      FMTNAME

      START

      END

      LABEL

      MIN

      MAX

      DEFAULT

      LENGTH

      FUZZ

      PREFIX

      MULT

      FILL

      NOEDIT

      TYPE

      SEXCL

      EEXCL

      HLO

      DECSEP

      DIGSEP

      DATATYPE

      LANGUAGE

   ;

   */

run;

%if (&dedup) %then %do;

  %let cntlin=_crtfmt_._cntlin_nodup_;

  %* remove duplicate ranges from input dataset ;

  proc sort data=_crtfmt_._cntlin_ out=&cntlin dupout=_crtfmt_._cntlin_dupout_ nodupkey;

     by fmtname start;

  run;

  %* print message if duplicate observations were deleted ;

  %if (%nobs(_crtfmt_._cntlin_dupout_) gt 0) %then %do;

     %* put %str(WAR)NING:  Duplicate ranges were detected in the &DATA dataset.;

     %put %str(NO)TE:  Duplicate ranges were detected in the &DATA dataset.;

     %* Uncomment the below line if calling this macro from a DIS job ;

     %*rcSet(4);

  %end;

%end;

%* create format(s) ;

proc format cntlin=&cntlin lib=&LIB..&CAT;

quit;

%quit:

%mend;

/******* END OF FILE *******/


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
mediaeval
Calcite | Level 5


Thanks to everyone who posted - all your suggestions were very helpful.
