Maxims of Maximally Efficient SAS Programmers

102 Likes

Recently I added a joking remark to one of my posts that included a not-so-hidden reference to the 70 Maxims of Maximally Effective Mercenaries from http://www.schlockmercenary.com/.

This earned me a question from @ballardw, and so I decided to start the project for earnest.

This is planned to be an ongoing work, in the tradition of the New Hacker's Dictionary and similar works, so feel free to contact me with suggestions for new maxims or enhancement of already existing maxims. Reordering is possible, but once a maxim has been used in an other place, I intend to keep its number static.

So - HERE THEY ARE, the

Maxims of Maximally Efficient SAS Programmers

Maxim 1

Read the documentation.

SAS provides extremely well done documentation for its products. Learning to read the documentation will enhance your problem-solving skills by orders of magnitude.

Maxim 2

Read the log.

Everything you need to know about your program is in the log. Interpreting messages and NOTEs is essential in finding errors.

Maxim 3

Know your data.

Having a clear picture of data structures – variable types, lengths, formats – and content will provide you with a fast-path to solving problems. Many simple problems can be cleared by taking a look at the "Columns" section in dataset properties. Use proc contents frequently.

Maxim 4

If in doubt, do a test run and look at the results. If puzzled, inquire further.

SAS is in its core an interpreting language. Running one step is just a few mouse-clicks away. Use that to your advantage.
"Try it." (Grace Hopper, Admiral, US Navy)

Maxim 5

Ask, and you will be answered.

SAS Technical Support and the SAS user community stand at the ready to help you. Provide a clear question, example data, your code (with or within the log, see Maxim 2), and where you seem to have failed. Help will be on the way.

Maxim 6

Google is your friend.

Just entering the text of an error message or problematic issue, prepended with "SAS", will often yield resources with at least a hint for solving your issue on the first result page; the same is true for other search engines.

Maxim 7

There is a procedure for it.
(The exception proves the rule)

Learn to use prefabricated procedures for solving your tasks. 5 lines of proc means may equal 20 lines (or more) of data step logic.

Maxim 8

There is a format for it.
(The exception proves the rule)

As with Maxim 7, SAS provides a plethora of formats for input and output. Use them to your advantage. If one that fits your needs is not present, rolling your own will usually beat complicated data step logic.

Maxim 9

There is a function for it.
(The exception proves the rule)

As of 2019-01-31, documentation.sas.com lists 649 data step functions and call routines. It is almost impossible to not find a function that makes solving an issue easier, or solves it altogether.

Maxim 10

SQL may eat your WORK and your time.

With large datasets, the way proc sql handles joins will lead to the buildup of large utility files, with lots of random accesses to these. This may well prove much less efficient than a join done with a data step and the necessary proc sort steps.

Maxim 11

A macro is not needed.
(The exception proves the rule)

There may be no need for repeating code when using by-group processing can do the trick. The macro language is also not meant for handling data - like calculating dates - but for creating dynamic code. Instead of creating lists in macro variables, store them in datasets and use call execute from there. Do calculations in data steps and save the results in macro variables (for later use) with call symput.

A macro cannot solve anything that can't be solved with Base SAS code. But it can reduce the effort needed to write that code, and it can help in making code data-driven. Also see Maxim 33 in reference to call execute.

Maxim 12

Make it look nice

Proper visual formatting makes for better code. Use indentation to make semantic blocks visible. What is on the same logical level, needs to be at the same column.
Avoid overlong lines, make a block of lines if necessary (that 80 character limit of old is not so bad at all).
Be consistent; the next one to maintain that piece of code might be your five-year-older self. Be nice to her.
Make frequent use of comments.

Maxim 13

When you're through learning, you're through.

(Will Rogers, John Wooden)

As long as you keep your ability and will to learn, you are alive. When you stop learning, you may not be dead, but you start smelling funny. Never say "I don't have the time to learn that now". The time to learn is NOW.
"Much have I learned from my teachers, more from my colleagues, but most from my students." - from the Talmud.

Maxim 14

Use the right tool.

"How many times do I have to tell you, the right tool for the right job!"
(Montgomery Scott, Captain, Starfleet)
Never restrict yourself by a simple "Use XYZ" or "Don't use XYZ". If something is better solved with a certain procedure, use it. If you don't yet know it, learn it (see Maxim 13). If a 3rd-party tool is better suited, use it (think of DBMS operations; have them done in the DB itself). Leave operating system tasks to the operating system (see Maxim 15).

Maxim 15

Know your playing field.

Make sure to gain knowledge about the environment in which SAS is implemented. Know the system's layout, its basic syntax, and its most important utilities. Especially UNIX is rich with tools that can and will make your life easier. Control the system, don't let the system control you.

Maxim 16

If it isn't written, it is not.

Anything that is not properly documented, does not exist. It may look like it was there at some time, but the moment you have to deal with undocumented things (programs, data, processes), it is as if you are starting from scratch. The Jedi programmer spends 90% of her time documenting, and reading documentation.

“I should have written that down. - Dilbert”
Scott Adams

Maxim 17

Do it the SAS way.

A data warehouse needs different structures than databases, and it needs different structures than spreadsheets. Adapt your thinking away from normalized tables (redundancy is avoided in DBMS designs, but is typical in data warehousing tables used for analysis), and also away from cells (spreadsheets have no mandatory attributes for columns, SAS tables have). See a SAS table as just the technical entity it is, and model it according to the analytic need.

Maxim 18

Separate your names.

Do not use names of predefined SAS functions, call routines or formats for your objects (variables, datasets). This avoids confusion and prevents hard-to-trace errors when a simple typo happens.

Maxim 19

Long beats wide.

(Don't keep data in structure)

In the world of spreadsheets, people tend to line up data side-by-side, and put data items (dates, categories, …) into column headers. This runs counter to all the methods available in SAS for group processing, and makes programming difficult, as one has variable column names and has to resort to creating dynamic code (with macros and/or call execute) where such is not necessary at all if categories were represented in their own column and data aligned vertically.
There are times where a wide format is needed, eg when preparing data for regression analysis. But for the processing and storing of data, long formats are always to be preferred.

Dynamic variable names force unnecessary dynamic code.

Maxim 20

Keep your names short.

Short, concise variable names make for quicker typing with less risk of typos, and make the code more readable. Put additional information in labels, where it belongs.

Maxim 21

Formats beat joins.

Creating a format from a dataset lets you do a "join" on another dataset in one sequential pass. As long as the format fits into memory, it provides an easily understandable means to avoid the sort of a large dataset.

Maxim 22

Force correct data.

When reading from external sources, don't rely on proc import. Instead use a data step that will fail with an error if an unexpected event in the input data happens. This lets you detect errors early in your analytic chain. Maxim 22 specifically precludes using the Excel file format for data interchange, because of the many involved automatisms.

Maxim 23

Recursion.

See Maxim 23.

Maxim 24

Return with a zero.

No batch program can be allowed to end a succesful run with a non-zero return code. No WARNINGs, no ERRORs. Any unexpected event shall cause a non-zero return code, so a chain of jobs is halted at the earliest possible moment.

Maxim 25

Have a clean log.

The log of a production-level process should be free of any extraneous NOTEs. Automatic type conversions, missing values and so on must not be tolerated.
This allows for easier detection of semantic problems or incorrect data, as these will often cause unexpected NOTEs.

As long as a part of your log is "unclean", all following ERRORs or other problems might be caused from that, and most of the time are not worthy of your attention. Debug from the top down.

Maxim 26

.sas trumps catalogs.

While it is possible to keep code in catalogs, storing it in simple text files is better. SAS catalogs are specific to SAS versions and operating system environments. Text files are never version or operating system specific, eventual changes in codepages are handled by SAS itself over the IOM bridge or by file transfer utilities.
Text files also enable you to use external tools for text handling (think grep, awk, sed and so on).
In the same vein, always store the code of proc format (and its related cntlin datasets), and do not rely on the created formats in their catalogs.

Maxim 27

Textuality rules.

Use plain text whenever possible. Use simple file formats for data transfer, so you can inspect the data with a text editor. Prefer modern, XML- or JSON-based files (even when compressed, like xlsx) over older, binary formats. Store your code in .sas files, use Enterprise Guide projects as temporary containers only while developing. Text lends itself well to versioning and other tools that help the programmer, binary files don't. See Maxim 26.

Maxim 28

Macro variables need no formats.

Keep in mind that '01jan1960'd and 0 represent the same value in SAS code. Therefore you only need formatting of macro variables when they are meant for display (eg in a title statement). When you only need them for conditions, the raw values are sufficient and easier to create.

Maxim 29

When in doubt, use brute force.
(Ken Thompson)

Use simple, easy-to-understand code. Only revert to "clever" algorithms when circumstances require it. This is also a variant of the KISS (Keep It Simple, Stupid) principle.

Maxim 30

Analyze, then optimize.

Do not engage in optimizing before you know where the bottleneck is, or if there is a bottleneck at all. No need to speed up a step that takes 1 second in real life. Keep your code simple as long as you can (see Maxim 29).

"Premature optimization is the root of all evil"
(prob. C. A. R. Hoare)

Maxim 31

Computers are dumb.

Never trust a piece of software to do on its own what you intend it to do. If you need a certain order, sort or use "order by" in SQL. If a variable should be of certain length, set it. Do not rely on the guessing of proc import, roll your own data step. (see Maxim 22)
Even if things work in a certain way on their own now, this does not guarantee anything for the future. New software releases or changes in data might wreck your process. Take control.

Maxim 32

No test without COMPARE.

When making a change on an existing piece of code, never release it without running a regression test against data from the previous version with PROC COMPARE. This makes sure that only the changes you wanted (or in the case of an optimization, no changes at all) are introduced. Even use COMPARE at crucial steps in your development process, so you catch mistakes when they are introduced, instead of having to search for that spot later. If COMPARE takes its time, enjoy your coffee. It's time well spent.

Maxim 33

Intelligent data makes for intelligent programs.

The Jedi programmer strives for intelligent data structures. The code will then be intelligent and well-designed (almost) on its own.

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

(Linus Torvalds)
When dealing with time-related data, use SAS date and datetime values.
Categories are best kept in character variables.
Automate the creation of formats (eg from domain-tables in the DB system). Use these formats instead of select() blocks or if-then-else constructs. See Maxim 8.
Create control datasets and use call execute to create code dynamically from them.
Keep (especially non-ASCII) literals out of the code (by putting them in data); this will avoid problems when code is moved across systems.

Use properly structured formats for dates when the sequence matters (YYMMDD instead of MMDDYY).
Also see Maxim 19.

Maxim 34

Work in steps.

Don't solve a task in one everything-including-the-kitchen-sink step. Create intermediate datasets/results, so that each step solves one issue at a time that can easily be comprehended (see Maxim 29).
In the same vein, don't implement multiple changes to a program in one iteration. Instead do a check-in for every single logical change. This allows regression tests for every change, and if something breaks which was not immediately obvious in regression testing, it is easier to step back and find which change introduced the bug.

Maxim 35

There's two kinds of stupidity.

#1 "This is old, and therefore it's good".
#2 "This is new, and therefore it's better".
Fads have come and gone, but every now and then there's a gem in all the rubble. Be critical, but open-minded.

Maxim 36

No work is done until the final test has run.

The finishing step of every piece of work must include a test of the program in exactly the same environment it will be used (same server, same user, same batch script, etc). Only then can you be sure that the first use does not result in a call for support that interrupts your well-earned coffee break. "Normal" users may have a more restricted environment than developers, so keep that in mind when developing code for them.

Maxim 37

Perfection.

Perfection is attained not when there is nothing more to add, but when there is nothing more to remove.
(Antoine de Saint-Exupery)
Another application of the KISS principle; also think of the practice of "muntzing" applied to programming.
Get rid of unnecessary code, redundant variables, irrelevant observations.

"Perfection is not attainable, but if we chase perfection we can catch excellence."
(Vince Lombardi)

Maxim 38

Beware of the dread god Finagle.

And heed the words of his mad prophet Murphy:
"What can go wrong, will go wrong".
Write your code in this light, and include safeguards in expectation of the unexpected.

Make sure that you are notified when the unexpected happens (eg make a batch program crash on invalid data, so the scheduler catches it).

(With regards to Larry Niven.)

Maxim 39

Be human.

Everybody makes mistakes, including you. Be kind when dealing out critique, as someone might be forced to critique you 5 minutes later.
There's always a better, and a worse. You're not the worst, but surely you're not the best. Learn from both, and be kind to both.
And remember that the code is not the coder. Very nice people can write incredibly ugly code, and vice versa.

Maxim 40

Talk to the customer.

Having a good line of communication with the ultimate consumer of data is essential.
Make them formulate their needs in clear, unambiguous language; often this act of questioning themselves provides a major advance in solving the problem.
Never solve a problem that you think the customer needs solved.
Deliver example results early and often.

Maxim 41.

Wisdom emerges from experience.

Unfortunately, most experience emerges from stupidity.

Maxim 42

It's not the answer, it's the question.

A well-formulated question may already lead you in the right direction before you speak it out loud or write it down.
An ill-formulated question will get you the equivalent of "42".

Maxim 43

Keep the final goal in mind.

Which information needs to end up where? The answer to this question shall guide your selection of tools and methods, not the other way round. Information is at the core of your work, and delivering it correctly, timely, and usable is your ultimate goal.
Once the content is delivered and accepted, you can think of "prettying up".
All this implies that the requirements have been refined and clearly worded (see Maxim 42). Have test cases defined against which you can prove your code.

Maxim 44

Leave no spaces.

All operating systems and programming languages use blank space as the prime separator of names and objects. It is therefore a VERY BAD idea to have spaces in names (of files, directories, columns and so on). Even when a certain environment provides a means to do so (encapsulating file names in quotes in UNIX and Windows, or the 'my name'n construct in SAS), one should not use that. Underlines can convey the same meaning and make handling of such names easier by orders of magnitude.
(The fact that certain products make happy use of blanks in file- and pathnames does not make this practice right; it just illustrates the incompetence accumulated in some places.)

Maxim 45

TANSTAAFL.

There Ain't No Such Thing As A Free Lunch.
A big salute to the great Robert A. Heinlein for this perennial truth.
Good, durable code requires thought and hard work. Be prepared to provide both.
Every thing of value comes at a price, although the price is often not measurable in money.
Also see
"Fast. Cheap. Good. Select any two."

Maxim 46

Beware of the hidden blanks.

One tends to forget the fact that character variables have a fixed defined length, and values will always be padded with blanks up to that length. So it is always a good idea to use trim() when doing comparisons or concatenations.
Example:

if charvar in ('AAA','BBB','CCC');

will work even if the defined length of charvar is > 3, but

if findw('AAA BBB CCC',charvar) > 0;

will fail if charvar has a length of 5 and will therefore contain 'AAA ', which can't be found in the comparison string.
Similarly, charvar = charvar !! 'some_other_string'; will invariably surprise the unwary novice.

Maxim 47

Set a length.

When creating new character variables, always set a sufficient length explicitly. When character variables are assigned literals, the first such assigment sets the length, causing confusion afterwards, because longer data will be truncated.
Example:

flag = 'no';
if (condition) then flag = 'yes';
if flag = 'yes' then …;

Surprise!

Maxim 48

The dot, the dot, always the dot.

Make it a habit to always terminate macro variable references with a dot. It's never wrong to have the dot, but omitting it can have consequences and cause confusion.

See

libname myexcel xlsx "/path/excelfile&year.xlsx";

Maxim 49

Don't forget to run;

People frequently omit the run; statement, because SAS takes care of step boundaries all by itself.
But there are circumstances when this can cause a hard-to-detect problem; imagine this part of a macro:

data _null_;
call symputx('nobs',nobs);
set have nobs=nobs;
%do i = 1 %to &nobs;

Since no run; was encountered, the data step was not yet compiled and executed, leading to a "symbolic reference not resolved" error.
Bottom line: as with dots in macro variable references, the run; never hurts, but its absence sometimes does.

Maxim 50

Check your OBS= system option.

Often OBS= was set to limit the number of observations for tests. This will cause seemingly inexplicable terminations of steps later on.

Maxim 51

Hash can fire you up.

The rather recently (version 9) introduced hash object provides a high-performance tool for in-memory lookups that helps solve complicated issues which required complex programming earlier on, or multi-step operations consuming time and disk space.
Learning its basic operation, and use in advanced situations is a must for the aspiring SAS programmer.
A very good introduction is found in SAS® Hash Object Programming Made Easy by Michele M. Burlew.

Maxim 52

Take a break.

Whenever you run into a seemingly unsolvable problem, turn away from it for a while. Play a game (or watch one), listen to music, have a conversation with your S.O., make up a good meal, or just have a good coffee. Or have a night's sleep.
Cleansing your mind from all the (non-helping) thoughts you already had will open it up for the single new one you need to get over the obstacle.
It may even be the case that a solution comes to you in your dreams (I had that happen to me once; got up in the middle of the night, took some notes, and solved the issue in just a few minutes after arriving back at the office).

RW9 · ‎04-24-2017

Nice, I like it. Now sticky plaster that across any new post. A couple of suggestions:

- Data modelling - structuring and re-structuring your data is necessary to work cleanly and efficiently with your data. Remove the mindset of have to work like Excel in transposed format (or vice versa for DB programmers).

- Self imposed restrictions - avoid them. To often is heard the call, "I can't use XYZ" or "I have to use XYZ". If output needs to be in transposed Excel then fine, but that does not determine how you work with the data, only the output.

- And my big favorite: Planning and documentation. With any programming task 99% of the work should be put into documentation - e.g. specs, testing plans, user guides etc. Actually programming code is almost negligible nowadays (and once you have the documentation). An example, question such as - my import has changing structure how do I cope - a documented process here would throw the data back at the vendor to comply with agreement - no longer a problem.

ballardw · ‎04-24-2017

As a paraphrase from the inspiration:

Variables, datasets and function names should be easier to tell apart.

And a couple of suggested additions

The Label statement is your friend.

Don't name variables: Mean_of_state_income_in_2016, put that information in the Lable and use something shorter to type.

Formats may be the easiest way to create bins/ groups/ category levels from single variables.

Informats can do a lot of data validation.

mftuchman · ‎04-26-2017

"Don't keep data in structure" is definitely one observed more in the breach than the observance. There must be a cognitive reason for this - that people create these variables rather instinctively.

I think the phrase "Adapt your thinking .... normalized tables" could use some more elaboration. It is not clear from what is written why such an adaptation would be advantageous.

SASKiwi · ‎04-28-2017

Love it. This pretty much lists most of the maxims I try to follow. I'd like to suggest an addition:

Testing Requires PROC COMPARE

If you haven't used PROC COMPARE in testing a change in a SAS application, you haven't tested it properly. It confirms that the change has applied correctly to your data as well as checking you haven't accidentally changed data you shouldn't have.

I've lost count of the number of times PROC COMPARE has helped me build better, more error-free applications.

Maxim 21 is my favourite. The power and flexibility of SAS formats is just amazing. I've got examples using up to 50 SAS format lookups in a single DATA step. I'd hate to do that in SQL, and I use it a lot where appropriate.

Reeza · ‎04-29-2017

Awesome work.

PeterClemmensen · ‎05-01-2017

This is so short and precise and cool.

Kurt_Bremser · ‎05-01-2017

Thank you all so far for the positive responses. As you can see (revision history), quite a lot of the suggestions made it into the Maxims.

It's very motivating.

GeorgeSAS · ‎05-03-2017

I believe this works for all programmer

ChrisNZ · ‎05-04-2017

So many truths in this article!

Especially #23! 🙂

Keeping everything in text files (maxim # 26), including sas programs (maxim # 27) is certainly too often overlooked, sometimes with dire consequences.

A few comments:

Maxim 24
Return with a zero. No WARNINGs, no ERRORs

SAS makes this hard at times, especially SAS/GRAPH. Some warnings, like unevenly spaced ticks on the axis, can't be avoided and create a huge pain for no reason. (see here )

Maxim 20
Keep your names short. Short, concise variable names make for quicker typing with less risk of typos, and make the code more readable. Put additional information in labels, where it belongs.

As for myself, I much prefer longer and meaningful variable names rather than shorter names. The label is hard and distracting to access when reading code.

Maxim 12

Make it look nice

Yes yes yes! And especially, align things as much as possible. Nice is not enough. It should be orderly and pretty.

Maxim 13

When you're through learning, you're through.The time to learn is NOW.

So true!

Kurt_Bremser · ‎05-05-2017

@ChrisNZ wrote:

Maxim 20
Keep your names short. Short, concise variable names make for quicker typing with less risk of typos, and make the code more readable. Put additional information in labels, where it belongs.

As for myself, I much prefer longer and meaningful variable names rather than shorter names. The label is hard and distracting to access when reading code.

By concise I don't mean to reduce variable names to unintelligible abbreviatons, but to something just long enough to convey meaning.

eg

employee_average_salary_for_a_year

(bad)

'This variable contains the average salary for a year'n

(even worse)

aasal

(bad)

avg_annual_salary

(good)

Kurt_Bremser · ‎05-05-2017

The inspiration for Maxim 23 came from the glossary in a HP (back when HP was still HP) manual for HP-UX on the 9000 series:

Recursion

See recursion.

The same could be found in the manual for their programmable hand-helds, IIRC

ChrisBrooks · ‎05-05-2017

I always tell new developers "Talk to the customer" - if you develop a good working relationship with your customers/clients it can mitigate a lot of problems particularly if they haven't expressed themselves well in the specification

Shmuel · ‎05-15-2017

I agree with every maxim, every word.

Experience Brings Wisdom.

i.e. Recursion.

bbenbaruch · ‎05-18-2017

Hyperlinks to examples would be helpful.

Also, a similar sheet for statisticians and analysts would be helpful. There are important differences in doing production work and doing custom analytics. (And I distinguish "analytics" from creating "production reports".)

JasonDiVirgilio · ‎05-18-2017

"Maxim 23

Recursion.

See Maxim 23."

Nicely done. Although, I think seeing recursion in a SAS program would blow most SAS programmers' minds.

bbenbaruch · ‎05-18-2017

"Maxim 23

Recursion.

See Maxim 23."

Nicely done. Although, I think seeing recursion in a SAS program would blow most SAS programmers' minds.

Artp · ‎05-26-2017

Excellent code of ethics. Must be shared with managers. Quick results based on shaky data foundations tend to lead to bad policies and decision-making!

BeverlyBrown · ‎05-26-2017

There's so much awesome in this article and comment thread, I can't even...

Kurt_Bremser · ‎05-29-2017

As you can see, some additions have been made.

Thanks again to all for the positive feedback and the encouragement.

Ron_MacroMaven · ‎06-26-2017

keep a Daily Job Diary.

1. your DJD is the key to being able to accurately guesstimate how much time you will spend on a project in the future.

2. a regular analysis at end.of (day or week) will keep you on track;

e.g.: "Just how far off track are you willing to go in this new&improved work-around?"

3. a list of accomplishments is invaluable in your quarterly/semi-annual/annual personal evaluation

chithra · ‎07-13-2017

great..Thanks alot.

It's realy a motivation for the beginners like me.

bbenbaruch · ‎10-31-2017

"Best practices" differ slightly depending upon the type of work one is doing. Developing "production code" is different from developing analyses. And analysis is different from reporting. (Most people with the title of "Analyst" actually don't do any analysis at all, rather reporting of data in ways that assist people in understanding them. They are "data clerks" but unfortunately the job title "clerk" like "secretary" has been devalued in our society so people are given titles that obscure what they really do. And just as unfortunately, by not valuing "data clerking" and reporting in their own right, this work is also denigrated. An unfortunate additional result of this is that the terms "analysis" and "data mining" have become meaningless. Analysis means any work that reports numbers in a different format or aggregation from the way they are stored. Data mining now means any query into a "large dataset" or any query involving two or more datasets. The goal of data mining has become shovelling out the dirt rather than finding the diamonds-in-the-rough.) The main point is that best practices for 1. writing production code, 2. data reporting and data clerking, and 3. data analysis and data mining differ slightly. Sometimes these differences were noted in the maxims above, but this needs to be more explicitly acknowledged.

fifthand57th · ‎10-31-2017

Please tell me you plan to write a SAS Global Forum paper to present this!

Kurt_Bremser · ‎11-09-2017

Regarding recursion with SAS, see https://communities.sas.com/t5/SAS-Data-Management/recursive-joins-with-PROC-SQL/m-p/411841#M12592

Kurt_Bremser · ‎11-16-2017

Finally, the answer to The Ultimate Question. 42 likes.

RW9 · ‎11-16-2017

Thats likely to be those 42 people who actually follow them like us. Or is it just me but all I see now is:
DATA DATA; SET XYZ; %DO_SOMETHING(&&&SORT10));

lychege · ‎02-07-2018

I like it! What to consciously abide by to make work easier.

JBailey · ‎02-14-2018

This is great!

Sushil_rsk · ‎02-15-2018

Thank you Kurt for the time you took to draft these Maxims. These are very helpful, and a good reference for everyone

PavelD · ‎07-04-2018

I had a good laugh reading these, and learned a lot in the process.

Now I will have to read them again once in a while.

FreelanceReinh · ‎08-31-2018

A recent post about numeric representation issues (a recurring topic) inspired me to propose this new maxim (draft):

Beware of numerical accuracy issues.

Whenever you work with numbers with a few decimals, remember that in most cases (!) SAS cannot store their exact values (in the binary system). The resulting numeric representation error can easily accumulate in calculations. Avoid such hidden inaccuracy in calculated numeric values.

u=0.36;              /* Decimals! Danger ahead! */
t=round(10*u, 1e-9); /* preferable to: t=10*u; */
if t>=3.6 then ...;  /* No issues thanks to rounding. */

ChrisNZ · ‎08-31-2018

Excellent idea!

First draft: I suggest changing the sentence to something like: When working with decimal numbers, remember that computers cannot always store the numbers' exact values. This is because the decimal numbers that humans use cannot alwyays be perfectly represented in the binary system that computers use.

Or not.. Just a suggestion...

FreelanceReinh · ‎08-31-2018

Thanks @ChrisNZ. Actually it's a good idea to provide a bit more background ("This is because ..."). I just tried to keep it as succinct as possible. I think the wording "cannot always" is an understatement, given that, for example, 99.2% of the numbers with up to 3 decimal places are affected by numeric representation error. That's why I emphasized "in most cases".

Also, while virtually all non-integer numbers are prone to those issues, most practical examples I've seen involved numbers with only a few decimals.

bbenbaruch · ‎08-31-2018

Many of us also work across platforms, for example querying a DB2, Oracle, Netezza or Terradata database and writing our SAS dataset to the unix server on which SAS is installed.

In my experience, (usually small but sometimes fateful) computer precision errors may occur simply because data are transferred across platforms. SAS has tools/functions to deal with this (e.g. ROUND and FUZZ). But we need to be aware of the issue.

In my work, an issue that occurs frequently is the expectation that when I sum values that should result in 0 the actual sum may be 0.00000000001 (or similar). When I subsequently filter on "WHERE VAR0 = 0" I get unexpected results.

Benjamin (Benjy) Ben-Baruch
Business Intelligence and Marketing Analytics
Harry and David, LLC
www.HarryandDavid.com<>

ChrisNZ · ‎09-01-2018

Typo in #44
or the 'my name'd construct in SAS
d should be n

LinusH · ‎09-02-2018

I can't believe that missed out on this post (or worse forgot about it). It pure gold.
That said, agree to almost every word except #17 (too much simplification of the data warehouse concept) and #28 (even programmers and maintenance can be helped with some easy to understand values on the log).

bbenbaruch · ‎01-31-2019

Re Maxims 40 and 42:

It is the job of analysts to consult with clients and not just do what they ask.

They do not always know what they want or need -- or at least cannot always articulate this in a way that is meaningful to analysts.
When consulting with them it often becomes apparent that what they asked for is not exactly what they need.
I often (correctly) anticipate that as soon as I give them what they need they will ask the next question and they will ask for a "deeper dive" into the data. I have therefore found that it is often efficient to pull more data than is required for the immediate task -- or to at least include the code to do so even if I "comment it out". Doing this makes for ugly logs (see Maxim 25 above) but for analytic work that is not in production I would still recommend this practice.

Reeza · ‎01-31-2019

@bbenbaruch A clean log refers to no notes or warnings, not necessarily looking 'clean'. Commented out code shouldn't affect a clean log.

I'm fully with you, I try to understand the question being asked, what the next question would be and what people will do with that data.

IMO, this is the biggest difference between a really good analyst and a good analyst 🙂

sureshprabhu153 · ‎02-07-2019

Nice Explanation,,

Thanks for valuable information

ScottBass · ‎02-11-2019

1) I have a personal "maxim" - "Sorting is Evil :)". What I mean by that is, often with large data volumes, sorting is one of the biggest hits to job performance. Sure, if it's needed, it's needed. But would an index work? Can I use an index key lookup instead of a merge (but that's just an "outer join")? Or a hash object (if it fits in memory)? Or can I get the RDBMS to do the sort via explicit passthrough, and will that perform better? Can I create the index(es) as part of my nightly ETL, i.e. out-of-hours? Would the overhead of random I/O via the index be outweighed by the time savings in not doing the sort (esp. for wide datasets)? To that end, perhaps add a maxim about the use of indexes? In my experience they are often under used in SAS.

2) IMO (my opinion only) Maxim 48 (always the dot) contradicts Maxim 12 (make it look nice). It's just a personal preference, but I don't like the needless termination of macro variables with dots. I know which tokens will cause the compiler to terminate a macro variable name, so I choose to only use dots when required. But hey, that's me, YMMV.

But I find this "ugly":

%if (&mvar. eq &someothermvar.) %then %let filepath=/&mvar1./&mvar2./&mvar3./somefile.txt;

Every single one of those dots could be removed and the code would work fine and IMO look better.

Kurt_Bremser · ‎02-11-2019

@ScottBass

It has been my experience that in most cases, indexes don't improve performance; they only "work" (in the sense of providing better performance) when the result of a specific query is a considerably small part of the dataset; when the whole dataset needs to be read anyway, they tend to worsen the performance, and a dedicated sort (where you can also reduce dataset size by only keeping what's needed) and merge is faster. Please note that in my part of the DWH process I mostly use whole datasets, which may skew my view.

For the second point, mind that this is targeted mainly at newbies, who often run into trouble because of the missing dot (see many threads here on the communities). Using the dot always remedies that, and with growing experience people will get a solid idea when the dot can be omitted.

I'm quite sure we can have a good discussion in Dallas 🙂

ChrisNZ · ‎02-11-2019

I agree about sorting being best avoided as it's so extensive.
Another way to bypass a sort is that SPDE tables can be used with a BY statement without being ordered, and are sorted on the fly. It works very well and can save time if sorting is a one-off need.

About the final dot for macro variables, I find that they create screen clutter but I use them because that's the only way the colour parser will highlight them.

RW9 · ‎02-11-2019

@ScottBass , just in terms of the dot at the end of the macro variables its more about having a set of base style elements that all use. So in general terms there are instances where the dot is necessary, therefore the default position for a consistent approach would be to always have the dot there. The alternative is to have a rule something like: Use the dot where necessary except in cases... Which would then produce a lengthy rule and still not really end up with any kind of uniformity anyways. In the same way from your example, I could write my datasteps like this:

DATA WANT;DO I=1 TO10;%S;OUTPUT;END;

Would you be wanting to maintain code that looks like this? I sure wouldn't. (The %S is deliberate, to cover the "well I know what its doing").

@ChrisNZ also raises one other aspect of omitting the dot in that color editors no longer always pick up on it.

Take a look at the many languages out there, python for instance has the PEP guidelines. SAS (both in the code it creates/publishes in the documents, and the training taken therefrom) is sorely missing this type of thing, has been for many years, and it is one of the reasons the editors we get are so far behind all the other ones and why coding is still done using all uppercase, all on one line etc.

tomrvincent · ‎08-29-2019

Great compilation. I have it hanging on my office wall. I think I'd add Maxim 0: Show appreciation and gratitude towards those who help you. So often I see no appreciation (or worse, immediate criticism) expressed in the communities and elsewhere. No wonder many don't bother sharing, participating or contributing.

Kurt_Bremser · ‎08-29-2019

@tomrvincent it's also now a paper, as I did a presentation at SAS GF 2019 in Dallas:

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3062-2019.pdf

tomrvincent · ‎08-29-2019

@Kurt_Bremser wonderful! And it looks like it's even got typos corrected (24 & 47, for example)! When I put it in Word, I made sure no maxim was split by page breaks just to keep them together on a page.

Kurt_Bremser · ‎08-29-2019

Where do you see the typos in 24 & 47 in the (current) online version above? I can't find them, but I'm no native English speaker/writer either.

tomrvincent · ‎08-29-2019

@Kurt_Bremser succesful and assigment

others such as 'eg' instead of 'e.g.', 'sql' instead of 'SQL, 'every thing' instead of 'everything'.

gsvolba · ‎03-21-2020

Great paper!
Thank you Kurt, for presenting this paper at the SAS Club Austria in November 2018 in Vienna

andreas_lds · ‎07-03-2020

Any easy way to force the implementation of Maxim 25 is by using

options dsoptions= note2err;

With that option enabled all unexpected notes are turned into errors.