Soft Skills – Part 7 – Data Management: Part II – Documentation

There’s a joke I’ve seen online that the worst programmer is you three months ago. For me, that comes from two things: my skills have developed since then, and I’m trying to remember what I did. I’m sure I’m like most programmers / analysts who have some horribly complicated, constantly evolving code: reading through sub-queries and remembering why I set certain parameters to specific values is a time-consuming and frustrating exercise. Kristin Briney, in her book “Data Management for Researchers”, covers the importance of documentation through a review of notebooks, data dictionaries, code books, etc. I wanted to focus on two different types: notes I take while reading, and code documentation.


In my article on Efficiency and Productivity I touch on a couple of apps that I use every day, including Microsoft’s OneNote. The goal of this article is to cover some techniques that I’ve picked up for actually taking notes, whether they are from a meeting, a journal article, or a user group presentation. When I was in high school and first-year university, I spent countless hours literally copying out chapters of textbooks by hand; I thought this was the best way to memorise the material. Then I learned the distinction between memorisation and understanding, and approached the material very differently. I started learning themes and key concepts, and was able to cover a lot more information with a lot less stress. Now, when reading something that I need to understand, I take the same approach with my notes: I start off with a central theme and then branch off from there, making more of a mind map than actual notes. I find this lets me see how other articles / textbooks fit into the information I’ve already read, so I get the “big picture” a lot faster than by comparing pages upon pages of bullet points. Finding your own method of note-taking is a long process, and involves researching different types. I recommend trying several, and taking bits and pieces of each to form your own style.


Documentation of and in your code takes a long time, involves a lot of thought, and makes life a lot easier for you in the long run. As I said at the start of this article, the worst programmer is you three months ago, and without documentation you’ll be your own worst enemy. As this is a SAS community, I’ve used SAS commenting style for the example here; note that I use the same header layout consistently, whatever language I’m writing in.


/* Title of Report: Analysis of SASHELP.CLASS for Height / Weight Variance */
/* Purpose: Extract and analyze height and weight data; using scatter, box plot, and */
/* variance calculations in three distinct groups: Overall, by Gender, by Age */
/* Date code written: January 12, 2016 */
/* Validation: Raw data review */
/*                                                                    */
/* Date Code last run: January 15, 2016 */
/* Report Provided to: */
/*        January 13, 2016: Dr Smith */
/*        January 15, 2016: Drs Smith and Allan */
/* Version: 1.1 */
/* Version History:  */
/*        1.0 – January 12 2016 – Code developed and validated        */
/*        1.0.5 – January 13 – Code updated based on conversation with Dr Smith to modify */
/*                    box plots  */
/*        1.1 – January 14 – Final changes made – colours changed in boxplots, titles */
/*                    formatted   */


In the section of code where I generate the boxplots, I indicate each change I made, with a date/time stamp and a reference to any phone calls, emails, or meetings that prompted it.
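
As a rough sketch of what such in-line change notes might look like, here is a boxplot step with dated comments above it. The time stamps, the wording of each note, the grouping by Sex, and the fill colour are illustrative assumptions on my part, not taken from the actual report; only the dates and the general nature of the changes follow the version history in the header above.

```sas
/* ---- Box plots: change log for this section (illustrative entries) ---- */
/* 2016-01-13 14:30 - Box plots modified after conversation with Dr Smith  */
/*                    (hypothetical detail: grouped by Sex)                */
/* 2016-01-14 09:15 - Final changes: box colour changed, title formatted   */
/*                    (hypothetical detail: light blue fill)               */

title "Height by Sex - SASHELP.CLASS";

proc sgplot data=sashelp.class;
    /* One box per value of Sex; fill colour set explicitly */
    vbox height / category=sex fillattrs=(color=lightblue);
run;

title;  /* clear the title so it does not carry over to later output */
```

Keeping the notes directly above the step they describe means that whoever opens the program later sees what changed, when, and why, without having to cross-reference the header.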


I mentioned code books at the start of this article, and I’m sure everyone either has their own style or has never heard of them. I use code books to compile my code, detailed notes of what I’ve done, references (if I’ve used them) or copies of articles used in developing the code, and, if feasible, a copy of the final product. This doesn’t need to be a physical binder (though I prefer one, as I like hardcopies); a folder-based system on a server that is routinely backed up would work as well.


If you’ve never documented your work, you won’t miss it until something happens and you need it. I have friends who have been doing data analysis for 20-plus years and have never documented a thing. I have other friends who, like me, may be seen as overzealous in documenting research notes and code. The time it takes to document at this level of detail, however, is nothing compared to the time it may take to fix or rebuild a massive, organizationally critical report that took 6 months to build.


Do you have any tips or tricks on documentation?  I’m curious to see what other people put in their comments!

