Hi all,
I was wondering if there is any advantage of doing data coding in SAS over excel? It seems to me that the conventions in SAS are roughly the same in this situation as doing a Ctrl + F.
I have a dataset with many variables that are mostly character and am looking to code them into nominal and ordinal variables. I also am looking to code missing values and change variable names as well as label the methodology behind the coding. I was wondering what the advantages of SAS are since it came directly from RedCap (survey much like Qualtrics) and there are few free response answers. I have included a picture of part of the data as a reference.
I was reading up and saw that CSV is a better format to import into SAS and plan to use that in the process, but was wondering if anyone had any pros and cons from their experience. 
@joebacon wrote:
I do have survey data. I would need to change the names of the variables first as they are the questions themselves. Would that not be easier in excel?
You bring up a fair point though. If I am going to run tests in SAS anyway, why not do the whole process in SAS for continuity. Another pro for SAS that I can see is that it is documented all in your code.
However, excel shows you your changes immediately without having to output your new dataset.
Is there really any con to doing it all in SAS?
I must say that that macro would be incredibly handy and if that was ever something you wanted to share, I'd be eternally grateful. 😉
Reasons to not use Excel:
True story: I was working on a survey analysis in September - as a contractor so not a lot of SME with the topic. I sent the results over. Almost immediately the director cals me and asks about the number of facilities included. The analyst had sent me the wrong data and I had to re do everything. The deadline for this report was November and that could not change because it was to a regulator. How much time would I spend rerunning that if I’d done it in excel? So I re-ran, had my tables and rewote the sections I needed too. One month later, the contractor who carried out the survey sent us a message. They’d made a mistake with the data and had a new file to deliver. Rinse and repeat.
My macros are on GitHub. I’ll find a link and send it to you later.
Cons to using SAS - graphs are a bit harder at first.
Have to program and it’s a proprietary language.
I do have survey data. I would need to change the names of the variables first as they are the questions themselves. Would that not be easier in excel?
You bring up a fair point though. If I am going to run tests in SAS anyway, why not do the whole process in SAS for continuity. Another pro for SAS that I can see is that it is documented all in your code.
However, excel shows you your changes immediately without having to output your new dataset.
Is there really any con to doing it all in SAS?
I must say that that macro would be incredibly handy and if that was ever something you wanted to share, I'd be eternally grateful. 😉
@joebacon wrote:
I do have survey data. I would need to change the names of the variables first as they are the questions themselves. Would that not be easier in excel?
You bring up a fair point though. If I am going to run tests in SAS anyway, why not do the whole process in SAS for continuity. Another pro for SAS that I can see is that it is documented all in your code.
However, excel shows you your changes immediately without having to output your new dataset.
Is there really any con to doing it all in SAS?
I must say that that macro would be incredibly handy and if that was ever something you wanted to share, I'd be eternally grateful. 😉
Reasons to not use Excel:
True story: I was working on a survey analysis in September - as a contractor so not a lot of SME with the topic. I sent the results over. Almost immediately the director cals me and asks about the number of facilities included. The analyst had sent me the wrong data and I had to re do everything. The deadline for this report was November and that could not change because it was to a regulator. How much time would I spend rerunning that if I’d done it in excel? So I re-ran, had my tables and rewote the sections I needed too. One month later, the contractor who carried out the survey sent us a message. They’d made a mistake with the data and had a new file to deliver. Rinse and repeat.
My macros are on GitHub. I’ll find a link and send it to you later.
Cons to using SAS - graphs are a bit harder at first.
Have to program and it’s a proprietary language.
Thank you Reeza, very helpful once again. Your anecdote sounds truly horrendous, but luckily you had your program which you could just change up as you needed.
More than half of the stuff that I actually do in Excel is comparing column headings and using it to write SAS LABEL syntax.
Frequently data from the same source with supposedly the same content will randomly switch column order in data supplied in Excel files.
So copy the row with headers and paste Transpose into a column on another worksheet, often the same one from the last time. I can then write a formula comparing the two columns and look for inequalities.
If I have data, such as you describe with stupid amounts of text as column headings I can go down the column next to the column headings just pasted and create a SAS variable name that is usable. Sometimes as simple as Q1 instead of "What was the last restaurant you went to for a special occasion". That column becomes the basis for an INPUT statement and any needed INFORMAT information.
Then a separate column can contain a formula to create label statements such as
Label q1="What was the last restaurant you went to for a special occasion";
So I can use a much nicer variable for coding and have the text of the question (or a cleaned up version for some truly egregious questions).
When the next file comes with the inevitable insert/deleted/added question I can manipulate an other set of columns using the comparison information approach previously mentioned to insert rows or cells to align the new data with the old, reuse existing variable names and recreate labels.
Since some of these survey groups may make minor wording changes such as
"What was the last restaurant you went to for special occasions" your comparison can let you decide if the questions are actually to be treated the same or not and whether the variable needs a new name. HINT: if you want to analyze both sets as the same, the variables should be the same.
Note that the metadata that SAS maintains has original data positions for the read file so you actually have some programmatically searchable to document changes (which comes up when contracting work and you have to provide exactly why a customer is getting billed for programming changes with every file sent).
Another reason to use SAS: You can have many more observations and/or variables in single data set. I have had multiple times in the past where I had to work around the row and column limitations of a spreadsheet. Admittedly most survey data doesn't approach the row limit but I have seen some exceed the columns.
Formats are an extremely powerful tool that is not easily replaced with VLOOKUP. I can maintain a library that has maybe 15 different age grouping formats. I can run Proc freq on data set with a variable that represents age and by changing the format name get different summaries. That process goes across graphic and almost all of the analysis procedures. Need to change groups for a procedure? Create a new custom format and the only change in the procedure is to change the format associated with the variable.
And for a limited number of procedures the MULTILABEL formats can be quite useful for certain types of tables.
Since I learned SAS/Graph before ever using spreadsheet graphics I will kindly disagree with @Reeza about graphics being easier.
I never had some of the data bombs people will put in an occasional Cell. I spent almost two days with an Excel graphic template and data trying to determine why one of the columns of a bar chart didn't quite reflect the actual changes in data. I finally found ONE cell with a formula that divided the one value of the entered data by 4. No comment. No description any where. I had to physically click in every single cell of a dependent data area linked to the entered data area to find that.
Another reason not to use Excel: Typing certain things into a cell may result in calculations you didn't intend or interpretation by Excel.
Type 5-1 into a cell. Watch that become 1-May.
Yes there are some things that might be quicker if done one time in Excel. But if something is to be done multiple times the programming in SAS is likely going to be superior in the long run. Also SAS is extremely conservative about removing code features.
So you program may work for a long time even after a feature is not documented in a later version.
Hi ballardw,
This was super insightful- especially the bit about using each tool separately and metadata with column position.
Between you and Reeza, I've decided to carry on with SAS. I was curiously to see how different people would handle their data differently and learn of the pros and cons.
Thank you both of you!
@Reeza I would humbly like to still ask for your github for the macro for the chi square. I am going to check your profile to see if you put it in there by any chance.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.
