Hello,
I haven't used SAS in years and I am now trying to use it for some data analysis. I have a large data set that is output from another program into an excel spreadsheet. It is a survey that has many parts. Currently there are 229 columns. The first column is the specialty. The next 228 columns are Name, Question 1 Answer, Q2 Ans, Q3 And, Q4 Ans, and Q5 Ans for 38 specialties. The responses are all characters (Excellent, Good, Needs Improvement) with some left blank. Also, each column is currently named a unique random letter and number combination. Below is a crud example of what is currently looks like:
Q2_1 Q2_2_1 Q2_2_2 Q2_2_3 Q2_2_4 Q2_2_5 Q109 Q110_1 Q110_2 Q110_3 Q110_4 Q110_5.....
Specialty1 Bob Ex Ex Good Ex Ex
Specialty1 Tom Good Good NI Ex Ex
Specialty1 Sue Ex NI Good Ex
Specialty1 Bob Ex NI Ni Good Ex
Specialty2 Lee Ex Ex Good NI Ex
Specialty2 Jim Good Ex Ex Good
Specialty2 Joe Ex Ex Good Ex Ex
I need it to look more like this:
Name Q1_Ans Q2_Ans Q3_Ans Q4_Ans Q5_Ans
Specialty1 Bob Ex Ex Good Ex Ex
Specialty1 Tom Good Good NI Ex Ex
Specialty1 Sue Ex NI Good Ex
Specialty1 Bob Ex NI Ni Good Ex
Specialty2 Lee Ex Ex Good NI Ex
Specialty2 Jim Good Ex Ex Good
Specialty2 Joe Ex Ex Good Ex Ex
What would be the best/simplest way to combine 228 variables down to 6 variables?
I am using SAS 9.4.
If this were my data I would likely save that Excel file to a CSV and read the data so that I had variables I wanted.
Maybe something like:
data example; infile "something.csv" dlm=','; informat specialty name $25. q1 - q5 $10.; input specialty @; do i=1 to 38; input name q1 - q5 @; /* only keep the records with a name*/ if not missing (name) then output; end; input; /* advance to next line*/ drop i; run;
The @ holds the reading on the current line until an INPUT without the @ is used.
I calculated 38 iterations needed but you know your data.
IF you KNOW that Specialty and NAME (or any of the other columns) would absolutely positively never have a space in the value then you could save the file as a TEXT file with a space delimiter and use simple list input as the input statement would skip over all of the spaces.
data example; infile "something.txt" dlm=' '; informat specialty name $25. q1 - q5 $10.; input specialty name q1-q5; run;
This is almost working. The problem is some of the Specialties have commas, spaces, and "/" in them. (Ex: "Gynecology and Obstetrics / Maternal-Fetal Medicine" and "Psychology, Psychiatry, and Developmental Medicine") And each name is formatted Last, First Middle, Title. Included are hyphenated names and periods in titles. Example: Doe-Smith, John, M.D. Sorry my original post wasn't very descriptive.
Any thoughts?
@Sodube wrote:
This is almost working. The problem is some of the Specialties have commas, spaces, and "/" in them. (Ex: "Gynecology and Obstetrics / Maternal-Fetal Medicine" and "Psychology, Psychiatry, and Developmental Medicine") And each name is formatted Last, First Middle, Title. Included are hyphenated names and periods in titles. Example: Doe-Smith, John, M.D. Sorry my original post wasn't very descriptive.
Any thoughts?
The comma should work if you add the DSD option to the infile statement. That indicates that there may be delimiters inside the data values.
There are other ways to read this data but due to the nature of Excel and such it may be a tad more difficult to get a consistent length for the character variables.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.