BookmarkSubscribeRSS Feed
Sodube
Calcite | Level 5

Hello, 

I haven't used SAS in years and I am now trying to use it for some data analysis. I have a large data set that is output from another program into an excel spreadsheet. It is a survey that has many parts. Currently there are 229 columns. The first column is the specialty. The next 228 columns are Name, Question 1 Answer, Q2 Ans, Q3 And, Q4 Ans, and Q5 Ans for 38 specialties. The responses are all characters (Excellent, Good, Needs Improvement) with some left blank. Also, each column is currently named a unique random letter and number combination.  Below is a crud example of what is currently looks like:

 

                 Q2_1  Q2_2_1 Q2_2_2 Q2_2_3 Q2_2_4 Q2_2_5 Q109 Q110_1 Q110_2 Q110_3 Q110_4 Q110_5.....

Specialty1   Bob      Ex          Ex         Good      Ex        Ex 

Specialty1   Tom    Good      Good       NI          Ex        Ex

Specialty1   Sue      Ex          NI                         Good    Ex

Specialty1   Bob      Ex         NI             Ni         Good     Ex 

Specialty2                                                                                   Lee      Ex         Ex        Good       NI           Ex

Specialty2                                                                                   Jim      Good     Ex                    Ex            Good

Specialty2                                                                                   Joe       Ex         Ex       Good       Ex           Ex

 

I need it to look more like this:

 

                 Name Q1_Ans Q2_Ans Q3_Ans Q4_Ans Q5_Ans

Specialty1   Bob      Ex        Ex       Good    Ex           Ex 

Specialty1   Tom      Good   Good   NI         Ex            Ex

Specialty1   Sue      Ex        NI                     Good       Ex

Specialty1   Bob      Ex        NI        Ni         Good        Ex 

Specialty2   Lee      Ex        Ex       Good      NI           Ex

Specialty2   Jim       Good   Ex                     Ex           Good

Specialty2   Joe       Ex       Ex       Good      Ex           Ex

 

 

What would be the best/simplest way to combine 228 variables down to 6 variables? 

 

I am using SAS 9.4. 

 

 

 

3 REPLIES 3
ballardw
Super User

If this were my data I would likely save that Excel file to a CSV and read the data so that I had variables I wanted.

Maybe something like:

data example;
   infile "something.csv" dlm=',';
   informat specialty name $25. q1 - q5 $10.;
   input specialty @;
   do i=1 to 38;
      input name q1 - q5 @;
      /* only keep the records with a name*/
      if not missing (name) then output;
   end;
   input; /* advance to next line*/
   drop i;
run;

The @ holds the reading on the current line until an INPUT without the @ is used.

I calculated 38 iterations needed but you know your data.

 

IF you KNOW that Specialty and NAME (or any of the other columns) would absolutely positively never have a space in the value then you could save the file as a TEXT file with a space delimiter and use simple list input as the input statement would skip over all of the spaces.

data example;
   infile "something.txt" dlm=' ';
   informat specialty name $25. q1 - q5 $10.;
   input specialty name q1-q5;
run;

 

 

 

 

Sodube
Calcite | Level 5

This is almost working. The problem is some of the Specialties have commas, spaces, and "/" in them. (Ex: "Gynecology and Obstetrics / Maternal-Fetal Medicine" and "Psychology, Psychiatry, and Developmental Medicine") And each name is formatted Last, First Middle, Title. Included are hyphenated names and periods in titles. Example: Doe-Smith, John, M.D. Sorry my original post wasn't very descriptive. 

 

Any thoughts? 

ballardw
Super User

@Sodube wrote:

This is almost working. The problem is some of the Specialties have commas, spaces, and "/" in them. (Ex: "Gynecology and Obstetrics / Maternal-Fetal Medicine" and "Psychology, Psychiatry, and Developmental Medicine") And each name is formatted Last, First Middle, Title. Included are hyphenated names and periods in titles. Example: Doe-Smith, John, M.D. Sorry my original post wasn't very descriptive. 

 

Any thoughts? 


The comma should work if you add the DSD option to the infile statement. That indicates that there may be delimiters inside the data values.

 

There are other ways to read this data but due to the nature of Excel and such it may be a tad more difficult to get a consistent length for the character variables.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 440 views
  • 0 likes
  • 2 in conversation