BookmarkSubscribeRSS Feed
ColeG
Obsidian | Level 7

I receive survey data on a monthly basis. Every few months, the survey process is updated and occasionally the order of questions asked changes (especially if a new question gets tacked in). Unfortunately, the Excel files I receive follow the pattern "Question Header (Q#)"; when I import this into SAS, therefore, the variable is named question_header__q1_ (or __q2_, __q3_, etc.). But, Q1 one month might be Q3 the next month if the order changes. I need to combine multiple months' worth of files at a time AND clean up the observations, therefore it's important to get consistent headers.

 

Is there a way to remove the __q#_ from the end of all headers that have that pattern? (e.g., date, name, etc. DO NOT have that string). There are 189 variables in total, maybe 20 of them do not have this question number. At present I'm trimming in Excel by hand.

4 REPLIES 4
SuryaKiran
Meteorite | Level 14

Once the Excel file are imported into SAS then you can use dictionary.columns or SASHELP.VCOLUMN to find the variables in the dataset. You can also put the variables you need into a macro and then use RETAIN to order the variables in a dataset.

 

proc sql;
select name 
from dictionary.columns
where libname="SASHELP" and memname="CLASS"
  and name not in ('Height','Weight')
;
quit;

 Also you can use these tables to re-create or re-structure the existing variables.

Thanks,
Suryakiran
Reeza
Super User

Sure, you can grab the names from the dictionary tables and then modify them accordingly.

 

This may work:

rename __q1_ - __q#_ = Q1-Q#;

If not, you can use COMPRESS to strip out the _ and leave it as Q1-Q#.

 

%let dsn=Survey;
%let lib=mySurvey;


proc sql noprint;
select catx("=", name, compress(name, '_')) 
into :rename_list
separated by " "
from sashelp.vcolumn
where upper(libname)=upper("&lib.")
and upper(memname)=upper("&dsn.")
and upper(trim(name)) like '__Q%';
quit;


%put &rename_list;


proc datasets library=&lib. nodetails nolist;
modify &dsn.;
rename &rename_list;
run; quit;

proc print data=&lib.&dsn (obs=10) noobs;
run;

 Slightly modified from here:

https://gist.github.com/statgeek/82d9f2854edc01560e0f


@ColeG wrote:

I receive survey data on a monthly basis. Every few months, the survey process is updated and occasionally the order of questions asked changes (especially if a new question gets tacked in). Unfortunately, the Excel files I receive follow the pattern "Question Header (Q#)"; when I import this into SAS, therefore, the variable is named question_header__q1_ (or __q2_, __q3_, etc.). But, Q1 one month might be Q3 the next month if the order changes. I need to combine multiple months' worth of files at a time AND clean up the observations, therefore it's important to get consistent headers.

 

Is there a way to remove the __q#_ from the end of all headers that have that pattern? (e.g., date, name, etc. DO NOT have that string). There are 189 variables in total, maybe 20 of them do not have this question number. At present I'm trimming in Excel by hand.


 

ballardw
Super User

@ColeG wrote:

I receive survey data on a monthly basis. Every few months, the survey process is updated and occasionally the order of questions asked changes (especially if a new question gets tacked in). Unfortunately, the Excel files I receive follow the pattern "Question Header (Q#)"; when I import this into SAS, therefore, the variable is named question_header__q1_ (or __q2_, __q3_, etc.). But, Q1 one month might be Q3 the next month if the order changes. I need to combine multiple months' worth of files at a time AND clean up the observations, therefore it's important to get consistent headers.

 

Is there a way to remove the __q#_ from the end of all headers that have that pattern? (e.g., date, name, etc. DO NOT have that string). There are 189 variables in total, maybe 20 of them do not have this question number. At present I'm trimming in Excel by hand.


I deal with some similar behaving data. My approach is to not rely on anything that creates inconsistent data. So I save the Excel files to CSV, examine the format/contents, and use a data step to read the data. That way I control what the variable names are so "Q1" stays "Q1" in all of the data sets, the lengths and variable types (character or numeric) stay the same. I assign meaningful labels for each variable.

When new columns are added I only need to provide information on the new variables: informat, length and label and place the new variable name in the correct position in the INPUT statement. If an existing "question" significantly changes position I just make sure it is in the correct order in the input statement. If a variable is removed (by topic not the "q" header you are getting) then remove it from the input statement (and informat/length/ label sections if desired).

This approach means that when the successive data sets are combined for analysis we don't have issues with mismatched data types, truncation of values due to changing lengths of character variables and such. Saving each version of the data step code also documents how which files were read.

 

In effect I am only using the column headers to check the order of the variables I want to use. Possibly the text might be useable for a label.

ColeG
Obsidian | Level 7

Thanks for you help everyone!

 

I took your suggestions and worked around on it and came up with this:

proc sql noprint;
 select cats(name,"=",substr(name,1,find(name,'__')-1)) into :rename_list separated by ' '
 from dictionary.columns
 where upper(libname)=upper("work") and upper(memname)=upper("have");
quit;

data want;
 set have;
 rename &rename_list.;
run;

The outcome of this is that all the variables that include the string '__' (as part of __Q#_) have the whole Q# stripped out of their name.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4020 views
  • 2 likes
  • 4 in conversation