Removing text string from a variable name?

Reply
Occasional Contributor
Posts: 13

Removing text string from a variable name?

I receive survey data on a monthly basis. Every few months, the survey process is updated and occasionally the order of questions asked changes (especially if a new question gets tacked in). Unfortunately, the Excel files I receive follow the pattern "Question Header (Q#)"; when I import this into SAS, therefore, the variable is named question_header__q1_ (or __q2_, __q3_, etc.). But, Q1 one month might be Q3 the next month if the order changes. I need to combine multiple months' worth of files at a time AND clean up the observations, therefore it's important to get consistent headers.

 

Is there a way to remove the __q#_ from the end of all headers that have that pattern? (e.g., date, name, etc. DO NOT have that string). There are 189 variables in total, maybe 20 of them do not have this question number. At present I'm trimming in Excel by hand.

Valued Guide
Posts: 591

Re: Removing text string from a variable name?

[ Edited ]

Once the Excel file are imported into SAS then you can use dictionary.columns or SASHELP.VCOLUMN to find the variables in the dataset. You can also put the variables you need into a macro and then use RETAIN to order the variables in a dataset.

 

proc sql;
select name 
from dictionary.columns
where libname="SASHELP" and memname="CLASS"
  and name not in ('Height','Weight')
;
quit;

 Also you can use these tables to re-create or re-structure the existing variables.

Thanks,
Suryakiran
Super User
Posts: 23,724

Re: Removing text string from a variable name?

[ Edited ]

Sure, you can grab the names from the dictionary tables and then modify them accordingly.

 

This may work:

rename __q1_ - __q#_ = Q1-Q#;

If not, you can use COMPRESS to strip out the _ and leave it as Q1-Q#.

 

%let dsn=Survey;
%let lib=mySurvey;


proc sql noprint;
select catx("=", name, compress(name, '_')) 
into :rename_list
separated by " "
from sashelp.vcolumn
where upper(libname)=upper("&lib.")
and upper(memname)=upper("&dsn.")
and upper(trim(name)) like '__Q%';
quit;


%put &rename_list;


proc datasets library=&lib. nodetails nolist;
modify &dsn.;
rename &rename_list;
run; quit;

proc print data=&lib.&dsn (obs=10) noobs;
run;

 Slightly modified from here:

https://gist.github.com/statgeek/82d9f2854edc01560e0f


@ColeG wrote:

I receive survey data on a monthly basis. Every few months, the survey process is updated and occasionally the order of questions asked changes (especially if a new question gets tacked in). Unfortunately, the Excel files I receive follow the pattern "Question Header (Q#)"; when I import this into SAS, therefore, the variable is named question_header__q1_ (or __q2_, __q3_, etc.). But, Q1 one month might be Q3 the next month if the order changes. I need to combine multiple months' worth of files at a time AND clean up the observations, therefore it's important to get consistent headers.

 

Is there a way to remove the __q#_ from the end of all headers that have that pattern? (e.g., date, name, etc. DO NOT have that string). There are 189 variables in total, maybe 20 of them do not have this question number. At present I'm trimming in Excel by hand.


 

Super User
Posts: 13,542

Re: Removing text string from a variable name?


@ColeG wrote:

I receive survey data on a monthly basis. Every few months, the survey process is updated and occasionally the order of questions asked changes (especially if a new question gets tacked in). Unfortunately, the Excel files I receive follow the pattern "Question Header (Q#)"; when I import this into SAS, therefore, the variable is named question_header__q1_ (or __q2_, __q3_, etc.). But, Q1 one month might be Q3 the next month if the order changes. I need to combine multiple months' worth of files at a time AND clean up the observations, therefore it's important to get consistent headers.

 

Is there a way to remove the __q#_ from the end of all headers that have that pattern? (e.g., date, name, etc. DO NOT have that string). There are 189 variables in total, maybe 20 of them do not have this question number. At present I'm trimming in Excel by hand.


I deal with some similar behaving data. My approach is to not rely on anything that creates inconsistent data. So I save the Excel files to CSV, examine the format/contents, and use a data step to read the data. That way I control what the variable names are so "Q1" stays "Q1" in all of the data sets, the lengths and variable types (character or numeric) stay the same. I assign meaningful labels for each variable.

When new columns are added I only need to provide information on the new variables: informat, length and label and place the new variable name in the correct position in the INPUT statement. If an existing "question" significantly changes position I just make sure it is in the correct order in the input statement. If a variable is removed (by topic not the "q" header you are getting) then remove it from the input statement (and informat/length/ label sections if desired).

This approach means that when the successive data sets are combined for analysis we don't have issues with mismatched data types, truncation of values due to changing lengths of character variables and such. Saving each version of the data step code also documents how which files were read.

 

In effect I am only using the column headers to check the order of the variables I want to use. Possibly the text might be useable for a label.

Occasional Contributor
Posts: 13

Re: Removing text string from a variable name?

Thanks for you help everyone!

 

I took your suggestions and worked around on it and came up with this:

proc sql noprint;
 select cats(name,"=",substr(name,1,find(name,'__')-1)) into :rename_list separated by ' '
 from dictionary.columns
 where upper(libname)=upper("work") and upper(memname)=upper("have");
quit;

data want;
 set have;
 rename &rename_list.;
run;

The outcome of this is that all the variables that include the string '__' (as part of __Q#_) have the whole Q# stripped out of their name.

 

Ask a Question
Discussion stats
  • 4 replies
  • 102 views
  • 2 likes
  • 4 in conversation