BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JohnDooley
Calcite | Level 5

I am trying to import an xlsx file. Unfortunately the variable names will contain special characters and blanks (there are many such columns). The data is being downloaded from a third party provider to this xlsx sheet and I have no control over their naming convention. Furthermore, we will want to periodically update this spreadsheet (possibly adding more variables and switching the order of columns) and rerun the import statement for analysis purposes. Because of these concerns it would not be practical to hard code the names of each variable upon import.

I would like to strip out all of the special characters and spaces upon import (or replace them with underscores). This used to be feasible with the GETNAMES option for xls files but I do not believe that it is supported for xlsx file types (we need the xlsx file type because there is to much data for an xls).

Does anyone have any thoughts about how to rename these variables upon import (without changing the excel file)?

1 ACCEPTED SOLUTION

Accepted Solutions
jakarman
Barite | Level 11

Than do it in two steps....
At first just reading the column headers.  Do not forget options like validvarname=V7 obs= 

Do your inputvalidation on that dataset checking the unique=-names, after input-validation is passing than process further.

Remembering the rootcause of heart-bleed. The coder did not check/validate the incoming request. Just forgot it.

Fool/full-proof coding is hard work.

---->-- ja karman --<-----

View solution in original post

5 REPLIES 5
jakarman
Barite | Level 11

Bypass the column names row and use your own naming

---->-- ja karman --<-----
JohnDooley
Calcite | Level 5

It is important that I know which data column corresponds to which variable since the analysis that I need to run is unique for each column. If I use a standardized naming scale at the import level I am concerned that if the order of the columns changes in the excel spreadsheet (the next time I download the sheet from our data provider) it will not be noticed causing misleading results. This is why I suspect I need the variable names to be imported as valid sas variable names, but unique names that will not depend on ordering of columns.

jakarman
Barite | Level 11

Than do it in two steps....
At first just reading the column headers.  Do not forget options like validvarname=V7 obs= 

Do your inputvalidation on that dataset checking the unique=-names, after input-validation is passing than process further.

Remembering the rootcause of heart-bleed. The coder did not check/validate the incoming request. Just forgot it.

Fool/full-proof coding is hard work.

---->-- ja karman --<-----
JohnDooley
Calcite | Level 5

Thank you for your help. The option validvarname=v7 worked very nicely and I will be sure to validate the names.

Reeza
Super User

SAS will generally do this by default, assuming you don't have the option validvarname=any set.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Discussion stats
  • 5 replies
  • 1670 views
  • 0 likes
  • 3 in conversation