- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Friends,
I am working on a project where I have to read hundreds of html files stored as .xls into sas dataset so I can build the analytical process using that data. My initial proc import was unsuccessful. I was able to build import step in EG using File--> Import Data but can't do it for hundreds of files.
Alternately, I was successful in opening a source data file in excel(it spit a warning that file format and extension are not aligned; understandable as data is HTML and file extension is .xls) and save it as a true .xls file and then using proc import to bring the data into sas. Problem is I have hundreds of files so it is impossible to do this with all those files one by one.
I have looked for articles on importing HTML files far and wide but can't find any recent article, using which I could build my code. Latest one was 2009 but I could not replicate the method exampled in it.
Hoping community will be able to help me as always.
Thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There's a VBS macro here that converts XML to XLSX which you could then automatically import. It will do all files in a single folder.
https://gist.github.com/statgeek/c51f58a009f8d315a200f34912e494b1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Rezza for your response .
I am looking for process to bring in HTML files into SAS dataset.
Not sure if your recommended code will be helpful. Can you please confirm?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What @Reeza proposes might or might not work for you depending on your environment. Can you please run below code and copy/paste from your SAS log to what these SAS macro variables resolve to? This will tell us what SAS version under which OS you're using.
%put &=SYSSCP;
%put &=SYSSCPL;
%put &=SYSVLONG;
Please also execute the following code and search in the log if you find "SAS/ACCESS Interface to PC Files" - just tell us; don't share the whole log with us.
proc setinit;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Patrick, thanks for looking into my request.
SYSSCP=LIN X64
SYSSCPL=Linux
SYSVLONG=9.04.01M6P110718
Proc Setinit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for this info and it's what I "feared" it might be.
SAS Enterprise Guide is your client. It's installed and runs in your client side Windows environment ("on your PC").
You connect with the SAS EG client to a SAS Server. Code you write in your SAS EG gets sent to the SAS Server for execution.
The Import Wizard is a bit special. The Import Wizard actually runs client side and uses Windows components to import .xls(x) files. It then send the output SAS table to the SAS server where you can use it for further processing. Because the wizard does client side things it's nothing you can really automate - you just can re-run it out of EG once defined but to my knowledge there is no out of the box way available to automate the process so it loops over multiple source files in your local folder.
If you want to automate reading of the .xls(x) using SAS then the source files need to be in a location accessible by your SAS Server instance - so you would have to upload them to storage/a location accessible by the SAS server.
Reading Excel files using SAS needs SAS/Access Interface to PC Files licensed. From the looks of it you've got that. Reading .xls files needs also the SAS PC file server up and running and you will know how to connect to it. Have you ever read a .xls file in your environment using SAS code (i.e. Proc Import)?
Reading a .xlsx file is much easier and that's why @Reeza suggests to first convert the files from .xls to .xlsx. The code shared would dynamically generate .vbs scripts doing such a conversion but.... this .vbs code only works under Windows and your SAS Server is under Linux.
I can think of two options right now:
Option 1
1. Batch convert all your .xls to .xlsx in your local Windows environment
- outside of SAS using some script. Could be pulled out of the code Reeza shared or you just Google something that fits your purpose.
There should be a lot out there
- as an alternative: Import all .html directly into a single .xlsx with a sheet per .html file. I've done some Googling and it looks like you could find something that does this for you. Having a single .xlsx workbook with sheets would make reading all the data out of SAS likely very simple - especially if all the data in the sheets end up to have the same structure (column headers starting on the same row and column in all sheets).
2. Upload the created .xlsx files onto your Linux server
- Using WinSCP or some other client that can do this task
3. Use SAS to read all the .xlsx in the server side folder location into SAS tables
- if you don't know how to do this then once you've got the .xlsx on the SAS server ask a new question here
Option 2
1. Using EG import one of the .xls using the import wizard
2. Create a task template
3. Use this task template to import all the other .xls
- This is still a manual process but using a task template should at least avoid that you have to go through all the wizard screens for every single .xls
"Option 3"
It's eventually still worth to try to read some proper .xls saved on the SAS using the PC File Server. If you can make this work then next step would be to try the same with one of your existing .xls - I'm not sure if that's going to work given that they are not "proper" .xls. What did you do? Have you just changed the suffix from .htm to .xls?
I believe Option 1 is likely what will get you where you want to be. ....or someone else comes up with an approach that's easier and I didn't think/know about.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many Thanks Partick for the detailed and in-depth answer.
If I had pure .xls files, that would be very easy to handle for me using a macro to loop through all the files and upload them onto a sas dataset without even converting those to .xlsx.
Issue is these are .HTML files saved with .XLS extension. When I use proc import engine to read .XLS files, it fails and spits out error because I am asking it to read .HTML which is sitting with a cloak of .XLS. These files are not even opening straight in Excel and spit out warning regarding file layout not matching file extension due to ExtensionHardening rules in Excel.
SO I am hoping to get some help with code to read HTML file into a SAS dataset. Meanwhile i am trying my luck with VBA to write a macro to save these files as .xls/.xslx
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"If I had pure .xls files, that would be very easy to handle for me using a macro to loop through all the files and upload them onto a sas dataset without even converting those to .xlsx."
IF the .xls is on the SAS Server side ...and then you don't need to "upload" any SAS dataset as it's already there.
"Meanwhile i am trying my luck with VBA to write a macro to save these files as .xls/.xslx"
VBA doesn't feel right. Could be VB, VBS, PowerShell or even something like Python.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Rezza, what code are you referring to ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If those HTML files have a consistent structure, you can write a data step that reads all files in one swoop and extracts the TABLE data contained in them.