11-08-2017 03:14 AM
I have a problem with importing some 30 different excel files which are password protected and are in a sharepoint into SAS EG.
Request you to kindly share your ideas in dealing with the situation.
11-08-2017 04:21 AM
forgot to mention that all those files are XLSB and are having different passwords.
11-08-2017 04:23 AM
What a mess. Excel is the worst possible data storage format and the password protection is a joke.
Sharepoint is not particularly secure or a good platform to share.
I suppose its totally impossible for you to just get "data" in a data format yes?
All I can suggest is that for each file you download to a secure working location, unpack the Excel file and then import each. Make sure you bill whoever came up with that setup with your full working time. Perhaps next time they will have a more controlled approach rather than dump files on sharepoint.
11-08-2017 05:33 AM
I completely agree with u RW9, in addition, the data in sheets (in each of the excel ) have merged cells. My head is exploding.
downloading the files and unlocking each of them is taking hell lot of a time. even if I do that and dev the code and handit over to the concerned team, they shd be able to do the same every time (which does not fall under "process Automation")..
Any suggestions are welcomed!!
11-08-2017 05:57 AM
Two things which may help. First you can create a network path to Sharepoint sites. Ask your IT group for the network mapping. This helps as you can then refer to the network map in code.
Secondly, you can use VBA in Excel to perform a dump to CSV - create an empty Excel file and put the passwords and filenames in that one file, and then create within that a VBA macro which loops through each file in the list. The macro will open the file, and for each sheet save to CSV. So the macro will load and drop out CSV data for everything in the Excel workbooks.
Your SAS code then is a datastep import, that should not need any changes, and if the Excel files change - which they will - then the import will fail and it will need re-programming, so you have reasons to go back and bill more.
11-08-2017 11:20 PM
actually am not good at VBA coz I havent done that before. trying to learn whenever I get free time.
11-08-2017 03:51 PM
I am a fan of Excel and SharePoint so count me as the oddball. I have to work with both of them all of the time.
How would I approach it? I would write a C# program to read SharePoint and then loop the Excel files using VSTO (or Gembox but you have to pay for that).
This is really not a very difficult issue but somewhat cumbersome.
Skip VBA. Not worth the effort IMO and a poor language. Do some simple C# and you should be in good shape. If you need some starter code, let me know and I can whip something up to at least show you the direction.
What version of SharePoint are you using? That matters a lot.You can get at the SP site using REST services and do it that way too.
11-08-2017 04:40 PM
I just found out about this:
Gembox or Aspose are best ways to deal with Office but both cost (Aspose costs a lot).
11-08-2017 11:18 PM
You definitely made it sound simple, but I havent heard most of the words that you have used
All I have infront of me is EG.
One idea I have in my mind is, --> to create a new excel file and then link all the (sources excel files in to this new file with excel formulae.
this would be a one time manual thing and if we link for one file we can copy paste the formula for the rest.
Once this is done, I willl import this file to SAS and start with the actual work. so, whenever they place new files in sharepoint, I just need to refresh the formuale in my file..
Does this sound good enough??
11-08-2017 11:48 PM
11-09-2017 12:03 AM
Yeah, it isn't too bad even if some of the stuff sounds like foreign speak. Let's start with some basics. If you learn how to fish this first time, you won't have to worry next time. This is a bit of a learning curve but not too bad.
I really need to understand what version of SP you are on. That helps a lot.
REST is the way the internet works. Everything in SP will be accessible from a REST service. Imagine a website address that returns data vs a UI. That is all REST is. I Bing'd it and here is the first one I saw that quickly illustrates REST:
Ok, that is REST. Using a REST interface (simple url you type into a browser) you can get the worksheets. Once you get the SP info, I can help you formulate the query. Here is info on the SP REST services:
Next, let's talk C# and VSTO. Download Visual Studio Community Edition. It is free.
Do a Hello World C# variant. It is simple and there are loads of examples (search YouTube). Just get used to running some code. Once that is done, get VSTO or Gembox:
Now you are prepared to work with Office. Not just Excel, but all of Office.
All of the above should take around 1 hour. An hour that will change how you work, BTW. 'If' you are ok with spending a few dollars, I would highly, highly recommend Gembox vs VSTO:
30 day eval is free. If you go that route, I can give you some code and we will be rocking very quickly. Once this pops in your head (HelloWorld, plus a quick example in Gembox), lights will come on. With Gembox (or VSTO) you will quickly realize the power.
Why do I recommend against VBA and that route? VBA kind of works but it is deprecated technology. New development in it is not recommended. Gembox is 1000x faster and that is not an exaggeration. Don't do VBA because it buys you nothing to learn it. It may work, for now, but its future is sealed. Grit your teeth and learn how to do this properly. It will pay huge rewards.
Don't do linking, BTW. It is frowned upon or banned by IT shops.
Finally, why learn this stuff vs using Base SAS? Easy. This will now allow you to create EG add-ins and extend the power of EG. Keep in mind, EG is written in C#. A little knowledge will now become critical in a cool journey.
11-09-2017 04:53 AM
I would have to disagree with you here @AlanC. I have mentioned VBA above, but this is only as a last ditch attempt to try to fix a very badly damaged process in the first place, and that VBA will already be present as Office is installed. Whilst C# is great, I would never recommend that should go down the route of installing new software (for which they may/may not have the rights to do), but not just that there is a level of support that needs to be granted to this new software, not to mention training to staff, and documentation for handovers and future users. Integrating new software is quite a large undertaking.
Now to return to the problem specifically. The problem at its core is the use of Excel - which is a known very poor data medium. So the focus should be on removing that block in the process. To do this would need some further knowledge of the process e.g.:
Why was the data in this format in the first place
Is there a true source of the data
Will the data be updated ongoing
And possibly is there any structure at all in these files
From this it can be deduced whether the Excel part can be skipped (i.e. go back to source), whether there will be an ongoing process (i.e. one migration, or consistent work forseable), and if any similarities can reduce the complexity in the files.
We currently know very little of this. My personal approach assuming this is a one time affair, and there is no true source to return to would be:
Create a migration plan. Detail within what your process will be. Show all the related files, where they are located, where the processing will take place, and how you will validate conversion.
Next copy all the files out of SP to your working area, then individually process each file - be this:
manually opening and saving to xlsx without password - then you can use libname excel to read the Excel file (although you may still get garbage data if the data is not structured well, and it will be guessing what the data is)
manually opening, processing into a usable format, saving to CSV - then read in using a datastep
You could automate some of the above.
With the above you then have an approach which is documented (important) and repeatable. You can also use this process as a means to justify billing the requester for time and labor on the task, and if it needs to be done in future have a good estimate of resource needed.
11-10-2017 03:42 AM
11-10-2017 10:44 AM
I don't see how you do this w/o going to an outside tool. You can look at PROC HTTP for the REST services but it is hard to use due to the parsing of the responses. SAS also has a JSON engine that might help.
Ultimately, you have to decide on a path that is maintainable. What I provided is considered very standard for dealing with SP, Excel, Tableau, etc. You are using Microsoft technologies to manage Microsoft technologies.
Excel can also bring in SP data. You could look at doing that. Look at InfoPath as well. Again, you may not have it installed but it is part of Office normally. It is not directly what you need but it can define business flows for you and that is what you are looking for. The integration of SP and Excel behind the scenes, InfoPath handles.