Hi all, I could really use some help and appreciate the support in advance.
I want to point out that I am an end user and an account manager; a large part of my job involves supporting clients with insightful analysis, which is why I use the Excel add-in. Frankly, our SAS admin tried getting us to use Enterprise Guide, but we found it difficult for non-tech-savvy folks like us. I am turning to the community because I have to imagine there is a better solution out there that we (and our admin) might not be aware of or haven't envisioned.
Here is the issue: we have a number of OLAP cubes, and tables/data sets built from those cubes (I think that's how it works), which we access to do our analysis. However, since we are often on the road or working remotely, and the file is quite large, connecting to the database to run the analysis is often impossible or extremely time-consuming. Even when we are connected to the network, refreshing the data takes considerable time. We have therefore been saving the data to Excel and refreshing it periodically (a couple of times a month). As the data set continues to grow, though, not only does refreshing take more and more time, but the Excel file itself is becoming quite large.
There are a number of other issues as well, but there's no need to expand on them here. What is important to know is that I am working with the DBA to possibly build another table that contains only the data we need for our analysis; however, I believe that file will also grow to be quite large.
All that said, I am wondering if anyone out there has suggestions on how we can improve our process to accommodate the needs/issues I have described.
One of the (perhaps moronic) thoughts I had was to use MS Access to store the data and then run the Excel analysis from there, but I am not even sure this would accomplish the goal, and more significantly, I have learned there is no SAS add-in for Access.
Out of naivety, I may not have provided all of the information needed to help, or you may have specific questions I either overlooked or that are foreign to me, so please let me know if you need additional information and I will hunt it down.
I very much look forward to hearing back and am very appreciative!
Is the analysis being done in Excel, or is it a SAS process? In my experience, the bigger the dataset, the more likely it is that SAS is the best place for the analysis to happen. Excel can still be used as a destination for the results; it just might not be the best tool to crunch a million rows of data.
Stored processes could be the way to proceed if you do adopt this approach - you could then run the process on a SAS server using AMO (the Excel add-in) to kick it off.
Sorry, without knowing more about exactly what you're doing it's hard to be more specific, but I hope this helps.
Guys, thanks for getting back to me so quickly. To answer your questions...
Most of the analysis I am doing is very basic (e.g., calculating % change) and does not entail any true statistical analysis such as regression. From what I understand, SAS is designed for true statistical work, and for better or worse, this is the product that was adopted years ago. So I really don't have a need for all of the neat and fancy capabilities the software offers; I'm just not sure what my options are for getting at the data. Any thoughts on this point?
All of the work I perform uses pivot tables, as that seems to be the best way to handle large data sets and, more importantly, to perform the analysis quickly.
The file I am using has 1.7 million rows.
The analysis is being done in Excel.
Geoff, I am curious about using SAS to do the analysis - do you know where I might learn more about this? Would this be something I do in Enterprise Guide?
After the data is loaded into Excel, my PC handles it well and the speed is fine, though I know this probably isn't a long-term solution.
Also, can you explain AMO a bit more? You may be on to something here.
Thanks again, guys. You don't know how painful this has been, and your feedback is greatly appreciated.
I think 1.7 million rows is always going to cause problems with Excel. I'm not 100% sure how the add-in handles pivot tables differently, but in the usual situation a pivot cache is created as part of the file; that is probably where the refresh time is going, and why the bloat is occurring. That's a lot of data to store in an Excel file. My thinking was to use SAS (and yes, Enterprise Guide would be the place to code this) to calculate your % changes and so forth. PROC FREQ, PROC SUMMARY, or PROC MEANS would be candidates to produce these sorts of numbers (there may also be a "codeless" solution, using one of the tasks built into Enterprise Guide). The results would then be a much smaller rowset to transfer to Excel.
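As a rough sketch of what that summarization step might look like in Enterprise Guide (the library, dataset, and variable names here — mylib.sales, region, month, revenue — are all hypothetical placeholders; substitute your own):

```sas
/* Collapse the ~1.7M detail rows to one row per region/month */
proc summary data=mylib.sales nway;
  class region month;
  var revenue;
  output out=monthly(drop=_type_ _freq_) sum=revenue;
run;

proc sort data=monthly;
  by region month;
run;

/* % change versus the prior month within each region */
data monthly_chg;
  set monthly;
  by region;
  prev = lag(revenue);
  if first.region then pct_change = .;
  else pct_change = (revenue - prev) / prev * 100;
run;
```

The resulting monthly_chg table would typically be a few hundred rows rather than 1.7 million, which is trivial for Excel to open or for the add-in to pull down.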
By AMO I mean the SAS Add-In for Microsoft Office - sorry for the TLA. If you have a stored process built in SAS, the add-in can call it and pull the results into the Excel file. Version 4.3 also has an OLAP viewer for working with OLAP cubes, which may also be an avenue to a solution, depending on the version you're using.
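For context, a stored process is essentially ordinary SAS code wrapped so the server can run it on demand. A minimal sketch (the summarization inside is hypothetical, and registration is done by your admin, e.g., in SAS Management Console; in some releases the %STPBEGIN/%STPEND wrappers are generated for you):

```sas
*ProcessBody;
%stpbegin;

/* Hypothetical: run the heavy summarization on the server */
proc means data=mylib.sales sum maxdec=0;
  class region;
  var revenue;
run;

%stpend;
```

Once registered, the add-in's SAS menu in Excel can run this stored process and drop only the summary table into the worksheet, so nothing close to 1.7 million rows ever crosses the network.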
I'm assuming the connection problem comes from connecting Excel directly to the OLAP cube, rather than to reports built off the cube? What about creating pre-run reports that are updated every week or month?
Why not use the SAS business intelligence tools? I'm assuming they can build reports where you can change things as needed.
Unfortunately I've only worked with Cognos and Microsoft Reporting Services, not SAS BI, but both handle this easily. Another option could be a third-party BI tool, e.g., Pentaho (open source) or MicroStrategy.
SAS EG does have a few tools for connecting to OLAP that I found quite interesting, but again, I have little experience with them. Maybe repost this question in the Business Intelligence forum?