11-10-2014 09:15 AM
Hello, I am currently using SAS EG6.2 to manipulate/analyze the data of multiple input files (from excel). The resulting dataset is about 1.2 millon rows. I need to export that Excel 2010 for others to use for their purposes, such as creating pivot tables. I use the EG function to export or export in the step, but it takes over a hour. And sometimes it results in an error, which I think may be due to the fact that it's over 1 millon rows. 1. What is the most efficient way to export such a large dataset? 2. I currently export as an excel (.xlsx) file. Is it better to export as .txt, .csv or some other extension? Are there any differences in the ability to use the file for pivot tables when it's .txt, .csv or .xlsx? Thank you!
11-10-2014 09:28 AM
Using Excel to do transformations of a 1.2 million record dataset when you have SAS looks quite foolish to me. The shortcomings of Excel as a BI tool have been sufficiently discussed around here
Excel 2010 has a limit of 1M rows, 1048576, so you fail there. Regardless of file format.
You can reduce the time for the transfer by reducing your row size (trim character vars, drop unused columns).
But keep in mind that Excel is not a tool for tables, but for _spreadsheets_. Those sheets should rarely expand beyond your screen size, because they tend to be uncontrollable.
11-10-2014 11:29 AM
If your site has SAS OLAP Cube and SAS MS Add-in license, then it becomes straightforward in term of making/exporting Excel pivot tables, if your users still want to use Pivot tables instead of OLAP cubes.
11-10-2014 12:28 PM
I agree with all the comments above - it's preaching to the choir. It's an big uphill battle for me to change the mindset of a large corporation to stop using excel for reporting. As reeza stated in one of her posting...excel is a spreadsheet calculator. Majority of the people do not know SAS and I am the dept sole user. They run a lot of adhoc reports and I cannot keep up with those and having to constantly change my codes to fit the needs. So people said to give them the raw data....I am doing my best with the situation. I am generated required summaries but then this person wants this, this other person want that....and back to where I started from. I posted the original post to see if anyone have an insight into exporting a large dataset - that's all.
11-10-2014 01:57 PM
If your audience is looking for %, this data are big enough that a sample might be sufficient. You would need to do some education as to the limitations of a sample, but it is likely sufficient to support decisions.
11-10-2014 03:11 PM
There are still a lot of these Excel shops around and they all have the same problems. The overall costs (work) are quite significant but don't show as a single line item in a balance sheet, data quality is questionable and different departments sometimes produce different numbers and then spend a lot of time investigating and explaining the differences,....
On the other hand your users need to do some data analysis and "play" with the data so they can't give you clear specs for reports. And as their requirements and questions change frequently they also need constantly different data extracts from you. Excel is for them the only tool available to play with data - so that's what they are used to and use.
Using SAS tools I can think of two ways to better the situation:
- SAS Add In for MS Office (SAS AMO). This would allow you to create a bunch of stored processes which users then can call, apply some filters and then get the data directly into Excel by themselves or also use some of SAS' reporting and analytic functions directly.
- SAS Visual Analytics (SAS VA): If there is an opportunity to create a set of granular de-normalised analytical tables which could serve as basis for the majority of data requirements, then using VA would be ideal. I've seen business users with zero SAS knowledge and no programming experience learning in two days of how to retrieve and analyse data and how to build reports. I was pretty impressed by this and it was the closest I've seen so far in "self serve BI".
11-10-2014 03:34 PM
If your users are Excel based and you're the sole SAS user sometimes you have to work with that, even though it sucks.
I'd export the data to an Access DB and then have them connect to the data via DATA>External Data Sources. I think that will still work if the number of rows is over 1 million, but I'm not in an environment to test it at the moment.
11-11-2014 03:21 AM
You have flexible in SAS.
If you meant "simple enough that any monkey can produce shiny reports", you need Excel. But don't expect the results to be correct. Or rise up to the scrutiny of compliance inspectors w/regards to ISO standards, SOX, BASEL II, ....
11-11-2014 04:01 AM
Nothing wrong about Excel as such. It's only about finding the right balance and mix for the tasks at hand and for data aggregation, analysis & reporting on 1M+ source rows Excel is eventually not the ideal tool.
I'm very much for empowering end users and believe that's exactly what SAS VA in the SAS world does when it comes to bigger data volumes which Excel can't handle anymore.
11-11-2014 04:41 AM
Teach a man to sharpen a pitchfork and he will ...
Fact of the matter is, empowering end user is all well and good until such time as something goes wrong. E.g. someone high up asking why there are mistakes, figures inaccurately shown, global financial crisis? In my role, as I am sure in most others, we are strongly controlled by procedure and validation. Its hard, explaining the concept of validation to someone new, what can we validate on importing a spreadsheet was one question recently. From my side everything, down to the last individual bit should be checked and double checked (independently in most cases) with full documentation, otherwise as a well know statement goes, it didn't happen.
VA is SAS's new Excel-like which was demonstrated at Phuse this year. It all looks well and dandy but still falls into the category of putting information in the hands of those not following the same strict processes. I would expect after a few months of working with it you will get an urgent question back on some documents which have been supplied to someone asking for some changes and when you look at it, you will not recognize the data, output etc. and then spend the next 6 months trying to re-engineer it only to find half a dozen errors in it that you then need to try to explain.
I personally would either:
a) dump the data to CSV and mention the limitations of Excel.
b) Request specs for outputs. If they don't know what they want then how can they achieve it.
11-11-2014 04:50 PM
If anything then VA in it's core is a new way of OLAP (as a process not a storage concept - and with newer VA versions it's OLAP++). VA allows a lot of flexibility to the end user but unlike in Excel you can't do "anything you want".
Everything is calculated "on-the-fly" so there are no 10 steps with intermediary results and manual changes to data like possible in Excel but the same base data will be used. This also makes results repeatable.
11-11-2014 09:09 PM
Thanks for the response. Re: your option B...they don't know what they want until they decide it's needed. :-) I am in situation like many others...and I strongly dislike excel for number crunching, analysis, etc...