
execution of large data

Occasional Contributor
Posts: 5

execution of large data

Hi

I am using SAS University Edition on an HP Pavilion dv6 (4 GB RAM, Core i5, Windows 7).

(1) I want to know the approximate execution time to import (read) data from an Excel file that has around 68,000 observations and 60 columns.

When I run a simple PROC IMPORT step to read the data, the program runs for about 30 minutes and then shows the message: "An exception was thrown while waiting for a reply from the peer."

Is there any solution for this? Is the data too large for SAS to handle (a limitation)? Any help would be appreciated.

(2) I created a new library using the following code:

     libname newdata '/folders/myfolders/';

     After executing it, I can see 5 libraries in the Libraries section (4 default and 1 that I created). However, when I click Assign Libraries, no library name is shown. The problem is that when I open SAS the next time (after exiting properly), I can't see my library in the Libraries section; only the 4 default libraries are listed.



All Replies
Solution
05-21-2015 08:46 AM
SAS Super FREQ
Posts: 388

Re: execution of large data

Posted in reply to Abhinav_Piplani

Hi, Abhinav!

1) You might try increasing the amount of memory allocated to your VM (in the VMware or VirtualBox settings) and see if that makes a difference. I believe it's 1 GB by default, so you might bump it up to 2 GB and see if your code executes.

2) I would suggest you put your LIBNAME statement in your SAS autoexec file. If you go to the top right of the SAS Studio window, right beside the "?" icon, you can edit that file; then, when you log in to SAS Studio, that code will be executed and your libname will be all set!
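
A minimal sketch of what the autoexec entry could contain, assuming the same folder path used in the original post:

/* autoexec.sas: SAS Studio runs this code automatically at the start of each session */
libname newdata '/folders/myfolders/';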

Hope this helps!

Randy

Occasional Contributor
Posts: 5

Re: execution of large data

Posted in reply to RandyMullis

Hi Randy!

Thanks for the quick reply. The second issue is resolved by your solution. Regarding the first problem, I increased the VM's memory allocation by 1 GB (it is now 2 GB) and have also changed the preference so that a warning appears if the output is greater than 8 MB (earlier it was 4 MB). But I still get the following message:

The size of the results is greater than 8MB. Insufficient memory could cause performance problems or an error. Do you want to display the results anyway?


Should I choose "Display Anyway" or "Don't Display"?

Should I convert the Excel file to CSV, as suggested by other people?


Abhinav

Super User
Posts: 7,973

Re: execution of large data

Posted in reply to Abhinav_Piplani

So you have 68,000 rows * 60 columns and you're using Excel for this? Excel really isn't a data capture/database/entry/transport format. I would suggest that you save the Excel file in a comma-delimited format (CSV), then write a data step import for it, as sketched below.

One of the reasons it takes a long time is that an .xlsx file is stored as a zipped XML structure (Office Open XML). Each read or write has to work through the structure of the file to find the data. CSV, on the other hand, is a very basic flat file that can be read straight through.
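
For illustration only, a data step import of the CSV might look like the sketch below. The file path mirrors the Advance.xlsx path mentioned later in the thread, and the column list (var1-var60, all numeric, headers in row 1) is an assumption, since the real layout isn't shown:

/* Sketch of a data step CSV import; variable names, count, and types are placeholders */
data newdata.file1;
    infile '/folders/myfolders/Advance.csv' dlm=',' dsd firstobs=2 truncover;
    input var1-var60;   /* assumes 60 numeric columns and a header row in row 1 */
run;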

Again, seriously, don't use Excel for anything other than playing around with pivot tables.

Occasional Contributor
Posts: 5

Re: execution of large data

Thanks for the info.

Contributor
Posts: 22

Re: execution of large data

Posted in reply to Abhinav_Piplani

There seem to be a variety of opinions on the number of records that the free version of SAS Studio can process. In a test, I successfully loaded 17.5 million records with 78 variables from a CSV file into a SAS data set in under 9 minutes.

Super User
Posts: 7,824

Re: execution of large data

Posted in reply to Abhinav_Piplani

Since SAS has a database-style file format, it is best to use a data-only format like CSV for data transfers. You don't need all the formatting information that is contained in an Excel file and has to be parsed along with the data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
SAS Super FREQ
Posts: 388

Re: execution of large data

Posted in reply to Abhinav_Piplani

I second both what Restonian and Kurt said!

Super User
Posts: 19,822

Re: execution of large data

Posted in reply to Abhinav_Piplani

I imported a 68000 obs, 62 variable file from Excel in under 30 secs in SAS UE.

Post your import code.

You can also try the code below and see how long it takes for you:

data have;
    array vars(60) var1-var60;
    do obs=1 to 68000;
        do i=1 to 60;
            vars(i)=floor(rand('uniform')*100+1);
        end;
        output;
    end;
run;

/** Export an XLSX file. **/
proc export data=have
    outfile="/folders/myfolders/sample.xlsx"
    dbms=xlsx
    replace;
run;

/** Import the XLSX file. **/
proc import datafile="/folders/myfolders/sample.xlsx"
    out=work.myexcel
    dbms=xlsx
    replace;
run;

/** Print only the first 100 rows. **/
proc print data=work.myexcel (obs=100); run;

Occasional Contributor
Posts: 5

Re: execution of large data

Did your code print the data for all 68,000 obs in 30 seconds?

My code:

proc import datafile="/folders/myfolders/Advance.xlsx"
    out=newdata.file1
    dbms=xlsx
    replace;
    sheet='Data5';
run;

*printing results;
proc print data=newdata.file1;
run;

Super User
Posts: 19,822

Re: execution of large data

Posted in reply to Abhinav_Piplani

I didn't print the whole file, only 100 obs; the full file would be a very big HTML output. Trying to print it all may be the cause of the error.

If you want to navigate the results, open the data set instead, but I can't imagine you can visually review 68,000 * 60 data points with any sort of accuracy.
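
For example, limiting the print to a slice of the imported table (data set name taken from the code posted earlier) keeps the HTML results small; this is only a sketch of the idea:

/* Print only the first 100 rows instead of all 68,000 */
proc print data=newdata.file1 (obs=100);
run;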

SAS Super FREQ
Posts: 388

Re: execution of large data

Posted in reply to Abhinav_Piplani

Yes, I think you'll find that the CSV approach is the way to go for the reasons the other folks have stated.  The Excel file has a lot of metadata specific to that application that you don't need.
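
If PROC IMPORT is preferred over a hand-written data step, a CSV version might look like the sketch below; the .csv path is assumed to mirror the original .xlsx path:

proc import datafile="/folders/myfolders/Advance.csv"
    out=newdata.file1
    dbms=csv
    replace;
    guessingrows=1000;   /* scan more rows than the default before guessing column types */
run;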

Occasional Contributor
Posts: 5

Re: execution of large data

Posted in reply to RandyMullis

OK, I will convert the file to CSV and let you know if I encounter any other problem. Thanks for your help. I have one more small issue: whenever I start SAS Studio (after clicking Start Session in Google Chrome), a message appears at the beginning of the session that just says "Null" with a cross (error) icon.

After this I am able to work normally, and so far it hasn't affected my work. What do you think is the reason for this, and how should I rectify it?
