06-06-2013 01:49 AM
Hi to all,
is it possible to read only specific variabiles from text files into sas as sas dataset using proc import? I have very big data in my text file which contains like 1000 observations and more than 42,000 variables. I tried reading this file into sas using proc import but i failed in doing so i thought may be because of size issues. Now i decided to read only specific variables (columns) from this big text file so that i can reduce size of file to read into sas system. is there any ideas to read like this by using data step? Suggestion or help would be appreciated
Thanking you very much,
06-06-2013 06:25 AM
Yes you can.
How is dependant of the structure of your text file. It should be fairly easy if this is a file with fixed width fields.
There are ton of documentation/papers on how to read external data using the data step.
My guess is the reason for failure in your import is that you are exceeding the limit on number of variables in a data set (32K).
Maybe it's easier to get the file with a different structure...?
06-06-2013 07:15 AM
This sounds like a one off task. I believe the fastest way for you to get to the desired result would be to use the import wizard in SAS Enterprise Guide (File\import data).
I've just created a text file with 42000 columns and 1000 rows and imported it using SAS EG on my SAS9.3 Win7 laptop. It took EG a while to scan all rows but else worked like a charm.
The 32K limitation applies only to outdated SAS versions and OS: 8213 - Understanding the maximum allowable size of SAS® data sets in the Windows, UNIX, and z/OS ope...
06-06-2013 08:50 AM
First off, a file that contains just 1000 rows or observations isn't regarded as a big for a product of SAS capacity. 1000 observations is hardly a drop in the ocean considering the fact if you are using SAS version 6 or above. I handle datasets that has millions or sometimes billions of records.I guess that's one of the hallmark of SAS.
Fair enough, WIDER dataset makes sense in your question.
I read a piece of a solution by ART on my search, and here you go for reference-
I don't know what the maximum number of variables is on SAS 9.1.3 (it may
73 data have;
74 array x(1100000);
75 do i=1 to dim(x);
NOTE: The data set WORK.HAVE has 1 observations and 1100001 variables.
NOTE: DATA statement used (Total process time):
real time 38.40 seconds
cpu time 31.95 seconds
However, computer froze when he attempted to raise that to 2,000,000.
So, In practice there is no need for you reduce the number of variables. If you still wanna do it without import wizard, ideally you should be using an array in my opinion. Of course, I am a newbie too, you will indeed get better answers than mine from experts here.
Charlotte from England
06-07-2013 03:30 AM
Thank you very much all
for your help, i can not use input statement because my file contains 42K variables and it takes lot of time, i tried like this
then my system is freezing, my system config is 120gb HDD, 2gb RAM. can you please correct where i am doing wrong?
06-07-2013 05:19 AM
Well, you could experiment by adding a keep= ds option to you output table.
Your statement "I can not use input statement" seems a bit too absolute.
If you know the variable order in your file, you could use the automatic variable _input_, and the use the scan() function to assign your variables rather than specifying them on an input statement.
06-08-2013 10:06 PM
I strongly assume that the issue you encounter occurs because the record length of your text file exceeds the maximum input buffer of SAS which is 32767 bytes ("characters").
SAS can read longer records reading a file as a byte stream (recfm=n) but Proc Import can't deal with it.
I've done some testing with Proc Import. It didn't freeze in my environment when trying to read 42K columns - it just took very long and the input statement generated by Proc Import is "garbadge". I couldn't find documentation which clearly states a 32767 limit for Proc Import when reading a text file - but it appears such a limitation exists.
Is the source text file available to you via your workstation (meaning: you can access it without SAS)? If so and this is a one off task then why can't you use the EG import wizard? That works. I've tested it.
If this is not an option for you because it's a repeated task or you don't have SAS EG then you will need to write code "mimiking" a Proc Import which can deal with long records. It's may be worth searching the Internet first as someone might already have solved a similar problem and "published" the code.
06-09-2013 06:46 AM
Thank you very much for all your suggestions in solving my problem, Because of my urgent need i solved this problem by using PERL and in near future i should try to solve these in SAS but I appreciate all your suggestions.
Once again thanks to all
06-09-2013 12:30 PM
You should probably convert your data structure to make it more workable. Handling 42K variables is probably not very workable. But one class variable with 42K values is. If the variables are consistently of the same type (character strings or numbers) then you can convert it automatically.
data want ;
infile cards dsd dlm='|' truncover;
array names (100) $32 _temporary_;
length col value 8 name $32 ;
if _n_=1 then do;
do nvars=1 by 1 until (name=' ');
input name @;
if name ne ' ' then names(nvars)=name;
do col=1 to nvars;
input value @;
name = names(col);