BookmarkSubscribeRSS Feed
genetist
Calcite | Level 5

Hi to all,

is it possible to read only specific variabiles from text files into sas as sas dataset using proc import? I have very big data in my text file which contains like 1000 observations and more than 42,000 variables. I tried reading this file into sas using proc import  but i failed in doing so i thought may be because of size issues. Now i decided to read only specific variables (columns) from this big text file so that i can reduce size of file to read into sas system. is there any ideas to read like this by using data step? Suggestion or help would be appreciated

Thanking you very much,


9 REPLIES 9
LinusH
Tourmaline | Level 20

Yes you can.

How is dependant of the structure of your text file. It should be fairly easy if this is a file with fixed width fields.

There are ton of documentation/papers on how to read external data using the data step.

My guess is the reason for failure in your import is that you are exceeding the limit on number of variables in a data set (32K).

Maybe it's easier to get the file with a different structure...?

Data never sleeps
Patrick
Opal | Level 21

@genetist

This sounds like a one off task. I believe the fastest way for you to get to the desired result would be to use the import wizard in SAS Enterprise Guide (File\import data).

I've just created a text file with 42000 columns and 1000 rows and imported it using SAS EG on my SAS9.3 Win7 laptop. It took EG a while to scan all rows but else worked like a charm.

@LinusH

The 32K limitation applies only to outdated SAS versions and OS:  8213 - Understanding the maximum allowable size of SAS® data sets in the Windows, UNIX, and z/OS ope...

LinusH
Tourmaline | Level 20

Thanx Patrick, I was reading the note a bit carelessly...

Data never sleeps
CharlotteCain
Quartz | Level 8

Hi,

First off, a file that contains just 1000 rows or observations isn't regarded as a big for a product of SAS capacity. 1000 observations is hardly a drop in  the ocean considering the fact if you are using SAS version 6 or above. I handle datasets that has millions or sometimes billions of records.I guess that's one of the hallmark of SAS.

Fair enough, WIDER dataset makes sense in your question.

I read a piece of a solution by ART on my search, and here you go for reference-

I don't know what the maximum number of variables is on SAS 9.1.3 (it may
be dependent upon ram) but, on my computer at least, yes it can handle over
a million columns:

73   data have;
74     array x(1100000);
75     do i=1 to dim(x);
76       x(i)=1;
77     end;
78     output;
79   run;

NOTE: The data set WORK.HAVE has 1 observations and 1100001 variables.
NOTE: DATA statement used (Total process time):
      real time           38.40 seconds
      cpu time            31.95 seconds

However, computer froze when he attempted to raise that to 2,000,000.

So, In practice there is no need for you reduce the number of variables. If you still wanna do it without import wizard, ideally you should be using an array in my opinion. Of course, I am a newbie too, you will indeed get better answers than mine from experts here.

Thanks,

Charlotte from England

genetist
Calcite | Level 5

Thank you very much all

for your help, i can not use input  statement because my file contains 42K variables and it takes lot of time, i tried like this

  • proc import datafile="file.txt" out=mydata dbms=tab replace;
  • getnames=yes;
  • run;

then my system is freezing, my system config is 120gb HDD, 2gb RAM. can you please correct where i am doing wrong?

Thanks

LinusH
Tourmaline | Level 20

Well, you could experiment by adding a keep= ds option to you output table.

Your statement "I can not use input statement" seems a bit too absolute.

If you know the variable order in your file, you could use the automatic variable _input_, and the use the scan() function to assign your variables rather than specifying them on an input statement.

Data never sleeps
Patrick
Opal | Level 21

I strongly assume that the issue you encounter occurs because the record length of your text file exceeds the maximum input buffer of SAS which is 32767 bytes ("characters").

SAS can read longer records reading a file as a byte stream (recfm=n) but Proc Import can't deal with it.

I've done some testing with Proc Import. It didn't freeze in my environment when trying to read 42K columns - it just took very long and the input statement generated by Proc Import is "garbadge". I couldn't find documentation which clearly states a 32767 limit for Proc Import when reading a text file - but it appears such a limitation exists.

Is the source text file available to you via your workstation (meaning: you can access it without SAS)? If so and this is a one off task then why can't you use the EG import wizard? That works. I've tested it.

If this is not an option for you because it's a repeated task or you don't have SAS EG then you will need to write code "mimiking" a Proc Import which can deal with long records. It's may be worth searching the Internet first as someone might already have solved a similar problem and "published" the code.

genetist
Calcite | Level 5

Dear all,

Thank you very much for all your suggestions in solving my problem, Because of my urgent need i solved this problem by using PERL and in near future i should try to solve these in SAS but I appreciate all your suggestions.

Once again thanks to all

Genetist

Tom
Super User Tom
Super User

You should probably convert your data structure to make it more workable.  Handling 42K variables is probably not very workable.  But one class variable with 42K values is.  If the variables are consistently of the same type (character strings or numbers) then you can convert it automatically.

data want ;

  infile cards dsd dlm='|' truncover;

  array names (100) $32 _temporary_;

  length col value 8 name $32 ;

  if _n_=1 then do;

    do nvars=1 by 1 until (name=' ');

      input name @;

      if name ne ' ' then names(nvars)=name;

    end;

    nvars+-1;

    input;

  end;

  do col=1 to nvars;

    input value @;

    name = names(col);

    output;

  end;

cards;

a|b|c|d

1|2|3|4

2|3|4|5

3|4|5|6

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 7967 views
  • 0 likes
  • 5 in conversation