import in sas 9.2 easily the files : txt , xls, xlsx file

Accepted Solution Solved
Reply
Super Contributor
Posts: 371
Accepted Solution

import in sas 9.2 easily the files : txt , xls, xlsx file

 

Hello experts,

 

I want to import a files (.txt, .xls, xlsx) into sas 9.2 without taking care about the "delimiter" ,  "textsize" or guessingrow more than  32767

I do not want to explore file, the program will be avalaible for all files( .txt,xls, xlsx)

 

I want to do this :

 

Step 1

 

%let file=V1/How_ever_theFile.txt   / * or .xls, .xlsx  */

%import(file=&file);

 

 

 

 


Accepted Solutions
Solution
‎09-25-2016 04:07 AM
Super User
Super User
Posts: 7,039

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

I am not big fan of PROC IMPORT because it has to guess about what your varaibles are. I find that I can write the data step to read the file almost faster than I can write the code to run PROC IMPORT.  And then I know what it is doing.

 

PROC IMPORT will NOT guess the delimiter for you.  You need to tell it what delimiter to use.

 

It is not hard for a human to find the delimiter.  If you have command prompt just type out the first few lines and look at them.  If you can't do that then just ask SAS to do it.

 

data _null_;
  infile "&file" obs=5 ;
  input;
  list;
run;

If you want to make a program do that then I guess you could try testing a set of common delimiters and see if they make sense. Probably the one that finds the most words is the right one.

data _null_;
  infile "&file" obs=1;
  input ;
  do dlm=',','09'x,'|',';',':' ;
    nwords = countw(_infile_,dlm,'Q');
    put dlm= $hex2. ' ' nwords=;
  end;
run;

View solution in original post


All Replies
Super User
Posts: 19,789

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Write a macro,that will handle all your cases. 

Super Contributor
Posts: 371

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

[ Edited ]

Yes, with sas documentation, we can program the cases possible.

But how can  we identify the "delimiter" without opening the file ? and making more performance guessingrow , even geater than 32767 ?. I know that, I can imoprt automatcally, then use log file , to identify the "delimiter", but I do not want to use this method, it can be very hard

 

Super User
Posts: 19,789

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

AFAIk there isn't a way to detect the delimiter that's going to fall into the 'easy' category. 

 

You can use input/infile to read an entire row regardless of delimiter and try and parse it out. 

 

There res are entire courses on how to read data in R/Python. I'm not sure why you're trying to do this, it seems unrealistic and you're not going to gain much. In fact the time spent developing this will probably never be gained back. 

 

Anyways, good luck with your macro. 

Super Contributor
Posts: 371

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Thank you for your answer, reeza.

Really ? t is very realistic by using sas 9.2, if you have so many files from differents sources, with a differents "delimiter" and you must have to use only sas, the output of the import, will be used in so many sas applications.. we can use "sas log" or checking the external files one by one...but this action will take much time ...that's why , an automatic solution can be better ..

Super User
Posts: 19,789

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

I'm think of it this way - Proc Import has been developed by programmers dedicated to this task. They've decided that providing the minimum of a delimiter and file type will provide the best result. It still usually results in issues, there are questions on here that illustrate this. 

 

I dont doubt that things can be improved - they always can. I just question the use of time. If my staff had time to work in this they don't have enough work Smiley Wink

 

Dealing with the different file types xlsx, Xls, and txt files is worth it. Determining the delimiter doesn't seem worth it. My opinion and you can obviously spend time on whatever you're interested in. You can't program your way out of bad processes. 

 

Here's a line from a real text file, The number of genres at the end can vary. 

This is the Movielens dataset if your interested in a file to test your process with. 

 

12:Smiley Very Happyracula: Dead and Loving It (1995)::Comedy|Horror

 

 

 

 

 

Super User
Posts: 19,789

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Also, 9.2 is pretty old, but a I don't think there's been significant changes to dealing with text files, mostly Excel files. 

Super User
Super User
Posts: 7,039

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

What have you tried?  How did it not meet your needs?

 

Super Contributor
Posts: 371

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

[ Edited ]

Hello TOM

Thank you for your interst

What have you tried?  

 

I have tried two methods:

First one:

To identify the "delimiter" in the file , and running the proc import

 

the second one: running the import, and using the "the log file" to identify the "delimiter"

 

How did it not meet your needs?

using the last methods for a so many  and many files ...il will take a much time.

That's why , I am looking for an automatic and quick program

I

Super User
Posts: 19,789

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Post the code you've tried and your logic for determining a delimiter. 

Solution
‎09-25-2016 04:07 AM
Super User
Super User
Posts: 7,039

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

I am not big fan of PROC IMPORT because it has to guess about what your varaibles are. I find that I can write the data step to read the file almost faster than I can write the code to run PROC IMPORT.  And then I know what it is doing.

 

PROC IMPORT will NOT guess the delimiter for you.  You need to tell it what delimiter to use.

 

It is not hard for a human to find the delimiter.  If you have command prompt just type out the first few lines and look at them.  If you can't do that then just ask SAS to do it.

 

data _null_;
  infile "&file" obs=5 ;
  input;
  list;
run;

If you want to make a program do that then I guess you could try testing a set of common delimiters and see if they make sense. Probably the one that finds the most words is the right one.

data _null_;
  infile "&file" obs=1;
  input ;
  do dlm=',','09'x,'|',';',':' ;
    nwords = countw(_infile_,dlm,'Q');
    put dlm= $hex2. ' ' nwords=;
  end;
run;
Super Contributor
Posts: 371

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

[ Edited ]

@Tom thank you  very much for your answer, it's  a geat answer.

You said "I find that I can write the data step to read the file almost faster than I can write the code to run PROC IMPORT"

I am not cloded on "proc import", I am open for an other solution, please can you give more detail  about your solution? 

In your solution, you do not need to explore the external file( vaiables: type, length ) ?

 

Thank you again

Super User
Super User
Posts: 7,039

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

[ Edited ]

If you have no idea what is in your data then let PROC IMPORT try to figure it out.

 

But the problem with that is that if you are receiving multiple versions of data in the same format than PROC IMPORT will make different decisions about lengths of character variables (or even the whether the variable is numeric or character) based on what records happen to be in the version it is currently looking at. Then when you try to combine the datasets later you get problems. 

 

It is better to know what is in your data and write the data step to read it.  Then the structure will not change.

 

If you did want to implement your own code that might apply different rules for guessing what is in the data then start by reading the data into a simple structure and analyzing what you find.  You could just read everything into character variables. So this will read in upto 100 columns into character variables.

 

data raw ;
  infile "&file" dsd truncover lrecl=100000 ;
  length x1-x100 $500 ;
  input x1-x100;
run;

Or you could read it into an even simplier structure that has ROW, COL and VALUE.

data raw ;
  infile "&file" dsd truncover lrecl=100000 column=ccol length=llen;
  length row col 8 value $500 ;
  row+1;
  do col=1 by 1 until (ccol > llen);
    input value @ ;
    output;
  end;
run;

 

Super Contributor
Posts: 371

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

@Tom: Thank you very much, that's very kind from you.

as you said "start by reading the data into a simple structure and analyzing what you find"

The problem is that, a visual analysis for what I find, it will take much time, if I have more and more  of files.

An other attention case, we can find: if the vaiable(V) is mixed : charcter and numeric.

V

1

2

..

6  /* line 32767*/

A1 /* line 32768*/

A2 /* line 32769*/

..

8  /* line 120000*/

 

 

Super User
Posts: 19,789

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

That's variable type. Determined the delimiter should be straightforward visually. 

 

Problem is a processor or will make the same mistake as you've illustrated. I just had a 5 column file where it wasn't reading in all of the text. 

 

Also, hoe do you plan to analyze information your unfamiliar with?

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 19 replies
  • 908 views
  • 6 likes
  • 3 in conversation