Solved: import in sas 9.2 easily the files : txt , xls, xlsx file

LineMoon · Posted 09-24-2016 07:33 AM

Hello experts,

I want to import a files (.txt, .xls, xlsx) into sas 9.2 without taking care about the "delimiter" , "textsize" or guessingrow more than 32767

I do not want to explore file, the program will be avalaible for all files( .txt,xls, xlsx)

I want to do this :

Step 1

%let file=V1/How_ever_theFile.txt / * or .xls, .xlsx */

%import(file=&file);

Tom · Posted 09-24-2016 09:16 PM

I am not big fan of PROC IMPORT because it has to guess about what your varaibles are. I find that I can write the data step to read the file almost faster than I can write the code to run PROC IMPORT. And then I know what it is doing.

PROC IMPORT will NOT guess the delimiter for you. You need to tell it what delimiter to use.

It is not hard for a human to find the delimiter. If you have command prompt just type out the first few lines and look at them. If you can't do that then just ask SAS to do it.

data _null_;
  infile "&file" obs=5 ;
  input;
  list;
run;

If you want to make a program do that then I guess you could try testing a set of common delimiters and see if they make sense. Probably the one that finds the most words is the right one.

data _null_;
  infile "&file" obs=1;
  input ;
  do dlm=',','09'x,'|',';',':' ;
    nwords = countw(_infile_,dlm,'Q');
    put dlm= $hex2. ' ' nwords=;
  end;
run;

View solution in original post

Reeza · Posted 09-24-2016 08:43 AM

Write a macro,that will handle all your cases.

LineMoon · Posted 09-24-2016 10:28 AM

Yes, with sas documentation, we can program the cases possible.

But how can we identify the "delimiter" without opening the file ? and making more performance guessingrow , even geater than 32767 ?. I know that, I can imoprt automatcally, then use log file , to identify the "delimiter", but I do not want to use this method, it can be very hard

Reeza · Posted 09-24-2016 12:37 PM

AFAIk there isn't a way to detect the delimiter that's going to fall into the 'easy' category.

You can use input/infile to read an entire row regardless of delimiter and try and parse it out.

There res are entire courses on how to read data in R/Python. I'm not sure why you're trying to do this, it seems unrealistic and you're not going to gain much. In fact the time spent developing this will probably never be gained back.

Anyways, good luck with your macro.

LineMoon · Posted 09-24-2016 03:18 PM

Thank you for your answer, reeza.

Really ? t is very realistic by using sas 9.2, if you have so many files from differents sources, with a differents "delimiter" and you must have to use only sas, the output of the import, will be used in so many sas applications.. we can use "sas log" or checking the external files one by one...but this action will take much time ...that's why , an automatic solution can be better ..

Reeza · Posted 09-24-2016 03:59 PM

I'm think of it this way - Proc Import has been developed by programmers dedicated to this task. They've decided that providing the minimum of a delimiter and file type will provide the best result. It still usually results in issues, there are questions on here that illustrate this.

I dont doubt that things can be improved - they always can. I just question the use of time. If my staff had time to work in this they don't have enough work 😉

Dealing with the different file types xlsx, Xls, and txt files is worth it. Determining the delimiter doesn't seem worth it. My opinion and you can obviously spend time on whatever you're interested in. You can't program your way out of bad processes.

Here's a line from a real text file, The number of genres at the end can vary.

This is the Movielens dataset if your interested in a file to test your process with.

12::Dracula: Dead and Loving It (1995)::Comedy|Horror

Reeza · Posted 09-24-2016 04:00 PM

Also, 9.2 is pretty old, but a I don't think there's been significant changes to dealing with text files, mostly Excel files.

Tom · Posted 09-24-2016 04:32 PM

What have you tried? How did it not meet your needs?

LineMoon · Posted 09-24-2016 04:49 PM

Hello TOM

Thank you for your interst

What have you tried?

I have tried two methods:

First one:

To identify the "delimiter" in the file , and running the proc import

the second one: running the import, and using the "the log file" to identify the "delimiter"

How did it not meet your needs?

using the last methods for a so many and many files ...il will take a much time.

That's why , I am looking for an automatic and quick program

I

Reeza · Posted 09-24-2016 07:15 PM

Post the code you've tried and your logic for determining a delimiter.

Tom · Posted 09-24-2016 09:16 PM

I am not big fan of PROC IMPORT because it has to guess about what your varaibles are. I find that I can write the data step to read the file almost faster than I can write the code to run PROC IMPORT. And then I know what it is doing.

PROC IMPORT will NOT guess the delimiter for you. You need to tell it what delimiter to use.

It is not hard for a human to find the delimiter. If you have command prompt just type out the first few lines and look at them. If you can't do that then just ask SAS to do it.

data _null_;
  infile "&file" obs=5 ;
  input;
  list;
run;

If you want to make a program do that then I guess you could try testing a set of common delimiters and see if they make sense. Probably the one that finds the most words is the right one.

data _null_;
  infile "&file" obs=1;
  input ;
  do dlm=',','09'x,'|',';',':' ;
    nwords = countw(_infile_,dlm,'Q');
    put dlm= $hex2. ' ' nwords=;
  end;
run;

LineMoon · Posted 09-25-2016 04:19 AM

@Tom thank you very much for your answer, it's a geat answer.

You said "I find that I can write the data step to read the file almost faster than I can write the code to run PROC IMPORT"

I am not cloded on "proc import", I am open for an other solution, please can you give more detail about your solution?

In your solution, you do not need to explore the external file( vaiables: type, length ) ?

Thank you again

Tom · Posted 09-25-2016 12:34 PM

If you have no idea what is in your data then let PROC IMPORT try to figure it out.

But the problem with that is that if you are receiving multiple versions of data in the same format than PROC IMPORT will make different decisions about lengths of character variables (or even the whether the variable is numeric or character) based on what records happen to be in the version it is currently looking at. Then when you try to combine the datasets later you get problems.

It is better to know what is in your data and write the data step to read it. Then the structure will not change.

If you did want to implement your own code that might apply different rules for guessing what is in the data then start by reading the data into a simple structure and analyzing what you find. You could just read everything into character variables. So this will read in upto 100 columns into character variables.

data raw ;
  infile "&file" dsd truncover lrecl=100000 ;
  length x1-x100 $500 ;
  input x1-x100;
run;

Or you could read it into an even simplier structure that has ROW, COL and VALUE.

data raw ;
  infile "&file" dsd truncover lrecl=100000 column=ccol length=llen;
  length row col 8 value $500 ;
  row+1;
  do col=1 by 1 until (ccol > llen);
    input value @ ;
    output;
  end;
run;

LineMoon · Posted 10-01-2016 04:07 AM

@Tom: Thank you very much, that's very kind from you.

as you said "start by reading the data into a simple structure and analyzing what you find"

The problem is that, a visual analysis for what I find, it will take much time, if I have more and more of files.

An other attention case, we can find: if the vaiable(V) is mixed : charcter and numeric.

V

1

2

..

6 /* line 32767*/

A1 /* line 32768*/

A2 /* line 32769*/

..

8 /* line 120000*/

Reeza · Posted 10-01-2016 04:40 AM

That's variable type. Determined the delimiter should be straightforward visually.

Problem is a processor or will make the same mistake as you've illustrated. I just had a 5 column file where it wasn't reading in all of the text.

Also, hoe do you plan to analyze information your unfamiliar with?

import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Re: import in sas 9.2 easily the files : txt , xls, xlsx file

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away