Solved: Re: delimiter #

Ronein · Posted 11-15-2021 04:25 AM

Hello

I want to import tst file with delimiter #

proc import file='/path'
    out=work.shoes
    dbms='#';
run;

I get an error

ERROR: DBMS type # not valid for import.

What is the way to define that the delimiter is # ( sladder)

Tom · Posted 11-15-2021 10:47 AM

So you need to figure out WHY some of the observations are split between two records.

If you are lucky it is just because when the file was written they used too short of a setting for the LRECL options and sometimes the last few fields were written to a new line. In that case try reading the file with the FLOWOVER option instead of TRUNCOVER (or MISSOVER) option and perhaps it will automatically jump to the next line properly.

If you are unlucky it is because there were end of line characters in the value of one of the character variables in the observation. That is harder for SAS to read and you will probably need to fix the file first.

Again if you are lucky the actual lines are terminated with CR+LF pair and the embedded line breaks are just single CR or single LF characters. That you can fix by using the TERMSTR=CRLF option on the infile statement.

Next level is that the fields that contain the end of line characters are quoted. Then you can use a method like in this macro to create a version of the file that can be parsed. https://github.com/sasutils/macros/blob/master/replace_crlf.sas

If the fields with the end of line characters are not quoted then perhaps you can fix it by counting the number of delimiters. There are examples on this forum of that solution also.

View solution in original post

s_lassen · Posted 11-15-2021 04:47 AM

The correct way to do it is this:

proc import file='/path'
    out=work.shoes
    dbms=dlm;
delimiter='#';
run;

Ronein · Posted 11-15-2021 07:00 AM

The code you provided is not working well, I get one column only

This code was created by using import wizard and it works well except of some problems:

1-IF the observation is written on 2 rows (instead of 1 ) then It doesnt take the data correctly.

May you please explain why your code is not working well?

How can you adjust your code please?

thanks

DATA WORK.kelet;
obs=_n_;
    LENGTH
        F1               $ 55
        F2                 8
        F3                 8
        F4                 8
        F5                 8
        F6                 8
        F7                 8
        F8                 8
        F9                 8
        F10              $ 26
        F11                8
        F12                8
        F13              $ 65
        F14              $ 14
        F15              $ 90
        F16              $ 81
        F17                8
        F18              $ 1 ;
    FORMAT
        F1               $CHAR55.
        F2               BEST9.
        F3               BEST9.
        F4               BEST4.
        F5               DDMMYY10.
        F6               DDMMYY10.
        F7               BEST17.
        F8               BEST16.
        F9               BEST3.
        F10              $CHAR26.
        F11              BEST4.
        F12              BEST9.
        F13              $CHAR65.
        F14              $CHAR14.
        F15              $CHAR90.
        F16              $CHAR81.
        F17              BEST9.
        F18              $CHAR1. ;
    INFORMAT
        F1               $CHAR55.
        F2               BEST9.
        F3               BEST9.
        F4               BEST4.
        F5               DDMMYY10.
        F6               DDMMYY10.
        F7               BEST17.
        F8               BEST16.
        F9               BEST3.
        F10              $CHAR26.
        F11              BEST4.
        F12              BEST9.
        F13              $CHAR65.
        F14              $CHAR14.
        F15              $CHAR90.
        F16              $CHAR81.
        F17              BEST9.
        F18              $CHAR1. ;
    INFILE 'path'
        LRECL=285
        ENCODING="HEBREW"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        F1               : $CHAR55.
        F2               : ?? BEST9.
        F3               : ?? BEST9.
        F4               : ?? BEST4.
        F5               : ?? DDMMYY10.
        F6               : ?? DDMMYY10.
        F7               : ?? COMMA17.
        F8               : ?? COMMA16.
        F9               : ?? BEST3.
        F10              : $CHAR26.
        F11              : ?? BEST4.
        F12              : ?? BEST9.
        F13              : $CHAR65.
        F14              : $CHAR14.
        F15              : $CHAR90.
        F16              : $CHAR81.
        F17              : ?? BEST9.
        F18              : $CHAR1. ;
RUN;

HarrySnart · Posted 11-15-2021 05:07 AM

Hello @Ronein, you can read a file with custom delimiter using the data step's infile statement. See below a simple example. I uploaded a dummy text file into SAS Studio and read in via data step. Note I added a header which I ignore in the infile statement by setting which observation to start at.

*Read inputs from .txt file with # delimiter;
data work.example;
length cat1 cat2 $200 num1 3. ;
infile '/home/mydata/quick_example.txt'
delimiter='#' dsd truncover firstobs=2;
input cat1 cat2 num1;
run;

*Print example output;
proc print data=work.example;title 'Parsed Infile with # Delimiter';run;

Hope this helps answer your question.

Thanks

Harry

Ronein · Posted 11-15-2021 06:36 AM

Thanks,
I found that in some observations it happended that each observations is in 2 rows (instead of 1).
What is the way to tell SAS to continue add data to the observation?
I know the using of @@ ut may you show how to use it in the following :
/****A*****/
data work.example;
length cat1 cat2 $200 num1 3. ;
infile '/home/mydata/quick_example.txt'
delimiter='#' dsd truncover firstobs=2;
input cat1 cat2 num1;
run;

/****B***************/
proc import file='/path'
out=work.shoes
dbms=dlm;
delimiter='#';
run;

HarrySnart · Posted 11-15-2021 07:20 AM

Hi @Ronein please can you share a few lines of dummy data as an example to the data you're trying to import? Using the delimiter option in the infile the data step expects the same number of columns defined on each row. If your data is very messy you can alternatively read the lines into a single row and then subset items using scan() for example, I've done this previously where some rows only contain a subset of the total expected columns.

Thanks

Harry

Ronein · Posted 11-15-2021 10:17 AM

Thanks.
The source data file (txt with # delimiter) has 2.5 million observations.
In most of cases each observation is located in 1 row.
But, unfortunately sometimes the observation is located in 2 rows .
In such cases using proc report wizard was not good because sas created 2 observations instead of 1( please see before the code was generated by import wizard).
My question is how to solve this problem via the code I provided before.thanks

Tom · Posted 11-15-2021 10:47 AM

So you need to figure out WHY some of the observations are split between two records.

If you are lucky it is just because when the file was written they used too short of a setting for the LRECL options and sometimes the last few fields were written to a new line. In that case try reading the file with the FLOWOVER option instead of TRUNCOVER (or MISSOVER) option and perhaps it will automatically jump to the next line properly.

If you are unlucky it is because there were end of line characters in the value of one of the character variables in the observation. That is harder for SAS to read and you will probably need to fix the file first.

Again if you are lucky the actual lines are terminated with CR+LF pair and the embedded line breaks are just single CR or single LF characters. That you can fix by using the TERMSTR=CRLF option on the infile statement.

Next level is that the fields that contain the end of line characters are quoted. Then you can use a method like in this macro to create a version of the file that can be parsed. https://github.com/sasutils/macros/blob/master/replace_crlf.sas

If the fields with the end of line characters are not quoted then perhaps you can fix it by counting the number of delimiters. There are examples on this forum of that solution also.

Tom · Posted 11-15-2021 08:54 AM

You tell the INPUT statement to go to a new line when reading a delimited file the same way you would when reading any text file.

The character / in an input statement means to go to a new line. So if you text file is structured to have three fields on the first line and two on the second then your code to read it might look like this:

data want;
   infile 'myfile.txt' dlm='#' ;
   length var1 var2 8 var3 $20 var4 $10 var5 8 var6 $30 ;
   input var1 var2 var3 / var4 var 5;
run;

Registration is open

SAS Training: Just a Click Away