BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Santt0sh
Lapis Lazuli | Level 10
Hi All,
Need Help with reading a .txt file with multiple variable lengths.
The file I am trying to read in to SAs dataset is Space delimited and has multiple variable lengths.
I have tried reading the file using all the input methods but not able to read it properly.
Issue lies with each variable being separated by multiple spaces or tabs.
The file has around 160 variables and I'm facing issues setting and determining the column start point and end point.

Kindly suggest
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Santt0sh wrote:
Hi Ballard,

I tried importing the data into SAS using proc import but the data is restricted to a single column and i have also tried to create required variables using



What delimiter or DBMS did you tell Proc Import to use for reading the file?

 

There is a chance that your data is NOT delimited, especially if the spaces or tabs (and you need to determine which) are irregularly placed, but is FIXED Column, which means you need to tell which column to read each value.

Do you have any document that describes the content of your source file? If so you should share it. Or check with the source of the file for a description.

View solution in original post

12 REPLIES 12
andreas_lds
Jade | Level 19

Please post the code you are using to read the file and, if possible, some lines of the file. Both as text using "Insert <SAS> code".

ballardw
Super User

The question is are the variables separated by spaces or by tabs or both? Or is it an exact number of spaces and tabs between each variable? That might indicate that you want the DLMSTR option instead of DLM.

 

Example of the file really is needed. Copy enough lines from the text file using a text editor like Notepad and on the forum open a text window with the </> icon that appears above the message window and paste the text. The text box is important as the message window will reformat text, especially spaces and tabs, and the result will not be like your actual file.

 

Often when "variables are separated by multiple spaces <or any delimiter>" it means that some columns are not populated for that row but are on other rows. So you need to be sure if that is or is not the case.

 

I would start with Proc Import and a large value for the Guessingrows option and see what the result looks like. If you don't quite like the result then the generated code from Proc Import might

Santt0sh
Lapis Lazuli | Level 10
Hi Ballard,

Thank you for your quick response.

I have pasted a few lines of the text in the main message. I was unable to find <//> to paste the lines of the txt file.

Im trying to read the file using the belo:

data all ds &month id;
infile "/p[ath/mth_newton_all.txt" lrec input @1 SRCE SYS CDE
$2. @3 ABC_INFO SENT IDC @4 kgd INFO_SENT IDC @5 Obnk_ INFO SENT IDC @6 bK INFO SENT IDC @7 Bbnk INFO SENT IDC @8 xdr INFO SENT IDC
$1. @9 PROC DATE
YYMMDD10 @19 TRAN DATE
YYMMDD10. @29 TRAN NBR
$24. @53 PAY DATE
YYMMDD10. @63 BANK ID
$3. @ 66 LOCATION
$32. 18
@104 SRC CODE 19
@107. ADV TYPE 20
@110 WIRE TYPE @113 TRAN AMT
16.2. 22
@129 NTWK SN IDTYPE
@130 NTWK_SN_ID 24.
$35. @165 DBT IDTP
$1. @166 DB ID
$35. @201 DB ID BNK NumBR 27
$3. @204 DB NAME1 @254 DB NAME 2
$40. @294 DB NAME 3
$40. @334 DB NAME 4
$40 @374 DB NAMES
$40. @414 DB_ACC_TIN @424 CD IDTPE @425 CD ID
$35.


$1.

$50.
$10.

@460 CD ID BNK NBR
$3.




Santt0sh
Lapis Lazuli | Level 10
Hi Ballard,

I tried importing the data into SAS using proc import but the data is restricted to a single column and i have also tried to create required variables using

data temp;
infile 'path\sample.txt' dlm = ' ' lrecl = 3500;
length id $2.;
input ;
id = scan(_infile_,1));
id2 = (scan(_infile_,2));
run;
Santt0sh
Lapis Lazuli | Level 10
Hi Balalrd,

The issues appears when the data is split in multiple tabs or spaces then i need to use scan for each variable and then concatenate them to arrive at a variable for example name of the office, address1,address2,location,state,country etc.

kindly suggest
ballardw
Super User

@Santt0sh wrote:
Hi Ballard,

I tried importing the data into SAS using proc import but the data is restricted to a single column and i have also tried to create required variables using



What delimiter or DBMS did you tell Proc Import to use for reading the file?

 

There is a chance that your data is NOT delimited, especially if the spaces or tabs (and you need to determine which) are irregularly placed, but is FIXED Column, which means you need to tell which column to read each value.

Do you have any document that describes the content of your source file? If so you should share it. Or check with the source of the file for a description.

Santt0sh
Lapis Lazuli | Level 10
Hi Ballard,

I tried the DLM = Space when importing, yes I do have a document which states the length of the Variables. Let me try reading the file again with the specific lengths

Thank You
ballardw
Super User

Does your browser not show stuff above the message window starting with a B U S ? The 7th Icon the </> is for a text box.

What you have pasted we cannot tell where any line of data might end and as I said.

 

That data step would throw a number of errors.

The Infile statement does not end in a ; You also do not indicate any delimiter between values. If the file is space delimited you should have something like DLM=' ' , just in case.

I don't see a ; to end the INPUT statement either.

You have way too many variables named INFO, SENT and IDC. I suspect that your intent was variable names like "SRCE_SYS_CDE" and not the "SRCE SYS CDE", which is 3 variables that actually appear in your code.

Lets parse what some of your code is saying:

input @1 SRCE SYS CDE
$2. @3 ABC_INFO SENT IDC @4 kgd INFO_SENT IDC

Which I am going to reformat a bit for readability:

input 
  @1 SRCE SYS CDE  $2. 
  @3 ABC_INFO SENT IDC 
  @4 kgd INFO_SENT IDC

This says start in column 1 (okay so far but typically not needed) read the variable SRCE using the default numeric informat (space after SRCE), then read a variable SYS after some space separating values with the default numeric informat, then after some spaces read 2 characters into the variable CDE.

Then move to column 3 (likely either the middle of what ever was attempted to read for SRCE or possibly SYS) then read the next value into a numeric variable ABC_Info, then read the next value into a variable SENT, then the next value into IDC.

After that move to column 4, very likely back into the value of Sent if not ABC_Info, read a numeric value Kgd, then another numeric value INFO_SENT, then another numeric value into IDC replacing the previous values read into IDC.

 

I suspect you need to copy the text of the LOG from running your data step. Really look above the message window for the </> ONE slash, or the little "Running man" icon to open boxes to paste text or code.

 

 

Suggestion: Use an INFORMAT statement to assign the formats instead of placing them in the INPUT statement.

Then just list the variables on the INPUT statement.

As written your code forces every variable to read exactly the number of characters indicated in the informats when present.

Which may cause problems.

 

Santt0sh
Lapis Lazuli | Level 10
Hi Ballard,

Thank you for your time and suggestions.
I was able to import the file into SAS using Proc import fixed width.now I can create or keep the required variables.


Regards,
Santosh
Santt0sh
Lapis Lazuli | Level 10
SAMPLE TEXT FILE:
--------------------------------------------------------------

CP 2022-05-19X21X9X9X662X01 2020-05-19 BNK TRN000000000000000054166 B1022XXC00000000000003338488333 MACYSSDC INCOMING TEST CUST 2 XMSO0000000444256844444

2020-05-19 XXXXX000000000000000000000000000000000001000022156




APPLIED ABC CORPORATION MGMNT LTD ABC
??


THIS is hex>>>> 25>>>


VP 2022-05-20221BGHJU5215875415240 2020-05-20 FINS TRN000000000000000000971 B1032XXC00000000000003339557412 UPTRAX CAN INC

MONTRAL ON H3KL 65GX
CA
KKKKLX000000000000000000000000000000000008642555330 TEST 2020-05-20DEFRTY99YYY

USA KBNMCCMKIJIMN BANK OF XYZ

PRK STREET
MONTREAL CA H3KL 658 PXD XCVFTBEBDFRTYX

TEST BNK NAME BRAZIL BR
BLVD GEN SQURE 35
2000
BXYSEZCVFG009
CA

USD 000000000000000000000000000000000001205200 000000000000000000000000000000000001205200
XXXXXXXBP PC XPZ 0.0455 3000000000000000000000000000000000000000000000000000045857512565989SD KNKNKN

00000000000000000000000000000000000000000000000000000


1



Tom
Super User Tom
Super User

If you just paste the text into the main editor box without using the pop-up window you get when you press on the Insert Code icon on the menu above the editor box then it is treated as PARAGRAPHS and the formatting is destroyed.

First click the icon.

Tom_0-1659538275022.png

 

Then paste into the new box that opens for the text/code.

Tom
Super User Tom
Super User

Open the file in a simple text editor, such as Notepad, and look at with a fixed space font so that you can see if the same fields appear in the same place on each line.

 

If you cannot find a good editor then just use the LIST command to dump some of the lines to the SAS log.

For example this code should show the first 5 lines.

data _null_;
  infile "/p[ath/mth_newton_all.txt"  obs=5;
  input;
  list;
run;

You will be able to see the column numbers above the lines. And if there are any non-printable characters it will also show an extra two rows in the output that indicate under the character the two digit hexadecimal value used to represent that character.  For example the tab character is '09'x.

 

There were no tab characters in the text that the forum made out of your previous attempt to share example.  But there were a lot of empty lines.  click "spoiler" to see 

Spoiler
43  data _null_;
244    infile tmpfile1 ;
245    input;
246    list;
247  run;

NOTE: The infile TMPFILE1 is:
      Filename=C:\Users\ABERNA~1\AppData\Local\Temp\1\SAS Temporary Files\_TD12844_AMRAPY3WVP0VKU0_\#LN00110,
      RECFM=V,LRECL=32767,File Size (bytes)=956,
      Last Modified=03Aug2022:11:01:00,
      Create Time=03Aug2022:11:01:00

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         CP 2022-05-19X21X9X9X662X01 2020-05-19 BNK TRN000000000000000054166 B1022XXC00000000000003338488333
     101  MACYSSDC INCOMING TEST CUST 2 XMSO0000000444256844444 153
2          0
3         2020-05-19 XXXXX000000000000000000000000000000000001000022156 61
4          0
5          0
6          0
7          0
8         APPLIED ABC CORPORATION MGMNT LTD ABC 37
9         ?? 2
10         0
11         0
12        THIS is hex>>>> 25>>> 21
13         0
14         0
15        VP 2022-05-20221BGHJU5215875415240 2020-05-20 FINS TRN000000000000000000971 B1032XXC0000000000000333
     101  9557412 UPTRAX CAN INC 122
16         0
17        MONTRAL ON H3KL 65GX 20
18        CA 2
19        KKKKLX000000000000000000000000000000000008642555330 TEST 2020-05-20DEFRTY99YYY 78
20         0
21        USA KBNMCCMKIJIMN BANK OF XYZ 29
22         0
23        PRK STREET 10
24        MONTREAL CA H3KL 658 PXD XCVFTBEBDFRTYX 39
25         0
26        TEST BNK NAME BRAZIL BR 23
27        BLVD GEN SQURE 35 17
28        2000 4
29        BXYSEZCVFG009 13
30        CA 2
31         0
32        USD 000000000000000000000000000000000001205200 000000000000000000000000000000000001205200 89
33        XXXXXXXBP PC XPZ 0.0455 3000000000000000000000000000000000000000000000000000045857512565989SD KNKNKN 100
34         0
35        00000000000000000000000000000000000000000000000000000 53
36         0
37         0
38        1 1
39         0
40         0
NOTE: 40 records were read from the infile TMPFILE1.
      The minimum record length was 0.
      The maximum record length was 153.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1602 views
  • 0 likes
  • 4 in conversation