BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kuppusamy
Fluorite | Level 6

I have data in below format as text file

MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1037|CL_UP||1|O|Up|SASHELP.FSP.OK.SLIST
1038||CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST

 

Here, first record is correct, in second and third record there is an extra pipe symbol(after second and first column), this is data error. but i need to solve this issue in SAS while importing. please anybody help.

 

i need output like below

MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1037|CL_UP|1|O|Up|SASHELP.FSP.OK.SLIST
1038|CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If the extra | only occurs in the second field then perhaps you could do this:

options parmcards=csv;
filename csv temp;
parmcards4;
MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1037|CL_UP||1|O|Up|SASHELP.FSP.OK.SLIST
1038||CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST
1036|CL_DOWN||||SASHELP.FSP.OK.SLIST
;;;;

data xx;
  infile csv dsd dlm='|' truncover firstobs=2;
  length MSGID $8 MNEMONIC $28 LINENO 8 LEVEL $1 TEXT $10 PBUTTONS $107 ;
  length dummy $20 ;
  input msgid MNEMONIC @;
  if countc(_infile_,'|')>5 then do;
    input dummy @;
    MNEMONIC=cats(MNEMONIC,'|',dummy);
  end;
  input lineno -- pbuttons;
  drop dummy ;
run;
proc print;
run;

Tom_0-1683587224428.png

 

 

View solution in original post

11 REPLIES 11
Reeza
Super User
Can you show the code you're currently using to read the file?
kuppusamy
Fluorite | Level 6

File import code,


/* --------------------------------------------------------------------
Code generated by a SAS task

Generated on Monday, May 8, 2023 at 3:20:00 PM
By task: Import Data Wizard

Source file: C:\Users\PL4K760\Documents\SAMPLE.txt
Server: Local File System

Output data: WORK.SAMPLE_0000
Server: SASApp

Note: In preparation for running the following code, the Import
Data wizard has used internal routines to transfer the source data
file from the local file system to SASApp. There is no SAS code
available to represent this action.
-------------------------------------------------------------------- */

DATA WORK.SAMPLE_0000;
LENGTH
MSGID 8
MNEMONIC $ 28
LINENO $ 7
LEVEL $ 1
TEXT $ 119
PBUTTONS $ 20
F7 $ 20 ;
FORMAT
MSGID BEST4.
MNEMONIC $CHAR28.
LINENO $CHAR7.
LEVEL $CHAR1.
TEXT $CHAR119.
PBUTTONS $CHAR20.
F7 $CHAR20. ;
INFORMAT
MSGID BEST4.
MNEMONIC $CHAR28.
LINENO $CHAR7.
LEVEL $CHAR1.
TEXT $CHAR119.
PBUTTONS $CHAR20.
F7 $CHAR20. ;
INFILE '/saswork1/SAS_work28BB00001BF6_prodsascom5l/#LN00048'
LRECL=156
ENCODING="LATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
INPUT
MSGID : ?? BEST4.
MNEMONIC : $CHAR28.
LINENO : $CHAR7.
LEVEL : $CHAR1.
TEXT : $CHAR119.
PBUTTONS : $CHAR20.
F7 : $CHAR20. ;
RUN;

 

output im getting is 

kuppusamy_0-1683579844421.png

here 1, 2,4 and 5th record is bad, i need record like 3rd

Reeza
Super User
Copy the code from the log and use that. Remove the DSD option and change MISSOVER to TRUNCOVER.
kuppusamy
Fluorite | Level 6
Its not worked, because of extra delimiter value is shifting to next column
Reeza
Super User

That's odd, seemed to work for me.

Tom's answer is better though 🙂

 

DATA WORK.SAMPLE_0000;
    LENGTH MSGID 8 MNEMONIC $ 28 LINENO $ 7 LEVEL $ 1 TEXT $ 119 PBUTTONS $ 20 
        F7 $ 20;
    FORMAT MSGID BEST4.
MNEMONIC $CHAR28.
LINENO $CHAR7.
LEVEL $CHAR1.
TEXT $CHAR119.
PBUTTONS $CHAR20.
F7 $CHAR20.;
    INFORMAT MSGID BEST4.
MNEMONIC $CHAR28.
LINENO $CHAR7.
LEVEL $CHAR1.
TEXT $CHAR119.
PBUTTONS $CHAR20.
F7 $CHAR20.;
    INFILE cards DLM='|' TRUNCOVER;
    INPUT MSGID 
MNEMONIC 
LINENO 
LEVEL 
TEXT 
PBUTTONS 
F7 ;
    cards;
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1037|CL_UP||1|O|Up|SASHELP.FSP.OK.SLIST
1038||CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST
;
    ;
    ;;
run;
kuppusamy
Fluorite | Level 6
It is working, thank you
I have records like
1036|CL_DOWN||||SASHELP.FSP.OK.SLIST
continuous missing values also there ,
in this case what should i do ? please help
Tom
Super User Tom
Super User

As long as the extra delimiters are just appearing at the end (or beginning) of a field you can just remove the DSD option and SAS will treat the adjacent delimiters as one delimiter.  Just like it treats spaces in normal LIST MODE input.

 

First let's get your lines of text into a file we can use to play with.

options parmcards=csv;
filename csv temp;
parmcards4;
MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1037|CL_UP||1|O|Up|SASHELP.FSP.OK.SLIST
1038||CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST
;;;;

Your IMPORT task wrote some really ugly SAS code.

Do you know if it also made that ugly text file?  Is the original file a text file or did the import task make the text file for you?

 

Write your own data step and the code can be much simpler and clearer.

 

You can use your knowledge of what the variables are to define them properly. 

 

For example that last one looks like a four level SAS object name.   Since a LIBREF can only be 8 character and the maximum length for any other name is 32 characters the maximum length it could be is 8+32*3+3 = 107. 

 

The first one looks like an ID variable so it should be CHARACTER since arithmetic on an ID has no meaning.

 

LINENO is probably a number.

 

So your data step could look like:

data want;
  infile csv dlm='|' truncover firstobs=2;
  length MSGID $8 MNEMONIC $28 LINENO 8 LEVEL $1 TEXT $10 PBUTTONS $107 ;
  input msgid -- pbuttons;
run;

Result

Tom_0-1683583175335.png

 

kuppusamy
Fluorite | Level 6
I have records like
1036|CL_DOWN||||SASHELP.FSP.OK.SLIST
continuous missing values also there ,
in this case what should i do ? please help
Tom
Super User Tom
Super User

@kuppusamy wrote:
I have records like
1036|CL_DOWN||||SASHELP.FSP.OK.SLIST
continuous missing values also there ,
in this case what should i do ? please help

Find the author of the file and get them to re-create it in a usable format.

The values that contain delimiters should have been in quotes, then the DSD option would work properly.

1037|"CL_UP|"|1|O|Up|SASHELP.FSP.OK.SLIST
1038|"|CL_LEFT"|1|O|Left|SASHELP.FSP.OK.SLIST

Without that the file is not parsable.

 

 

Personally I would just fix it by hand. You are lucky in this case that the file only has a few variables.

You could use a program to split the records into those with the right number of | and those with the wrong number.  Then you only have to edit the smaller number of corrupt lines.

filename csv 'have.txt';
filename good 'good.txt';
filename bag 'bad.txt';
data _null_;
  infile csv ;
  input;
  if _n_=1 or countc(_infile_,'|')=5 then do;
     file good;
     put _infile_;
  end;
  if _n_=1 or countc(_infile_,'|') ne 5 then do;
     file bad;
     put _infile_;
  end;
run;
6774   data _null_;
6775     infile bad;
6776     input;
6777     put _infile_;
6778   run;

NOTE: The infile BAD is:
      (system-specific pathname),
      (system-specific file attributes)

MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1037|CL_UP||1|O|Up|SASHELP.FSP.OK.SLIST
1038||CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 39.
      The maximum record length was 43.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


6779
6780   data _null_;
6781     infile good;
6782     input;
6783     put _infile_;
6784   run;

NOTE: The infile GOOD is:
      (system-specific pathname),
      (system-specific file attributes)

MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1036|CL_DOWN||||SASHELP.FSP.OK.SLIST
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 36.
      The maximum record length was 42.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

Tom
Super User Tom
Super User

If the extra | only occurs in the second field then perhaps you could do this:

options parmcards=csv;
filename csv temp;
parmcards4;
MSGID|MNEMONIC|LINENO|LEVEL|TEXT|PBUTTONS
1036|CL_DOWN|1|O|Down|SASHELP.FSP.OK.SLIST
1037|CL_UP||1|O|Up|SASHELP.FSP.OK.SLIST
1038||CL_LEFT|1|O|Left|SASHELP.FSP.OK.SLIST
1036|CL_DOWN||||SASHELP.FSP.OK.SLIST
;;;;

data xx;
  infile csv dsd dlm='|' truncover firstobs=2;
  length MSGID $8 MNEMONIC $28 LINENO 8 LEVEL $1 TEXT $10 PBUTTONS $107 ;
  length dummy $20 ;
  input msgid MNEMONIC @;
  if countc(_infile_,'|')>5 then do;
    input dummy @;
    MNEMONIC=cats(MNEMONIC,'|',dummy);
  end;
  input lineno -- pbuttons;
  drop dummy ;
run;
proc print;
run;

Tom_0-1683587224428.png

 

 

kuppusamy
Fluorite | Level 6
Thank you

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2360 views
  • 7 likes
  • 3 in conversation