I would like to import a text file (example data attached). It has two variables, ID and TEXT. There are spaces before the IDs, so I could not import it into SAS successfully (the actual data has about 5,000 rows). Thank you.
You really need to find the process that is writing this file and fix that.
Here is a method that appears to work for your example file.
Basically, the first step removes the header line and all line breaks from the original file and inserts a line break after each pipe character.
Then the second step reads those lines and pulls the ID value from the end of each line. It uses the LAG() function to move the ID from the end of the previous line to the text on the current line.
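filename copy temp;
data _null_;
  length x $1;
  infile "c:\downloads\text_w2530_15.txt" recfm=f lrecl=1 end=eof;
  file copy;
  * Get rid of the "header" line (read up to and including the first LF) ;
  do until(x='0A'x);
    input x $char1.;
  end;
  * Copy the rest one byte at a time, dropping CR and LF characters ;
  * and starting a new output line after each pipe ;
  do while(not eof);
    do until(x='|');
      input x $char1.;
      if x not in ('0A'x,'0D'x) then put x $char1. @;
    end;
    put;
  end;
run;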
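data want;
  * TEXT must hold a whole chunk, including the trailing ID ;
  * ($1500 was enough for the example file; increase it if needed) ;
  length nextid id $30 text $1500;
  infile copy;
  input;
  text = _infile_;
  * A chunk that ends in a pipe carries the ID of the NEXT record at its end ;
  if substr(text,length(text))='|' then do;
    nextid = scan(scan(text,-1,' '),1,'|');
    * SUBSTRN returns an empty string (no note) when the length is zero or negative ;
    text = substrn(text,1,length(text)-lengthn(nextid)-2);
  end;
  * The ID for this text came from the end of the previous chunk ;
  id = lag(nextid);
  if _n_ > 1 then output;
  drop nextid;
run;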
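To see what the SCAN and SUBSTRN logic of the second step does to individual chunks, here is a small self-contained sketch on two chunks in the shape the first step produces. The text SOME REPORT TEXT is made up; the IDs are taken from the example output. SUBSTRN is used because it returns an empty string, rather than writing a note, when the computed length is zero or negative, which is what happens for the first chunk that holds only an ID.

```sas
/* Two hypothetical chunks: one holding only an ID, */
/* one holding text followed by the NEXT ID.        */
data demo;
  length chunk text nextid $60;
  infile datalines truncover;
  input chunk $char60.;
  text = chunk;
  /* Last blank-delimited word, minus its trailing pipe, is the ID */
  nextid = scan(scan(text,-1,' '),1,'|');
  /* Remove the ID, the pipe and one preceding character from the end */
  text = substrn(text,1,length(text)-lengthn(nextid)-2);
datalines;
BC51140-00|
SOME REPORT TEXT   BC151142-00|
;

proc print data=demo;
run;
```

With SUBSTR instead of SUBSTRN the first chunk would produce a note in the log; because the first row is never output in the real program, the result there is the same.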
You can then easily write WANT back out as a delimited file that can actually be read.
data _null_;
  file 'c:\downloads\fixed.txt' dsd dlm='|';
  set want;
  put id text;
run;
So your original file appears to contain 17 records. So that you can see them, here are the lines truncated to 50 bytes.
337 data _null_;
338 infile 'c:\downloads\fixed.txt' truncover;
339 input line $char50.;
340 put line $char50.;
341 run;
NOTE: The infile 'c:\downloads\fixed.txt' is:
Filename=c:\downloads\fixed.txt,
RECFM=V,LRECL=32767,File Size (bytes)=5562,
Last Modified=11Nov2020:17:20:57,
Create Time=11Nov2020:17:20:57
BC51140-00|9/4/15, BILATERAL MAMMOGRAPHY AND LEFT
BC151142-00|
BC251175-00|9/1/2015 (WHC) CT ANGIO (EVAL SOB) NO
BC351181-00|9/16/2015 (WHC) DX MAMMO/US: 7CM LOBUL
BC751182-00|7/22/15, MRI: MASS WITHIN SEGMENT VII
BC7851184-00|
BCG51185-00|08/05/15 PERIPHERAL BLOOD SMEAR: CHRON
BCT51186-00|SKELETAL SURVEY 8/13/15: DIFFUSE OSTEO
BCTY51187-00|05/27/2015-UNK LOCATION - FLEXIBLE SI
BC0951188-00|- 09/14/2015-WHC - MRI RT UPPER EXTRM
BC56651190-00|
BC56751192-00|CT A/P 9/17/15 - ENDOMETRIAL MASS W
BC56hu51196-00|09/04/2015-WHC-CT ABD/PELVIS-2.5CM
BC0955237-02|NONE
BCV55391-00|03/23/2015 (SMH) CT ABDOMEN/PELVIS: PA
BCtgy55402-00|02/12/2015: LEFT BREAST US- 1.5 CM I
BCdf55403-00|IMPRESSION 1. RESPONSE TO TREATMENT W
NOTE: 17 records were read from the infile 'c:\downloads\fixed.txt'.
The minimum record length was 12.
The maximum record length was 915.
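As a quick check that a file written this way really can be read back, here is a self-contained round-trip sketch. It writes a small hypothetical data set (the IDs and text are made up, and one TEXT value deliberately contains a pipe) to a temporary file with DSD and DLM='|', then reads it back with the same options. With DSD, PUT encloses values that contain the delimiter in quotes and INPUT removes them again. For the real data you would point the INFILE at 'c:\downloads\fixed.txt' instead of the temporary fileref ROUNDTRP, which is an illustrative name.

```sas
/* Hypothetical sample; the second TEXT contains a pipe on purpose */
data sample;
  length id $30 text $1500;
  id = 'ID-1'; text = 'CT A/P - MASS';   output;
  id = 'ID-2'; text = 'NONE | SEE NOTE'; output;
run;

filename roundtrp temp;

/* Write: DSD quotes any value that contains the delimiter */
data _null_;
  file roundtrp dsd dlm='|';
  set sample;
  put id text;
run;

/* Read back with the same DSD/DLM options; the quotes are removed */
data reread;
  infile roundtrp dsd dlm='|' truncover;
  length id $30 text $1500;
  input id text;
run;
```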
How did you try to READ the data (you don't need to "IMPORT" a text file)?
The file you posted has 4 lines of data and one header line.
NOTE: 5 records were read from the infile "c:\downloads\text_w2530_15.txt".
The minimum record length was 7.
The maximum record length was 14508.
That second field is very long for a SAS variable: the longest line is 14,508 bytes, and the maximum length of a SAS character variable is 32,767 bytes.
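For comparison, if every record were on its own line (which is not the case for the posted file), no PROC IMPORT would be needed: a DATA step with DSD, DLM='|' and FIRSTOBS=2 (to skip the ID|Text header) can read it directly. This sketch uses two records copied from the example data as in-stream lines so it runs on its own; for a real file you would replace DATALINES with the quoted path. TEXT is given the maximum length of 32,767.

```sas
data want_clean;
  /* Pipe-delimited; FIRSTOBS=2 skips the ID|Text header line */
  infile datalines dsd dlm='|' firstobs=2 truncover;
  length id $30 text $32767;
  input id text;
datalines;
ID|Text
BC0955237-02|NONE
BC151142-00|
;
```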
None of the lines start with spaces. None of the values of TEXT start with spaces (other than the one that is totally empty). Here are the first 50 characters of each line:
47 data _null_;
48 infile "c:\downloads\text_w2530_15.txt" lrecl=50 ;
49 input;
50 list;
51 run;
NOTE: A byte-order mark in the file "c:\downloads\text_w2530_15.txt" (for fileref "#LN00064") indicates
that the data is encoded in "utf-8". This encoding will be used to process the file.
NOTE: The infile "c:\downloads\text_w2530_15.txt" is:
Filename=c:\downloads\text_w2530_15.txt,
RECFM=V,LRECL=200,File Size (bytes)=17564,
Last Modified=11Nov2020:16:00:30,
Create Time=11Nov2020:16:00:29
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+--
1 ID|Text 7
2 BC51140-00|9/4/15, BILATERAL MAMMOGRAPHY AND LEFT
3 BC151142-00|
4 BC251175-00|9/1/2015 (WHC) CT ANGIO (EVAL SOB) NO
5 BC351181-00|9/16/2015 (WHC) DX MAMMO/US: 7CM LOBUL
NOTE: 5 records were read from the infile "c:\downloads\text_w2530_15.txt".
The minimum record length was 7.
The maximum record length was 50.
One or more lines were truncated.
Where did you get the file? Have you attempted to edit it in any way? If so, do you still have the original?
How was it created?
Something is messed up with the last line in that file. Why is it so much longer than the others? Why does it have multiple pipe characters?
data check;
  infile "c:\downloads\text_w2530_15.txt" length=ll;
  input;
  line+1;
  line_length=ll;
  length=lengthn(_infile_);
  pipes=countc(_infile_,'|');
run;
proc print;
run;
Obs  line  line_length  length  pipes
  1     1            7       7      1
  2     2         1012     638      1
  3     3         1013      12      1
  4     4         1013     514      1
  5     5        14508   14024     14
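One way to connect these pipe counts to the 17 records found in the repaired file, assuming no TEXT value itself contains a pipe: every record has exactly one pipe after its ID, so the total number of pipes minus the one in the header line should equal the number of records. This sketch copies the PIPES column from the PROC PRINT output into DATALINES so it runs on its own (PIPES_PER_LINE and PIPE_TOTAL are illustrative names); with the real data you would SET CHECK instead.

```sas
/* Pipe counts per physical line, copied from the PROC PRINT output */
data pipes_per_line;
  input line pipes;
datalines;
1 1
2 1
3 1
4 1
5 14
;

data pipe_total;
  set pipes_per_line end=last;
  total_pipes + pipes;            /* sum statement: retained running total */
  if last then do;
    n_records = total_pipes - 1;  /* drop the pipe in the ID|Text header */
    put total_pipes= n_records=;
    output;
  end;
  keep total_pipes n_records;
run;
```

Here the total is 18 pipes, which leaves 17 records, matching the count read from the repaired file.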