I would like to import text (attached example data). Id and text are 2 variables. There are spaces before id, so I could not import successfully into SAS (the actual data have about 5000 rows). Thank you.
You really need to find the process that is writing this file and fix that.
Here is method that appears to work for your example file.
Basically the first step removes the header line and all line breaks from the original file and inserts line breaks after the pipe characters.
Then the second step reads the lines and pulls the ID value from the end of the line. It uses the LAG() function to move the id from the end of the previous line to the text on the current line.
filename copy temp;
data _null_;
infile "c:\downloads\text_w2530_15.txt" recfm=f lrecl=1 end=eof;
file copy ;
* Get rid of "header" line ;
do until(x='0A'x); input x $char1.; end;
do while(not eof);
do until(x='|'); input x $char1.; if x not in ('0A'x,'0D'x) then put x $char1. @; end;
put;
end;
run;
data want;
length nextid id $30 text $1500 ;
infile copy ;
input ;
text = _infile_;
if substr(text,length(text))='|' then do;
nextid = scan(scan(text,-1,' '),1,'|');
text = substr(text,1,length(text)-lengthn(nextid)-2);
end;
id = lag(nextid);
if _n_ > 1 then output;
drop nextid ;
run;
You can then easily write it back out as a delimited file than can actually be read.
data _null_;
file 'c:\downloads\fixed.txt' dsd dlm='|';
set want;
put id text;
run;
So you appear to have 17 "lines" in that original file. Just so you can see it here are the lines truncated to just 50 bytes.
337 data _null_; 338 infile 'c:\downloads\fixed.txt' truncover; 339 input line $char50.; 340 put line $char50.; 341 run; NOTE: The infile 'c:\downloads\fixed.txt' is: Filename=c:\downloads\fixed.txt, RECFM=V,LRECL=32767,File Size (bytes)=5562, Last Modified=11Nov2020:17:20:57, Create Time=11Nov2020:17:20:57 BC51140-00|9/4/15, BILATERAL MAMMOGRAPHY AND LEFT BC151142-00| BC251175-00|9/1/2015 (WHC) CT ANGIO (EVAL SOB) NO BC351181-00|9/16/2015 (WHC) DX MAMMO/US: 7CM LOBUL BC751182-00|7/22/15, MRI: MASS WITHIN SEGMENT VII BC7851184-00| BCG51185-00|08/05/15 PERIPHERAL BLOOD SMEAR: CHRON BCT51186-00|SKELETAL SURVEY 8/13/15: DIFFUSE OSTEO BCTY51187-00|05/27/2015-UNK LOCATION - FLEXIBLE SI BC0951188-00|- 09/14/2015-WHC - MRI RT UPPER EXTRM BC56651190-00| BC56751192-00|CT A/P 9/17/15 - ENDOMETRIAL MASS W BC56hu51196-00|09/04/2015-WHC-CT ABD/PELVIS-2.5CM BC0955237-02|NONE BCV55391-00|03/23/2015 (SMH) CT ABDOMEN/PELVIS: PA BCtgy55402-00|02/12/2015: LEFT BREAST US- 1.5 CM I BCdf55403-00|IMPRESSION 1. RESPONSE TO TREATMENT W NOTE: 17 records were read from the infile 'c:\downloads\fixed.txt'. The minimum record length was 12. The maximum record length was 915.
How did you try to READ the data (you don't need to "IMPORT" a text file)?
The file you posted as 4 lines of data and one header line.
NOTE: 5 records were read from the infile "c:\downloads\text_w2530_15.txt". The minimum record length was 7. The maximum record length was 14508.
That second field is very long for a SAS variable. The maximum length of a character variable in SAS is 32K bytes.
None of the lines start with spaces. None of the values of TEXT start with spaces (other than the one that is totally empty). here are the first 50 characters of each line:
47 data _null_; 48 infile "c:\downloads\text_w2530_15.txt" lrecl=50 ; 49 input; 50 list; 51 run; NOTE: A byte-order mark in the file "c:\downloads\text_w2530_15.txt" (for fileref "#LN00064") indicates that the data is encoded in "utf-8". This encoding will be used to process the file. NOTE: The infile "c:\downloads\text_w2530_15.txt" is: Filename=c:\downloads\text_w2530_15.txt, RECFM=V,LRECL=200,File Size (bytes)=17564, Last Modified=11Nov2020:16:00:30, Create Time=11Nov2020:16:00:29 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-- 1 ID|Text 7 2 BC51140-00|9/4/15, BILATERAL MAMMOGRAPHY AND LEFT 3 BC151142-00| 4 BC251175-00|9/1/2015 (WHC) CT ANGIO (EVAL SOB) NO 5 BC351181-00|9/16/2015 (WHC) DX MAMMO/US: 7CM LOBUL NOTE: 5 records were read from the infile "c:\downloads\text_w2530_15.txt". The minimum record length was 7. The maximum record length was 50. One or more lines were truncated.
Where did you get the file? Have you attempted to edit it in any way? If so do you still have the original?
How was it created?
You really need to find the process that is writing this file and fix that.
Here is method that appears to work for your example file.
Basically the first step removes the header line and all line breaks from the original file and inserts line breaks after the pipe characters.
Then the second step reads the lines and pulls the ID value from the end of the line. It uses the LAG() function to move the id from the end of the previous line to the text on the current line.
filename copy temp;
data _null_;
infile "c:\downloads\text_w2530_15.txt" recfm=f lrecl=1 end=eof;
file copy ;
* Get rid of "header" line ;
do until(x='0A'x); input x $char1.; end;
do while(not eof);
do until(x='|'); input x $char1.; if x not in ('0A'x,'0D'x) then put x $char1. @; end;
put;
end;
run;
data want;
length nextid id $30 text $1500 ;
infile copy ;
input ;
text = _infile_;
if substr(text,length(text))='|' then do;
nextid = scan(scan(text,-1,' '),1,'|');
text = substr(text,1,length(text)-lengthn(nextid)-2);
end;
id = lag(nextid);
if _n_ > 1 then output;
drop nextid ;
run;
You can then easily write it back out as a delimited file than can actually be read.
data _null_;
file 'c:\downloads\fixed.txt' dsd dlm='|';
set want;
put id text;
run;
So you appear to have 17 "lines" in that original file. Just so you can see it here are the lines truncated to just 50 bytes.
337 data _null_; 338 infile 'c:\downloads\fixed.txt' truncover; 339 input line $char50.; 340 put line $char50.; 341 run; NOTE: The infile 'c:\downloads\fixed.txt' is: Filename=c:\downloads\fixed.txt, RECFM=V,LRECL=32767,File Size (bytes)=5562, Last Modified=11Nov2020:17:20:57, Create Time=11Nov2020:17:20:57 BC51140-00|9/4/15, BILATERAL MAMMOGRAPHY AND LEFT BC151142-00| BC251175-00|9/1/2015 (WHC) CT ANGIO (EVAL SOB) NO BC351181-00|9/16/2015 (WHC) DX MAMMO/US: 7CM LOBUL BC751182-00|7/22/15, MRI: MASS WITHIN SEGMENT VII BC7851184-00| BCG51185-00|08/05/15 PERIPHERAL BLOOD SMEAR: CHRON BCT51186-00|SKELETAL SURVEY 8/13/15: DIFFUSE OSTEO BCTY51187-00|05/27/2015-UNK LOCATION - FLEXIBLE SI BC0951188-00|- 09/14/2015-WHC - MRI RT UPPER EXTRM BC56651190-00| BC56751192-00|CT A/P 9/17/15 - ENDOMETRIAL MASS W BC56hu51196-00|09/04/2015-WHC-CT ABD/PELVIS-2.5CM BC0955237-02|NONE BCV55391-00|03/23/2015 (SMH) CT ABDOMEN/PELVIS: PA BCtgy55402-00|02/12/2015: LEFT BREAST US- 1.5 CM I BCdf55403-00|IMPRESSION 1. RESPONSE TO TREATMENT W NOTE: 17 records were read from the infile 'c:\downloads\fixed.txt'. The minimum record length was 12. The maximum record length was 915.
Something is messed up with the last line in that file. Why is it so much longer than the others? Why does it have multiple pipe characters?
data check;
infile "c:\downloads\text_w2530_15.txt" length=ll;
input;
line+1;
line_length=ll;
length=lengthn(_infile_);
pipes=countc(_infile_,'|');
run;
proc print;
run;
line_ Obs line length length pipes 1 1 7 7 1 2 2 1012 638 1 3 3 1013 12 1 4 4 1013 514 1 5 5 14508 14024 14
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.