Hi,
I don't usually deal with "unstructured" text. I know there is a solution to this, and I'd appreciate any guidance and help.
I have a comma-delimited file. String values are enclosed in double quotes; numeric values are not quoted.
The first line holds the variable names y1, y2, y3, where y1 and y2 are strings and y3 is numeric.
As you can see in the second data row, someone pressed Enter by mistake, which split the record across two lines.
v1: (with human "enter")
"y1","y2","y3"
"ASLG","SDF",5
"asldhl", "ser,g
sdfj",3
v2: (this is how it's supposed to be)
"y1","y2","y3"
"ASLG","SDF",5
"asldhl", "ser,gsdfj",3
v3: (desired output)
y1 | y2 | y3 |
ASLG | SDF | 5 |
asldhl | ser,gsdfj | 3 |
Would someone please help me? Is there any way I can read the data from v1 and get v3?
Thanks!
Joanne
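You could do:
data test;
  infile datalines firstobs=2 truncover;   /* firstobs=2 skips the header row */
  length y1 $12 y2 $12 y3 8 q $200;
  /* keep appending physical lines until the number of double quotes is even */
  do until (mod(countc(q,'"'),2)=0);
    input line $200.;
    q = cats(q, line);
  end;
  /* q modifier: ignore delimiters inside quotes; r modifier: remove the quotes */
  y1 = scan(q,1,",","qr");
  y2 = scan(q,2,",","qr");
  y3 = input(scan(q,3,",","qr"), ?? best.);
  drop line q;
datalines;
"y1","y2","y3"
"ASLG","SDF",5
"asldhl", "ser,g
sdfj",3
;
proc print data=test; run;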
If this is a one-off exercise and you don't have too much data, then I would either send the data back to the sender and ask them to fix it, or just read the data into a single string, count the double quotes, and then go into the lines with odd counts and fix them manually.
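The quote-counting check described above could be sketched roughly as follows. This is only a sketch under assumptions: the file is taken to be at c:\temp\test.txt, no line is assumed to be longer than 200 characters, and the names check, lineno and nquotes are made up for illustration.

```sas
data check;
  infile 'c:\temp\test.txt' truncover;   /* assumed path */
  input line $char200.;                  /* read each physical line as one string */
  lineno  = _n_;                         /* physical line number in the file */
  nquotes = countc(line, '"');           /* number of double quotes on the line */
  if mod(nquotes, 2) = 1;                /* keep only lines with an odd count */
run;

proc print data=check; run;
```

A record broken by a stray Enter inside a quoted value should show up as two consecutive flagged lines, since each half carries an unmatched quote.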
If the manual Enters are not end-of-line delimiters (e.g. under Windows the end of line would be CRLF, but a "manual" Enter could be just an LF), then the issue wouldn't be that big and SAS would still treat the record as being on a single line. You would then just have to remove the LF programmatically.
I normally use Notepad++ to see what things really look like. Just open the text file and under View/Show Symbol select "show all characters".
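If the file really does use CRLF as the record terminator and the stray Enter is a bare LF, removing it programmatically might look like the sketch below. It writes a cleaned copy of the file rather than parsing it. The input path and the output path c:\temp\test_fixed.txt are assumptions for illustration, and TERMSTR=CRLF is written out explicitly so the intent is visible.

```sas
data _null_;
  infile 'c:\temp\test.txt' termstr=crlf;    /* assumed path; records end at CRLF only */
  file 'c:\temp\test_fixed.txt';             /* hypothetical output file */
  input;                                     /* load the whole record into _infile_ */
  _infile_ = compress(_infile_, '0A'x);      /* drop any bare LF left inside the record */
  put _infile_;                              /* write the repaired record */
run;
```

The cleaned file could then be read with an ordinary delimited DATA step. If the stray Enter turns out to be a full CRLF, this sketch will not help, because SAS will already have split the record in two.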
Ideally: Post an attachment with your data and also tell us under which OS you're running SAS.
What happens when you import now?
Please post your code. If you are using a data step, make sure to specify the DSD option.
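For reference, a data step read with the DSD option might look like the sketch below. It assumes a clean file in which every record sits on one line and no blanks precede the opening quotes; the path and the $20 lengths are guesses for illustration.

```sas
data want;
  /* dsd: commas inside quoted values are not treated as delimiters, and the quotes are removed */
  infile 'c:\temp\test.txt' dsd dlm=',' firstobs=2 truncover;   /* assumed path; firstobs=2 skips the header */
  length y1 y2 $20 y3 8;                                        /* assumed lengths */
  input y1 y2 y3;
run;

proc print data=want; run;
```

On its own this does not repair a record that has been split across two physical lines; it only handles the quoting.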
I would consider a Find and Replace All in a text editor such as Notepad++.
data want;
  /* recfm=n reads the file as a byte stream; LF ('0A'x) and comma ('2C'x) both act as delimiters */
  infile 'c:\temp\test.txt' recfm=n dsd dlm='0A2C'x;
  input (y1 y2 y3) (:$20.) @@;
run;