Hi I'm getting weird output after running the below program. Basically I've excel CSV file and importing the data using infile with defined variables. But I'm getting the comma in the data values as mentioned below.
data A_dsn;
infile filename delimiter = ' ' truncover DSD lrecl=32767 firstobs=2;
input Claim_Reference $ 15. Sent_From $ 50. Sent_To $ 50. Subject $ 50. Document_Name $ 50.;
run;
reference | from | to | subject | Document_name |
454/154963, | example@mail.com | ,example1@mail.com | ,sample subject | ,sample document name |
Is there any chance I can improve the program ?
If you use the right delimiter, bang! - it works:
data want;
infile '$HOME/sascommunity/sample.csv' dlm=',' truncover firstobs=2;
input
claim_ref :$15.
sent_from :$50.
sent_to :$50.
subject :$30.
doc_name :$24.
;
run;
proc print data=want noobs;
run;
Result:
claim_ref sent_from sent_to AA/1/ /123456 AAAAAAAmotorserviceclaims@AAAAAAA.co.uk claimsdocumentation@aaaa.co.uk AA/1/BB/234567 AAAAAAAmotorserviceclaims@AAAAAAA.co.uk Mashh.aaaaa@aaaaaaaaa.com subject doc_name AAAAAAA Claim Reference AA/1/ I12345-01.12.17-05:43:01 AAAAAAA Claim Reference AA/1/B I23456-01.12.17-05:45:09
You might consider post-processing to get the timestamps out of doc_name, for instance.
Why are you reading a Comma Speparated Variable file, and yet setting the delimiter to a space? The delimiter in a CSV is a comma - its in the name! The simplest method to import a CSV is via proc import, however that removes some of the control. Better to read in via a datastep as you are doing, and specify what the input/output formats should be, lengths, labels etc. This gives you full control over the import and makes it repeatable (as proc import is a guessing procedure).
Then what is the question? I have answered the question as presented in your first post which stated:
"Hi I'm getting weird output after running the below program. Basically I've excel CSV file and importing the data using infile with defined variables. But I'm getting the comma in the data values as mentioned below."
So what would you like us to answer?
Hi @RW9,
I agree but I'm getting comma(,) along with actual values and I want to get rid of this and expecting only values.
Regards,
Reddi.
Post the code you are actually using. In the code you have posted using the wrong delimiter is most likely causing the problem.
infile filename delimiter = ' ' truncover DSD lrecl=32767 firstobs=2;
^ ^
I highlight the issue above, a delimeter tells SAS where to start and stop reading the data, and drops the delimiter. If you specify space as delimiter, then SAS will start and stop reading each time it encounters a space, and will drop the space. Therefore if you read the file in with delimiter=' ', commas will not be automatically dropped as being the delimiter, only space. You can correct this action by presenting the correct delimiter which is a comma:
infile filename delimiter = ',' truncover DSD lrecl=32767 firstobs=2;
If you use the right delimiter, bang! - it works:
data want;
infile '$HOME/sascommunity/sample.csv' dlm=',' truncover firstobs=2;
input
claim_ref :$15.
sent_from :$50.
sent_to :$50.
subject :$30.
doc_name :$24.
;
run;
proc print data=want noobs;
run;
Result:
claim_ref sent_from sent_to AA/1/ /123456 AAAAAAAmotorserviceclaims@AAAAAAA.co.uk claimsdocumentation@aaaa.co.uk AA/1/BB/234567 AAAAAAAmotorserviceclaims@AAAAAAA.co.uk Mashh.aaaaa@aaaaaaaaa.com subject doc_name AAAAAAA Claim Reference AA/1/ I12345-01.12.17-05:43:01 AAAAAAA Claim Reference AA/1/B I23456-01.12.17-05:45:09
You might consider post-processing to get the timestamps out of doc_name, for instance.
Right on Sir,
Thanks for the support everyone.
Regards,
Reddi.
Don't use formatted input with a delimited file. Use list mode input instead. You can either define your variables first (always a good habit to get into) and then just list them in the INPUT statement.
data a_dsn;
infile filename truncover DSD lrecl=32767 firstobs=2;
length Claim_Reference $15 Sent_From $50 Sent_To $50 Subject $50 Document_Name $50;
input Claim_Reference -- Document_Name ;
run;
Or add the : (colon) modifier before any informats that you have in-line in the INPUT statement. That way you are still using list mode input even though you have included informats. And the presence of the informats at the first reference to the variable names will let SAS guess that you wanted to define the variables with a length that matches the width of the informat specification instead of using the default length of $8 for character variables.
data a_dsn;
infile filename truncover DSD lrecl=32767 firstobs=2;
input Claim_Reference :$15. Sent_From :$50. Sent_To :$50. Subject :$50. Document_Name :$50;
run;
Also note that the $ is part of the informat name. If you code it the way you have you are telling SAS to read a character variable, because of the stand alone $, and then also telling it to use a numeric format. SAS will kindly recognize that it cannot read a character variable using the 10. informat and so will convert it to use the $10. informat instead. But other places where you can specify an informat will not be so accepting of the extra space.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.