BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Abelp9
Quartz | Level 8

Hello everyone, I'm new to SAS programming so forgive my possible ignorance in advance.

I would like to create a program that takes a csv file that has a delimiter of '||' and I want to change it to '|""|' with two double quotes between the slashes (|).

 

And then, once the csv is transformed export it to the same path where it was.

The part of exporting the file is clear to me, what I fail to do is import the csv file and change its delimiter. This is what i tried:

 

*IMPORT;

proc import file="/XX/XX/XX/XX/X/XX/XX/XXX.csv"
out=work.test
dbms=csv
replace;
DELIMITER= '|""|';
run;

 

*EXPORT;

proc export data=work.test
outfile="/xx/xx/xx/xx/CA/xx/xx/xx.csv"
dbms=csv
replace;
run;

 

The export works for me but there are errors in the import, it returns this error: ERROR: Quotes (' or ") have been detected in the delimiter (DLM=) string. Quotes cannot be used as delimiter characters when the DSD option is also specified. If someone could help me / explain how to do it I would be extremely grateful, thank you very much in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So the DO loop is not detecting the last empty value.

There are ways to fix that. You could just hard code the number of fields into the DO loop instead of trying to have it detect the number of fields.  If you don't know in advance how many fields there are try checking if the last character on the line is the delimiter and add one more field when it is.  Or try reading the whole file to get a count of the maximum number of fields found.

data _null_;
  infile original dlm='|' dsd truncover ;
  file new dlm='|' ;
  do col=1 to 8 until(cc>ll);
    input value ~ :$char32767. @;
    if cc<ll then input nextch $char1. +(-1) @;
    if missing(value) then value ='""';
    put value :$char32767. @;
  end;
  put;
run;

Results:

823   data _null_;
824     infile new;
825     input;
826     n=countc(_infile_,'|');
827     put n= _infile_;
828   run;

NOTE: The infile NEW is:
      Filename=...\#LN00107,
      RECFM=V,LRECL=32767,File Size (bytes)=56,
      Last Modified=09Feb2022:08:41:17,
      Create Time=09Feb2022:08:41:17

n=7 "Hello"|"what"|""|"hello"|""|""|""|""
n=7 1|2|3|4|5|6|7|8
NOTE: 2 records were read from the infile NEW.
      The minimum record length was 15.
      The maximum record length was 37.

 

Another method you could use when the lines (after inserting the extra quotes) are shorter than 32,767 bytes is to replicate your manual process.

data _null_;
  infile original truncover ;
  file new dlm='|' ;
  input line $char32767.;
  line=tranwrd(line,'||','|""|');
  length = lengthn(line);
  put line $varying32767. length; 
run;

Results:

840   data _null_;
841     infile new;
842     input;
843     n=countc(_infile_,'|');
844     put n= _infile_;
845   run;

NOTE: The infile NEW is:
      Filename=...\#LN00107,
      RECFM=V,LRECL=32767,File Size (bytes)=52,
      Last Modified=09Feb2022:08:41:17,
      Create Time=09Feb2022:08:41:17

n=7 "Hello"|"what"|""|"hello"|""||""|
n=7 1|2|3|4|5|6|7|8
NOTE: 2 records were read from the infile NEW.
      The minimum record length was 15.
      The maximum record length was 33.

 

Notice how there is a slight difference.

"Hello"|"what"|""|"hello"|""||""|
"Hello"|"what"|""|"hello"|""|""|""|""

You can adjust to repeat the replacing until all of the || have had "" inserted.

data _null_;
  infile original truncover ;
  file new dlm='|' ;
  input line $char32767.;
  do while (index(line,'||'));
    line=tranwrd(line,'||','|""|');
  end;
  length = lengthn(line);
  put line $varying32767. length; 
run;

Results:

874   data _null_;
875     infile new;
876     input;
877     n=countc(_infile_,'|');
878     put n= _infile_;
879   run;

NOTE: The infile NEW is:
      Filename=...\#LN00107,
      RECFM=V,LRECL=32767,File Size (bytes)=54,
      Last Modified=09Feb2022:08:56:14,
      Create Time=09Feb2022:08:41:17

n=7 "Hello"|"what"|""|"hello"|""|""|""|
n=7 1|2|3|4|5|6|7|8
NOTE: 2 records were read from the infile NEW.
      The minimum record length was 15.
      The maximum record length was 35.

Here is your photograph:

CSV-que-quiero.png

 

You still have not explained why a NORMAL delimited file without all of those extra quotes is not sufficient.  What are you going to DO with the new file?  What happens when you use a normal CSV file?

 

You should also try other types of lines.

What about when the last field is not empty?
What about when one or more of the fields is numeric instead of character?  Do you still need the quotes then? Or do they cause trouble?  An example of where they can cause trouble is if you are using PROC IMPORT to read the file.  SAS will (mistakenly) convert any field that has quotes around it on every line into a character string.  Even if it just has quotes because the way the numbers are displayed they include commas. So adding unneeded quotes can have negative side effects.

 

PS: Please learn how to use the forum editor.  Use the Insert Code icon to insert text.  Use the Insert SAS Code icon to insert SAS code.  Use the Insert Photos icon to insert photographs.

View solution in original post

17 REPLIES 17
Tom
Super User Tom
Super User

If you just want to read it in and write it out then don't bother trying to interpret it. Just copy it.

 

data _null_;
  infile original dlmstr='||' length=ll column=cc truncover ;
  file new dlmstr='|""|' ;
  do until(cc>ll);
    input value :$32767. @;
    put value @;
  end;
  put;
run;
Abelp9
Quartz | Level 8
what does it means cc>11? can you explain me the code please?? thank you very much
Tom
Super User Tom
Super User

Check the INFILE statement.

LL is the variable that SAS is setting to the LENGTH of the current line in the file being read.

CC is the variable that SAS is setting to the COLUMN that will next be read from the current line.

 

So if CC is larger than LL you have read past the end of the line.

Abelp9
Quartz | Level 8
no consigo visualizar lo que me dices :(...
¿Dónde podría estar mi fallo? parece que no esta leyendo la linea entera?
Tom
Super User Tom
Super User

@Abelp9 wrote:
no consigo visualizar lo que me dices :(...
¿Dónde podría estar mi fallo? parece que no esta leyendo la linea entera?

I can't visualize what you say :(...
Where could my fault be? It seems that you are not reading the entire line?

Exactly.  It is reading and writing each field not each line.  The DO loop make sure it reads all of the line and stops when there is nothing left.

This solves two problems.

1) There is a limit of 32.767 bytes that can be used for a character variable.  But a line of a file could have millions of bytes.

2) The DSD option will handle reading lines where the values contain the delimiters (as long as those values are properly quoted).  

 

Abelp9
Quartz | Level 8

It's almost perfect, thank you very much, the only problem I have is that for example:

In the original csv the first row would be like this:

 


"Hello"|"what"||"hello"||||

And I want it like this:
"Hello"|"what"|""|"hello"|""|""|""|

And with the solution that you has given to me, it returns the following:
"Hello"|"what"|""|"hello"|""|

It eliminates the last 2 fields, which I want to simply be empty, but they are, I have a total of 11 fields.

Kurt_Bremser
Super User

So your delimiter is in fact not

|""|

but a single pipe (|) character, and your fields are enclosed in quotes.

See this for starters:

filename out temp;

data _null_;
infile datalines dlm="|" truncover;
file out dlm="|";
input (text1-text8) (:$10.);
array text {*} text1-text8;
do i = 1 to 8;
  if text{i} = '' then text{i} = '""';
end;
put text1-text8;
datalines;
"Hello"|"what"||"hello"||||
;

data _null_;
infile out;
input;
put _infile_;
run;
Tom
Super User Tom
Super User

That sounds different than your original explanation. 

That input file appears to be using | as a delimter and not || as a delimiter string.  Let's try telling SAS that.

options parmcards=original;
filename original temp;
filename new temp;
parmcards4;
"Hello"|"what"||"hello"||||
;;;;

data _null_;
  infile original dlm='|' dsd length=ll column=cc truncover ;
  file new dlmstr='|""|' ;
  do until(cc>ll);
    input value :$32767. @;
    put value @;
  end;
  put;
run;

Results;

259   data _null_;
260     infile new;
261     input;
262     put _infile_;
263   run;

NOTE: The infile NEW is:
      Filename=...\#LN00067,
      RECFM=V,LRECL=32767,File Size (bytes)=44,
      Last Modified=08Feb2022:12:12:12,
      Create Time=08Feb2022:12:12:12

Hello|""|what|""| |""|hello|""| |""| |""|
NOTE: 1 record was read from the infile NEW.
      The minimum record length was 42.
      The maximum record length was 42.

Your output example is also a problem.  If you use DMLSTR then you will not automatically have missing values not generate any text into the file.  That is because you cannot use the DSD option with a DLMSTR value that includes quotes.  That is because the DSD option will add quotes around values that contain the delimiter so it garfs if try to use a quote as part of the delimiter string.  (Note the SAS error message mistakenly will mention DLM instead of DLMSTR.)

 

If you really did need to not write anything for missing values you might need to add more logic to handle that.

If you really want the around the values written (why?) then you might want to use the ~ modifier?

 

If the lines are short enough and there are no | characters in the values of any of the fields it might be easier to just read the whole line and modify it yourself and re-write it.

options parmcards=original;
filename original temp;
filename new temp;
parmcards4;
"Hello"|"what"||"hello"||||
1|2|3|4|5|6|7|8
;;;;

data _null_;
  infile original truncover ;
  file new ;
  input line $char32767.;
  line=tranwrd(line,'|','|""|');
  len=lengthn(line);
  put line $varying32767. len;
run;

data _null_;
  infile new;
  input;
  put _infile_;
run;

Results

"Hello"|""|"what"|""||""|"hello"|""||""||""||""|
1|""|2|""|3|""|4|""|5|""|6|""|7|""|8
Abelp9
Quartz | Level 8

your result is: "Hello"|""|"what"|""||""|"hello"|""||""||""||""|
1|""|2|""|3|""|4|""|5|""|6|""|7|""|8

 

and I would need:
"Hello"|""|"what"|""|""|""|"hello"|""|""|""|""|""|""|""|
1|""|2|""|3|""|4|""|5|""|6|""|7|""|8

 

there can be no slashes without quotes

thank you

Tom
Super User Tom
Super User

@Abelp9 wrote:

It's almost perfect, thank you very much, the only problem I have is that for example:

In the original csv the first row would be like this:

 


"Hello"|"what"||"hello"||||

And I want it like this:
"Hello"|"what"|""|"hello"|""|""|""|

And with the solution that you has given to me, it returns the following:
"Hello"|"what"|""|"hello"|""|

It eliminates the last 2 fields, which I want to simply be empty, but they are, I have a total of 11 fields.


As @Kurt_Bremser has posted it looks like your original file is just using | as the delimiter. But it has some unneeded extra quotes around the values.   The more natural way to represent those values in a | delimited file would be as:

Hello|what||hello||||

Since none of the three non-empty values contain either the delimiter | nor any quote characters.

 

Why do you feel a need to add even more quotes to the file?

Is it not working for some purpose?  What purpose?

Are you trying to read it with some other software?  What software?  Why does it need extra quotes?

 

Here is a modified data step that uses ~ modifier to preserve any existing quotes in the fields and also replaces empty fields with "".

data _null_;
  infile original dlm='|' dsd length=ll column=cc truncover ;
  file new dlm='|' ;
  do until(cc>ll);
    input value ~ :$char32767. @;
    if missing(value) then value ='""';
    put value :$char32767. @;
  end;
  put;
run;
Abelp9
Quartz | Level 8

I think you are not understanding me well or I am not explaining myself well, my delimiter is not "|", it is "||". They send me csv files that in the cells that are empty they put them as ||, instead of |""|. So I always have to open notepad and replace || by |""|.

What I want to do is automate this, and create a program that directly takes these csvs and transforms them by changing the || by |""|.

The solution that they gave me at the beginning worked for me, the only problem is that I was missing variables

Source CSV:
|"Hello"|"what"|"hello"|||| (6 variables)

CSV I want:
|"Hello"|"what"|"hello"|""|""|""| (6 variables with "" between the bars)

Thank you very much for your help, and I'm sorry if I'm explaining myself wrong, I don't have much experience in SAS

Kurt_Bremser
Super User

You contradict yourself:

First you say

my delimiter is not "|", it is "||"

but then you say

|"Hello"|"what"|"hello"|||| (6 variables)

If you had double pipes as delimiters, you would have three variables and not 6.

And keep in mind that those quotes are around the values, they are not part of the values, so they are not needed for missing values.

 

If you really think you need quotes as part of value, explain why.

Abelp9
Quartz | Level 8

Excuse me, I'm new to SAS programming and I'm still explaining myself wrong, I'm going to share screenshots of what they give me and how I want it.

I receive the csv in this way (it is opened with notepad):

https://ibb.co/202PwDM

 

and then with notepad I do search < replace < and replace || by |""|

this way I get this final csv (this is what i want):

https://ibb.co/PZ3KT7x

 

This is how I want the csv, and I want to do this process of importing the csv, transforming it and exporting it in SAS,

with this code that you have kindly provided me:

data _null_;
infile "xxx/xxx.csv" dlm='|' dsd length=ll column=cc truncover ;

file "xxx/xxx.csv" dlm='|' ;
do until(cc>ll);
input value ~ :$char32767. @;
if missing(value) then value ='""';
put value :$char32767. @;
end;
put;
run;


i got this:

https://ibb.co/ZK4KHTj

 

but I am missing the last bar on the right (look at the csv I want), why is this happening? how can i fix it to get the csv i want (exactly like this but with a trailing slash too)

Thank you very much for all the help you give me

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 2800 views
  • 11 likes
  • 3 in conversation