Exporting character variables that contain quotation marks

Reply
Contributor
Posts: 38

Exporting character variables that contain quotation marks

Hi,

I have to read and write character variables that contain quotation marks. Reading isn't the problem, but writing is (please see example below).

1) To read the data "as it is" I had to put a tilde before the informat of the character variable (variable "text").

2) In order to avoid additional quotation marks when exporting the character variable the option "dsd" needed to be removed from the FILE Statement.

Unfortunately, the last measure caused missing numeric values to be written as a dot. How can this be avoided? Or is there a simpler way to read and write the character variables "as they are"?

Thanks!

Options Parmcards=FT15F001;

Filename FT15F001 temp;

Data abc;

    Infile FT15F001 Dsd Dlm=',' Missover;

    Input

        value1 :best1.

        text ~:$52.

        value2 :best1.;

    File "D:\Test.csv" /*Dsd*/ Dlm=',';

    Put

        value1

        text

        value2;

Parmcards;

1,abc " def,1

,"abc def,2

3,"abcd def"

4,abc def ",4

Run;

Super User
Super User
Posts: 6,500

Re: Exporting character variables that contain quotation marks

You can use OPTIONS MISSING=' ' to have the dots appear as spaces instead.

But to force put nothing between the commas would be harder.

You do realize that without treating the quotes as meaningful your file structure will have an even larger problem of what to do when the value contains the delimiter (comma).  What do you want the text file to contain when the value of variable FILE is 'my.sas,v' ?

Contributor
Posts: 38

Re: Exporting character variables that contain quotation marks

Hi Tom,

Of course, that's a valid point. I don't like the way my input data are organized either: Commas are actually only used to separate variables. Quotation marks in character strings indicate that the following element named in the character string is empty:

" [PL] 2[CFstat] "

=> not relevant to PL; of type 2 in CFstat; not relevant for anything else

I've talked to the people in charge and I'm allowed to change the character variables which are read like this from now on (I think, this should be fine):

Options Parmcards=FT15F001;

Filename FT15F001 temp;

Data abc;

    Infile FT15F001 Dsd Dlm=',' Missover;

    Input

        value1 :best1.

        text ~:$52.

        value2 :best1.;

    File "D:\Test.csv" Dsd Dlm=',';

    Put

        value1

        text

        value2;

Parmcards;

1,abc "" def,1

,"abc def,2

3,"abcd def"

4,abc def ",4

Run;

/*

    From now on, character variables are read like this:

*/

Data def;

    Infile "D:\Test.csv" Dsd Dlm=',' Missover;

    Input

        value1b :best1.

        textb :$52.

        value2b :best1.;

Run;

Super User
Super User
Posts: 6,500

Re: Exporting character variables that contain quotation marks

A couple of suggestions to simplify the code.

1) Don't use MISSOVER option, instead use the TRUNCOVER option. It does not matter in this case because of the way you have coded the input statement, but you will get a rude surprise in other cases. With MISSOVER if you try to read five characters and there are only three you get nothing. With TRUNCOVER you get the three characters that are there.

2) I find it is much clearer to define the variable lengths and then read them. It makes your input statement simpler. Plus it gives you more flexibility when the order in the text file is not the ideal variable order for your dataset.

data def;

   infile "D:\Test.csv" dsd dlm=',' truncover;

    length value1b 8 textb $52 value2b 8;

   input value1b textb value2b ;

run;


You could create the output you want with programming. Here is one way that is flexible.


  length varname $32 value $200 ;

  do varname='NAME','AGE','SEX','HEIGHT';

    value = left(vvaluex(varname));

    if value not in ('.',' ') then put value +(-1) @;

    if varname ne 'HEIGHT' then put ',' @;

  end;

Contributor
Posts: 38

Re: Exporting character variables that contain quotation marks

Thank you very much for going the extra mile and giving additional advice!

I'm very interested in learning how to use SAS properly, so any suggestions are highly appreciated.

Your example is clearly simpler, but in case of dates I'd probably still have to use informats (:date11., :time8....). Also, I'm curious why the LENGTH statement should be prefered to the FORMAT statement (you suggested this before).

The FORMAT statement seems to work fine when using character variables (at least for my purposes). And after having a look at two papers on the LENGTH statement (http://www.nesug.org/proceedings/nesug02/at/at002.pdf, http://support.sas.com/resources/papers/proceedings11/275-2011.pdf) it doesn't seem advisable to me to reduce the length in case of non-integer values. Therefore, I assume I should stick with the default value (except for date values allowing the length to be reduced to 4 bytes). Using the FORMAT statement also provides flexibility when arranging the variable order - plus scientific notations can be avoided when displaying large (or small) values.

Super User
Super User
Posts: 6,500

Re: Exporting character variables that contain quotation marks

The FORMAT statement is for associating special display methods you want to use with the variable.  It does NOT define the variables type or length, other than in the indirect way that any reference to a previously undefined variable would.  You can combine LENGTH/FORMAT/INFORMAT/LABEL statements into single ATTRIB statement if you prefer.

For date variables (or other variables that require them) if you have attached an informat (using an INFORMAT statement or ATTRIB statement) to tell SAS how to convert incoming text then your will not need to specify one in the INPUT statement.

Permanently attaching $xxx formats to character variables is useless and potentially dangerous.  It is useless because it does not give SAS any real information on how to display the variable that it doesn't already know.  It is dangerous when misused because it can cause truncation of values used in analysis procedures.  I have seen summary reports that under reported the distinct values of a variable because a 20 character variable had a format of $8. attached to it.

Contributor
Posts: 38

Re: Exporting character variables that contain quotation marks

Thank you for the in-depth explanation!

(Learning SAS by trial and error mostly is really not advisable. For example, using LENGTH after FORMAT results in a warning when they contradict each other - which led me to believe that FORMAT is more powerful and does not only attach formats but also lengths and therefore makes the LENGTH statement redundant.)

Ask a Question
Discussion stats
  • 6 replies
  • 372 views
  • 3 likes
  • 2 in conversation