<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS excluding quotation marks from raw data in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895989#M39855</link>
    <description>&lt;P&gt;Looks like SAS is doing exactly what you asked. And you asked it to do the right thing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since the first place you reference Census_tract is in the INPUT statement, SAS uses the WIDTH of the INFORMAT given in the INPUT statement to GUESS what LENGTH to give the variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you use the DSD option in the INFILE statement the quotes that had to be added around the value&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Census Tract 9790, Fremont County, Colorado&lt;/PRE&gt;
&lt;P&gt;so that the line can be properly parsed (because the value contains the delimiter character) will be removed so they do not accidentally become part of the value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The LENGTH() function returns the length of the string excluding the trailing spaces that are added to pad the value to the full storage length.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 27 Sep 2023 03:58:13 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2023-09-27T03:58:13Z</dc:date>
    <item>
      <title>SAS excluding quotation marks from raw data</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895978#M39853</link>
      <description>&lt;P&gt;I have a dataset I am working on for a class project in which I am looking for a character variable that is 50 in length. Here is the code:&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;DATA	CoImpt.Race_Ethnicity;
	INFILE	"&amp;amp;CourseRoot/CDPHE Study/Data/1_Source/ct_race_ethn.csv" FIRSTOBS=2 DSD MISSOVER;
	INPUT	Census_tract			:$50.
			Population_density
			N_White
			Pctn_White
			N_Africanamerican
			Pctn_Africanamerican
			N_AIAN
			Pctn_AIAN
			N_Asian
			Pctn_Asian
			N_NH_OPI
			Pctn_NH_OPI
			N_Other
			Pctn_Other
			N_Hispanic_Latino
			Pctn_Hispanic_Latino
			N_NHL
			Pctn_NHL
			a
			b					;
			
	PROC CONTENTS DATA=CoImpt.Race_Ethnicity;
	RUN;&lt;/PRE&gt;&lt;P&gt;When I run proc contents, I only get 48 length. My data looks like this:&lt;/P&gt;&lt;P&gt;"Census Tract 9790, Fremont County, Colorado",3.4,3366.00,97.00,4.00,0.10,19.00,0.50,115.00,3.30,0.00,0.00,50.00,1.40,157.00,4.50,3176.00,91.50,295.00,8.50&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And the quotation marks do not show up in the output data. Is SAS excluding them? How do I get SAS to include them? End goal is variable census_tract with a length of 50. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2023 23:44:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895978#M39853</guid>
      <dc:creator>BLT2023</dc:creator>
      <dc:date>2023-09-26T23:44:21Z</dc:date>
    </item>
    <item>
      <title>Re: SAS excluding quotation marks from raw data</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895982#M39854</link>
      <description>&lt;P&gt;First thing: End your data step with a RUN; statement.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your code should create the variable with a length of 50. Are you sure you didn't have some error in your code and Proc Contents lists a table version that you created in an earlier run with different code?&lt;/P&gt;
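&lt;P&gt;One way to rule out a stale table, sketched here against WORK.RACE_ETHNICITY (an assumption; swap in your own libref and table name): delete the old copy, rerun the import step, and only then run PROC CONTENTS. If the import step fails, PROC CONTENTS will then report an error instead of describing an old table.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* remove any copy left over from an earlier run */
proc datasets lib=work nolist;
  delete race_ethnicity;
quit;

/* ... rerun the data step that reads the csv here ... */

/* this now describes only a freshly created table */
proc contents data=work.race_ethnicity;
run;&lt;/CODE&gt;&lt;/PRE&gt;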
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The sample code below, based on what you've shared, creates the variable with a length of 50. That SAS removes the quotes is, in my opinion, a good thing: the quotes are just how a .csv stores a string so that the string can contain a comma.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* create csv file under WORK path */
%let csv_file=%sysfunc(pathname(work))/ct_race_ethn.csv;
data _null_;
  file "&amp;amp;csv_file";
  infile datalines;
  input;
  put _infile_;
  datalines;
header header header
"Census Tract 9790, Fremont County, Colorado",3.4,3366.00,97.00,4.00,0.10,19.00,0.50,115.00,3.30,0.00,0.00,50.00,1.40,157.00,4.50,3176.00,91.50,295.00,8.50
;

/* read csv into SAS dataset */
data work.Race_Ethnicity;
  infile  "&amp;amp;csv_file" firstobs=2 dsd missover;
  input 
    Census_tract        :$50.
    Population_density
  ;
run;

proc contents data=work.race_ethnicity;
run;
proc print data=work.race_ethnicity;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1695777400844.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/88363iDE304478A333FBBF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Patrick_0-1695777400844.png" alt="Patrick_0-1695777400844.png" /&gt;&lt;/span&gt;&lt;/P&gt;
</description>
      <pubDate>Wed, 27 Sep 2023 01:19:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895982#M39854</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2023-09-27T01:19:53Z</dc:date>
    </item>
    <item>
      <title>Re: SAS excluding quotation marks from raw data</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895989#M39855</link>
      <description>&lt;P&gt;Looks like SAS is doing exactly what you asked. And you asked it to do the right thing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since the first place you reference Census_tract is in the INPUT statement, SAS uses the WIDTH of the INFORMAT given in the INPUT statement to GUESS what LENGTH to give the variable.&lt;/P&gt;
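&lt;P&gt;A minimal sketch of taking the guesswork away, assuming made-up in-line data and a hypothetical output name work.len_demo: declare the variable in a LENGTH statement before the INPUT statement, so its length no longer depends on the informat width.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.len_demo;
  /* LENGTH is now the first reference, so it sets the storage length to 50 */
  length Census_tract $50;
  infile datalines dsd truncover;
  input Census_tract Population_density;
datalines;
"Census Tract 9790, Fremont County, Colorado",3.4
;

/* the Len column should show 50 for Census_tract */
proc contents data=work.len_demo;
run;&lt;/CODE&gt;&lt;/PRE&gt;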
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you use the DSD option in the INFILE statement the quotes that had to be added around the value&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Census Tract 9790, Fremont County, Colorado&lt;/PRE&gt;
&lt;P&gt;so that the line can be properly parsed (because the value contains the delimiter character) will be removed so they do not accidentally become part of the value.&lt;/P&gt;
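&lt;P&gt;A small sketch of that behaviour, using made-up in-line data and hypothetical variable names (stripped, kept). It reads the same field twice: once with plain list input, and once with the ~ format modifier, which as far as I know is the documented way to retain the quotes when DSD is in effect. You rarely want the second form.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.quote_demo;
  infile datalines dsd truncover;
  /* DSD list input: the surrounding quotes are removed; trailing @ holds the line */
  input stripped :$50. @;
  /* return to column 1 and reread the field, keeping the quotes this time */
  input @1 kept ~ $50.;
  /* write both versions to the log for comparison */
  put stripped= / kept=;
datalines;
"Census Tract 9790, Fremont County, Colorado",3.4
;&lt;/CODE&gt;&lt;/PRE&gt;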
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The LENGTH() function returns the length of the string excluding the trailing spaces that are added to pad the value to the full storage length.&lt;/P&gt;
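&lt;P&gt;To see the difference, here is a sketch with a made-up DATA _NULL_ step and hypothetical variable names (used, stored). LENGTH() reports how much of the value is in use, while VLENGTH() reports the storage length the variable was defined with, which is the number PROC CONTENTS shows.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length Census_tract $50;
  Census_tract = 'Census Tract 9790, Fremont County, Colorado';
  /* position of the last non-blank character in the value */
  used = length(Census_tract);
  /* defined storage length of the variable: 50 here */
  stored = vlength(Census_tract);
  put used= stored=;
run;&lt;/CODE&gt;&lt;/PRE&gt;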
</description>
      <pubDate>Wed, 27 Sep 2023 03:58:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895989#M39855</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-09-27T03:58:13Z</dc:date>
    </item>
    <item>
      <title>Re: SAS excluding quotation marks from raw data</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895990#M39856</link>
      <description>&lt;P&gt;Why did you use the $50. informat if you wanted the variable to be defined with length $48?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Did you let PROC IMPORT guess how to read the file?&amp;nbsp; &amp;nbsp;PROC IMPORT will overestimate the length needed when any of the values have had quotes added to protect embedded delimiters or quotes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's run an experiment so we can see the issue.&amp;nbsp; Let's make a CSV file and add quotes around one of the variables.&amp;nbsp; Then let PROC IMPORT try to GUESS how to read it.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename csv temp;
data _null_;
  file csv dsd ;
  set sashelp.class(obs=3);
  if _n_=1 then put 'Name,Age,Sex';
  put name ~ age sex;
run;

proc import file=csv out=test replace dbms=csv;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So the CSV file looks like this:&lt;/P&gt;
&lt;PRE&gt;RULE:     ----+----1----+----2----+----3----+----4-
1         Name,Age,Sex 12
2         "Alfred",14,M 13
3         "Alice",13,F 12
4         "Barbara",13,F 14
&lt;/PRE&gt;
&lt;P&gt;The longest value of Name is Barbara which is only 7 characters long.&amp;nbsp; But with the quotes it takes 9 bytes in the line of text.&lt;/P&gt;
&lt;P&gt;And the code generated by PROC IMPORT sets the length of the variable to $9 instead of $7.&lt;/P&gt;
&lt;PRE&gt;153  proc import file=csv out=test replace dbms=csv;
154  run;

155   /**********************************************************************
156   *   PRODUCT:   SAS
157   *   VERSION:   9.4
158   *   CREATOR:   External File Interface
159   *   DATE:      26SEP23
160   *   DESC:      Generated SAS Datastep Code
161   *   TEMPLATE SOURCE:  (None Specified.)
162   ***********************************************************************/
163      data WORK.TEST    ;
164      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
165      infile CSV delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
166         informat Name $9. ;
167         informat Age best32. ;
168         informat Sex $1. ;
169         format Name $9. ;
170         format Age best12. ;
171         format Sex $1. ;
172      input
173                  Name  $
174                  Age
175                  Sex  $
176      ;
177      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
178      run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 14.
NOTE: The data set WORK.TEST has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

3 rows created in WORK.TEST from CSV.

NOTE: WORK.TEST data set was successfully created.
NOTE: The data set WORK.TEST has 3 observations and 3 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.05 seconds
      cpu time            0.03 seconds
&lt;/PRE&gt;
&lt;P&gt;If you want a tool that will not overestimate the length needed for character variables, use &lt;A href="https://github.com/sasutils/macros/blob/master/csv2ds.sas" target="_self"&gt;%CSV2DS()&lt;/A&gt; instead of PROC IMPORT.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename url url "https://raw.githubusercontent.com/sasutils/macros/master/csv2ds.sas";
%include url;
filename url url "https://raw.githubusercontent.com/sasutils/macros/master/parmv.sas";
%include url;
filename url;
%csv2ds(csv,replace=Y);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Generated code:&lt;/P&gt;
&lt;PRE&gt;NOTE: %INCLUDE (level 1) file _CODE_ is (system-specific pathname).
1029 +data fromcsv;
1030 +  infile CSV dlm=',' dsd truncover firstobs=2 ;
1031 +  length Name $7 Age 8 Sex $1 ;
1032 +  input Name -- Sex ;
1033 +run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 14.
NOTE: The data set WORK.FROMCSV has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
&lt;/PRE&gt;
</description>
      <pubDate>Wed, 27 Sep 2023 04:13:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895990#M39856</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-09-27T04:13:57Z</dc:date>
    </item>
    <item>
      <title>Re: SAS excluding quotation marks from raw data</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895992#M39857</link>
      <description>&lt;P&gt;I closed SAS and reopened it, and the code ran as it was supposed to. Not exactly sure what happened, but thank you for guiding me in the right direction!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2023 04:16:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/895992#M39857</guid>
      <dc:creator>BLT2023</dc:creator>
      <dc:date>2023-09-27T04:16:06Z</dc:date>
    </item>
    <item>
      <title>Re: SAS excluding quotation marks from raw data</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/896162#M39873</link>
      <description>&lt;P&gt;Is there a way to do this without using data lines? I have well over 1,000 lines of data.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2023 20:46:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-excluding-quotation-marks-from-raw-data/m-p/896162#M39873</guid>
      <dc:creator>BLT2023</dc:creator>
      <dc:date>2023-09-27T20:46:39Z</dc:date>
    </item>
  </channel>
</rss>

