BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BLT2023
Calcite | Level 5

I have a dataset I am working on for a class project in which I am looking for a character variable that is 50 in length. Here is the code: 

DATA	CoImpt.Race_Ethnicity;
	INFILE	"&CourseRoot/CDPHE Study/Data/1_Source/ct_race_ethn.csv" FIRSTOBS=2 DSD MISSOVER;
	INPUT	Census_tract			:$50.
			Population_density
			N_White
			Pctn_White
			N_Africanamerican
			Pctn_Africanamerican
			N_AIAN
			Pctn_AIAN
			N_Asian
			Pctn_Asian
			N_NH_OPI
			Pctn_NH_OPI
			N_Other
			Pctn_Other
			N_Hispanic_Latino
			Pctn_Hispanic_Latino
			N_NHL
			Pctn_NHL
			a
			b					;
			
	PROC CONTENTS DATA=CoImpt.Race_Ethnicity;
	RUN;

When I run proc contents, I only get 48 length. My data looks like this:

"Census Tract 9790, Fremont County, Colorado",3.4,3366.00,97.00,4.00,0.10,19.00,0.50,115.00,3.30,0.00,0.00,50.00,1.40,157.00,4.50,3176.00,91.50,295.00,8.50

 

And the quotation marks do not show up in the output data. Is SAS excluding them? How do I get SAS to include them? End goal is variable census_tract with a length of 50. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

First thing: End your data step with a RUN; statement. 

Your code should create the variable with a length of 50. Are you sure you didn't have some error in your code and Proc Contents lists a table version that you created in an earlier run with different code?

 

Below sample code based on what you've shared creates the variable with a length of 50. That SAS removes the quotes is imho a good thing as that's just how in a .csv strings get stored so that you can have a comma within such a string.

/* create csv file under WORK path */
%let csv_file=%sysfunc(pathname(work))/ct_race_ethn.csv;
data _null_;
  file "&csv_file";
  infile datalines;
  input;
  put _infile_;
  datalines;
header header header
"Census Tract 9790, Fremont County, Colorado",3.4,3366.00,97.00,4.00,0.10,19.00,0.50,115.00,3.30,0.00,0.00,50.00,1.40,157.00,4.50,3176.00,91.50,295.00,8.50
;

/* read csv into SAS dataset */
data work.Race_Ethnicity;
  infile  "&csv_file" firstobs=2 dsd missover;
  input 
    Census_tract        :$50.
    Population_density
  ;
run;

proc contents data=work.race_ethnicity;
run;
proc print data=work.race_ethnicity;
run;

Patrick_0-1695777400844.png

 

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

First thing: End your data step with a RUN; statement. 

Your code should create the variable with a length of 50. Are you sure you didn't have some error in your code and Proc Contents lists a table version that you created in an earlier run with different code?

 

Below sample code based on what you've shared creates the variable with a length of 50. That SAS removes the quotes is imho a good thing as that's just how in a .csv strings get stored so that you can have a comma within such a string.

/* create csv file under WORK path */
%let csv_file=%sysfunc(pathname(work))/ct_race_ethn.csv;
data _null_;
  file "&csv_file";
  infile datalines;
  input;
  put _infile_;
  datalines;
header header header
"Census Tract 9790, Fremont County, Colorado",3.4,3366.00,97.00,4.00,0.10,19.00,0.50,115.00,3.30,0.00,0.00,50.00,1.40,157.00,4.50,3176.00,91.50,295.00,8.50
;

/* read csv into SAS dataset */
data work.Race_Ethnicity;
  infile  "&csv_file" firstobs=2 dsd missover;
  input 
    Census_tract        :$50.
    Population_density
  ;
run;

proc contents data=work.race_ethnicity;
run;
proc print data=work.race_ethnicity;
run;

Patrick_0-1695777400844.png

 

BLT2023
Calcite | Level 5

I closed SAS and reopened it, and the code ran as it was supposed to. Not exactly sure what happened, but thank you for guiding me in the right direction!

BLT2023
Calcite | Level 5

Is there a way to do this without using data lines? I have well over 1,000 lines of data.

Tom
Super User Tom
Super User

Looks like SAS is doing exactly what you asked. And you asked it to do the right thing.

 

Since the first place you reference Census_tract is in the INPUT statement SAS will use the WIDTH of the INFORMAT used in the INPUT to help it GUESS what LENGTH to define the variable.

 

Since you use the DSD option in the INFILE statement the quotes that had to be added around the value 

Census Tract 9790, Fremont County, Colorado

so that the line can be properly parsed (because the value contains the delimiter character) will be removed so they do not accidentally become part of the value.

 

The LENGTH() function returns the length of the string excluding the trailing spaces that are added to pad the value to the full storage length.

 

 

Tom
Super User Tom
Super User

Why did you use $50. informat if you wanted the variable to defined as length $48?

 

Did you let PROC IMPORT guess how to read the file?   PROC IMPORT will overestimate the length needed when any of the values have had quotes added to protect embedded delimiters or quotes.

 

Let's run an experiment so we can see the issue.  Let's make a CSV file and add quotes around one of the variables.  Then let PROC IMPORT try to GUESS how to read it.

filename csv temp;
data _null_;
  file csv dsd ;
  set sashelp.class(obs=3);
  if _n_=1 then put 'Name,Age,Sex';
  put name ~ age sex;
run;

proc import file=csv out=test replace dbms=csv;
run;

So the CSV file look like this:

RULE:     ----+----1----+----2----+----3----+----4-
1         Name,Age,Sex 12
2         "Alfred",14,M 13
3         "Alice",13,F 12
4         "Barbara",13,F 14

The longest value of Name is Barbara which is only 7 characters long.  But with the quotes it takes 9 bytes in the line of text.

And the code generated by PROC IMPORT sets the length of the variable $9 instead of $7.

153  proc import file=csv out=test replace dbms=csv;
154  run;

155   /**********************************************************************
156   *   PRODUCT:   SAS
157   *   VERSION:   9.4
158   *   CREATOR:   External File Interface
159   *   DATE:      26SEP23
160   *   DESC:      Generated SAS Datastep Code
161   *   TEMPLATE SOURCE:  (None Specified.)
162   ***********************************************************************/
163      data WORK.TEST    ;
164      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
165      infile CSV delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
166         informat Name $9. ;
167         informat Age best32. ;
168         informat Sex $1. ;
169         format Name $9. ;
170         format Age best12. ;
171         format Sex $1. ;
172      input
173                  Name  $
174                  Age
175                  Sex  $
176      ;
177      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
178      run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 14.
NOTE: The data set WORK.TEST has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

3 rows created in WORK.TEST from CSV.

NOTE: WORK.TEST data set was successfully created.
NOTE: The data set WORK.TEST has 3 observations and 3 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.05 seconds
      cpu time            0.03 seconds

If you want a tool that will not overestimate the length needed for character variable use %CSVDS() instead of PROC IMPORT.

filename url url "https://raw.githubusercontent.com/sasutils/macros/master/csv2ds.sas";
%include url;
filename url url "https://raw.githubusercontent.com/sasutils/macros/master/parmv.sas";
%include url;
filename url;
%csv2ds(csv,replace=Y);

Generated code:

NOTE: %INCLUDE (level 1) file _CODE_ is (system-specific pathname).
1029 +data fromcsv;
1030 +  infile CSV dlm=',' dsd truncover firstobs=2 ;
1031 +  length Name $7 Age 8 Sex $1 ;
1032 +  input Name -- Sex ;
1033 +run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 14.
NOTE: The data set WORK.FROMCSV has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 628 views
  • 1 like
  • 3 in conversation