BookmarkSubscribeRSS Feed
viola
Obsidian | Level 7

I am having trouble importing a CSV file using the DATA step where some of the column names have spaces. I have specified that a comma is the delimiter, but SAS is still breaking up the variable names into multiple columns. I am aware that using PROC IMPORT would solve this problem, but for various reasons, is not an option in my case. What am I missing? 

 

data &&sasname&i;
	infile "&&path&i" delimiter="," dsd firstobs=2 truncover;
	input &varlist; 
run;

My CSV looks like the following:

Visit Repeat Key,Form Repeat Key,Item Group Repeat Key,Scheduled Date,Completed Date
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T18:11:17.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T23:14:02.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:11:16.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:00:21.000Z
1,1,,2018-08-21T07:00:00.000Z,
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T19:59:46.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T16:58:20.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:12:31.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:53:10.000Z

 

11 REPLIES 11
Reeza
Super User

If you're using a data step you would specify the variable list in your INPUT statement. 

If you're trying to do this dynamically you could either consider NLITERAL() function or somehow add an underscore before hand. 

 

How is &varlist being created? 

 


@viola wrote:

I am having trouble importing a CSV file using the DATA step where some of the column names have spaces. I have specified that a comma is the delimiter, but SAS is still breaking up the variable names into multiple columns. I am aware that using PROC IMPORT would solve this problem, but for various reasons, is not an option in my case. What am I missing? 

 

data &&sasname&i;
	infile "&&path&i" delimiter="," dsd firstobs=2 truncover;
	input &varlist; 
run;

My CSV looks like the following:

Visit Repeat Key,Form Repeat Key,Item Group Repeat Key,Scheduled Date,Completed Date
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T18:11:17.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T23:14:02.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:11:16.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:00:21.000Z
1,1,,2018-08-21T07:00:00.000Z,
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T19:59:46.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T16:58:20.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:12:31.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:53:10.000Z

 


 

 

viola
Obsidian | Level 7
The varlist is being created in a previous data _null_ step using the header row. Code does need to be dynamic.

data _null_;
infile "&&path&i" obs=1 delimiter=",";
input;
call symputx('varlist',tranwrd(compress(_infile_,'"'),',',' '));
run;
Reeza
Super User

You can either compress out the spaces in the variable names, which is probably a good idea, Or you can use NLITERAL like I mentioned to wrap the variable with quotes and add the N at the end to give you a valid SAS name. 

 

If you're compressing them out, you likely want to use COUNTW to count the number of commas which will indicate the number of variables. Then use SCAN() to parse it out and apply the COMPRESS() or NLITERAL function as desired.

 


@viola wrote:
The varlist is being created in a previous data _null_ step using the header row. Code does need to be dynamic.

data _null_;
infile "&&path&i" obs=1 delimiter=",";
input;
call symputx('varlist',tranwrd(compress(_infile_,'"'),',',' '));
run;

 

 

viola
Obsidian | Level 7
@Tom, in this scenario, this creates the varlist - but then when I go to read the file in, how do I make it so that the header row is read in again with no spaces?
Tom
Super User Tom
Super User

@viola wrote:
@Tom, in this scenario, this creates the varlist - but then when I go to read the file in, how do I make it so that the header row is read in again with no spaces?

I don't understand the question. The code you posted was using FIRSTOBS=2 to skip the header row.  So it doesn't matter whether the header row has spaces or not since it is being skipped.

viola
Obsidian | Level 7
Please excuse my brain...trying to think through this on a Friday afternoon was not happening. You are right, of course.
Tom
Super User Tom
Super User

SAS variable names must be 32 characters or less and can use only letter, digits and underscores. They cannot start with a number.

You could try just turning the spaces into underscores.

data _null_;
  infile "&&path&i" obs=1 ;
  input;
  call symputx('varlist',translate(trim(_infile_),' _',', '));
run;

If you set the option VALIDVARNAME=ANY then you can have spaces and other characters, but then you need to use name literals.

data _null_;
 infile "&&path&i" dsd obs=1 eof=save ;
 length varlist $30000 name $32 ;
 input name @@ ;
 varlist =catx(' ',varlist,nliteral(name));
 retain varlist;
return;
save:
call symputx('varlist',varlist);
run;
Ksharp
Super User

Why not ?

 

options validvarname=any;

proc import datafile=..............

viola
Obsidian | Level 7
@Ksharp, not an option here. It would take me too long to explain why but I need to use a data step (see my original post).
ballardw
Super User

@viola wrote:
@Ksharp, not an option here. It would take me too long to explain why but I need to use a data step (see my original post).

I routinely use Proc import to build a skeleton of a data step to read files. When you use proc import for a delimited file then the LOG will contain the data step code generated by SAS to read the file. Best is make sure you use a large value for the GUESSINGROWS option to get reasonable results.

 

Copy that code from the log into the editor and go to town.

Search and replace variable names as needed, adjust lengths, add labels.

If you remove the generated statements such as

303   /**********************************************************************
304   *   PRODUCT:   SAS
305   *   VERSION:   9.4
306   *   CREATOR:   External File Interface
307   *   DATE:      31AUG18
308   *   DESC:      Generated SAS Datastep Code
309   *   TEMPLATE SOURCE:  (None Specified.)
310   ***********************************************************************/

and the ones referenciging _EFFIERR_ no one else may ever know that proc import was used.

 

 

This will generate variable names with _ instead of imbedded spaces or special characters or leading numeric characters if your headers have such. If you have variables that have the same column header for more than 32 characters you may get some VARx.

If you have multiple column heading with the same text that are shorter such as "TOTAL" you will get different variables such as TOTAL2 or TOTAL127 depending on a number of factors I've never spent a lot of time worrying about. I generally make more descriptive names for the such as "Total_income" Total_hours" or what ever.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 7861 views
  • 6 likes
  • 6 in conversation