I am having trouble importing a CSV file using the DATA step where some of the column names have spaces. I have specified that a comma is the delimiter, but SAS is still breaking up the variable names into multiple columns. I am aware that using PROC IMPORT would solve this problem, but for various reasons, is not an option in my case. What am I missing?
data &&sasname&i;
infile "&&path&i" delimiter="," dsd firstobs=2 truncover;
input &varlist;
run;
My CSV looks like the following:
Visit Repeat Key,Form Repeat Key,Item Group Repeat Key,Scheduled Date,Completed Date
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T18:11:17.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T23:14:02.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:11:16.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:00:21.000Z
1,1,,2018-08-21T07:00:00.000Z,
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T19:59:46.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T16:58:20.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:12:31.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:53:10.000Z
If you're using a data step you would specify the variable list in your INPUT statement.
If you're trying to do this dynamically you could either consider NLITERAL() function or somehow add an underscore before hand.
How is &varlist being created?
@viola wrote:
I am having trouble importing a CSV file using the DATA step where some of the column names have spaces. I have specified that a comma is the delimiter, but SAS is still breaking up the variable names into multiple columns. I am aware that using PROC IMPORT would solve this problem, but for various reasons, is not an option in my case. What am I missing?
data &&sasname&i; infile "&&path&i" delimiter="," dsd firstobs=2 truncover; input &varlist; run;
My CSV looks like the following:
Visit Repeat Key,Form Repeat Key,Item Group Repeat Key,Scheduled Date,Completed Date
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T18:11:17.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T23:14:02.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:11:16.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:00:21.000Z
1,1,,2018-08-21T07:00:00.000Z,
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T19:59:46.000Z
1,1,,2018-08-02T07:00:00.000Z,2018-08-02T16:58:20.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:12:31.000Z
1,1,,2018-08-01T07:00:00.000Z,2018-08-02T17:53:10.000Z
You can either compress out the spaces in the variable names, which is probably a good idea, Or you can use NLITERAL like I mentioned to wrap the variable with quotes and add the N at the end to give you a valid SAS name.
If you're compressing them out, you likely want to use COUNTW to count the number of commas which will indicate the number of variables. Then use SCAN() to parse it out and apply the COMPRESS() or NLITERAL function as desired.
@viola wrote:
The varlist is being created in a previous data _null_ step using the header row. Code does need to be dynamic.
data _null_;
infile "&&path&i" obs=1 delimiter=",";
input;
call symputx('varlist',tranwrd(compress(_infile_,'"'),',',' '));
run;
@viola wrote:
@Tom, in this scenario, this creates the varlist - but then when I go to read the file in, how do I make it so that the header row is read in again with no spaces?
I don't understand the question. The code you posted was using FIRSTOBS=2 to skip the header row. So it doesn't matter whether the header row has spaces or not since it is being skipped.
SAS variable names must be 32 characters or less and can use only letter, digits and underscores. They cannot start with a number.
You could try just turning the spaces into underscores.
data _null_;
infile "&&path&i" obs=1 ;
input;
call symputx('varlist',translate(trim(_infile_),' _',', '));
run;
If you set the option VALIDVARNAME=ANY then you can have spaces and other characters, but then you need to use name literals.
data _null_;
infile "&&path&i" dsd obs=1 eof=save ;
length varlist $30000 name $32 ;
input name @@ ;
varlist =catx(' ',varlist,nliteral(name));
retain varlist;
return;
save:
call symputx('varlist',varlist);
run;
I strongly recommend @Tom's suggestion #1, as working without named literals is much easier in future code based on theis dataset.
Why not ?
options validvarname=any;
proc import datafile=..............
@viola wrote:
@Ksharp, not an option here. It would take me too long to explain why but I need to use a data step (see my original post).
I routinely use Proc import to build a skeleton of a data step to read files. When you use proc import for a delimited file then the LOG will contain the data step code generated by SAS to read the file. Best is make sure you use a large value for the GUESSINGROWS option to get reasonable results.
Copy that code from the log into the editor and go to town.
Search and replace variable names as needed, adjust lengths, add labels.
If you remove the generated statements such as
303 /********************************************************************** 304 * PRODUCT: SAS 305 * VERSION: 9.4 306 * CREATOR: External File Interface 307 * DATE: 31AUG18 308 * DESC: Generated SAS Datastep Code 309 * TEMPLATE SOURCE: (None Specified.) 310 ***********************************************************************/
and the ones referenciging _EFFIERR_ no one else may ever know that proc import was used.
This will generate variable names with _ instead of imbedded spaces or special characters or leading numeric characters if your headers have such. If you have variables that have the same column header for more than 32 characters you may get some VARx.
If you have multiple column heading with the same text that are shorter such as "TOTAL" you will get different variables such as TOTAL2 or TOTAL127 depending on a number of factors I've never spent a lot of time worrying about. I generally make more descriptive names for the such as "Total_income" Total_hours" or what ever.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.