Hi,
When I am importing a CSV file, it is changing some of the variables name from say e.g 01/01/00 to var1. Please note that my variables are all dates. Although some of the variables names are imported as _01/01/00 which is fine but the rest as var xyz. Any suggestions on how to handle this issue?
Hi,
Please share the code you are using for importing the CSV and a representative sample of the data, including the first record.
Regards,
Amir.
Please find below,
proc import datafile="C:\Users\amalik\Desktop\dailydata\HAVE.csv" out=have1 dbms=csv replace; GETNAMES=YES; run;
;
@Amalik wrote:
Please find below,
proc import datafile="C:\Users\amalik\Desktop\dailydata\HAVE.csv" out=have1 dbms=csv replace; GETNAMES=YES; run;
;
One column per date is likely going to present other difficulties later in processing this data.
Also getting variable names like VAR3634 indicates that you have thousands of variables. While SAS will handle such it may well be worth considering what kind of code you will have to write to handle thousands of variable names.
How are you importing the csv file? I guess you are using proc import yes. That is not the best idea as proc import is a guessing procedure. Now 01/01/00 is not a valid variable name hence why sas adds a underscore to the name. However, it is likely not guessing all your variables correctly. To handle this properly (well, I will get onto "properly" shortly) take the code that is generated - you will find it in the log after running the proc import, then put this in a code window and change the names to what they should be. Then run that datastep import. Datastep import allows full control over the file coming in, how the data is to be read, and what the output should look like - this is what proc import guesses at, but as you know your data better, you should code this.
Now, lets get onto to "properly" importing the data. Having variable names like 01/01/00 is a really bad idea for a number of reasons. Firstly it is not easily readable by software - which is your issue here. Secondly it is very hard to write good code which can process a non-fixed datastructure (i.e. more dates could be added, or removed etc.). It is also not good to put partial data anywhere, 01/01/00, could mean any number of things - dates are an abstract concept. Finally, data as columns will really cause you headaches. How will you program with these possibly hundreds or thousands of columns? Yes, you will write endless macro code to try to work out variable names and loop over these, wasting your time, and storage/processing time. A simple change to the data from:
... 01/0/1/00 02/0/1/00 03/01/00...
... a b c...
...
To:
... Date_var result
... 01Jan2000 a
... 02Jan2000 b
... 03Jan2000 c
...
Will result in a far simpler import, far simpler programming to work with it, and most likely will be a smaller file to work with.
That is my suggestion on "how to handle this issue". In fact it is my suggestion whenever talking about data from Excel.
add system option before running PROC IMPORT.
options validvarname=any ;
proc import ..........
and refer to these special variable name via '01/12/2018'n
Doesn't that just ignore the initial problem, and then increase the future coding even more - i.e. having to refer to '01/12/18'n each time, rather than a proper SAS name? Just thinking, no lists or shortcuts then, possible unmatched quotes, impossible to read code etc.
Yeah. You are right . But you are standing on English language side .
If your mother language is not English, like me (from China), make a variable name as a Chinese name that would make code more readable due to most Chinese can barely read English .
Thanks, this also solves how to automatically process dynamic data sets where the owners may add and drop variables. Added code to automatically clean up into a usable variable name.
options validvarname=any;
proc import datafile = 'Participants.xlsx'
out = participants
dbms = xlsx
replace;
getnames=YES;
run;
proc contents data=participants out=contents(keep=name) noprint; run;
data new_names;
set contents;
length name_new $32.;
lead_val=rank(substr(strip(name),1,1));
if ((lead_val>=65 & lead_val<=90) | (lead_val>=97 & lead_val<=122)) then lead_ok=1;
name_new=prxchange("s/[^a-zA-Z0-9]/_/", -1, strip(name));
if (missing(lead_ok)) then name_new="_"||name_new;
name_new=tranwrd(name_new,"___","_");
name_new=tranwrd(name_new,"__","_");
run;
proc sql noprint;
select nliteral(name)||"="||name_new
into :renam_list separated by ' '
from new_names;
quit;
data participants_new;
set participants;
rename &renam_list;
run;
@Amalik wrote:
Hi,
When I am importing a CSV file, it is changing some of the variables name from say e.g 01/01/00 to var1. Please note that my variables are all dates. Although some of the variables names are imported as _01/01/00 which is fine but the rest as var xyz. Any suggestions on how to handle this issue?
That's because 01/01/00 is not a a valid SAS name. Read the SAS documentation about that (google "SAS valid names").
Keeping data (dates) in structure (variable names) is to be avoided. After import, transpose your dataset and use the _name_ field for converting into valid SAS dates:
data have;
input '_11/27/09'n '_11/28/09'n '_11/29/09'n;
cards;
-2.17 0 0
-2.26 0 0
-2.64 0 0
;
run;
proc transpose data=have out=int;
var _numeric_;
run;
data want;
format date yymmddd10.;
set int;
date = input(substr(_name_,2),mmddyy9.);
drop _name_;
run;
Add by-group processing to the transpose step to create separate rows instead of COL1-Coln.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.