BookmarkSubscribeRSS Feed
Amalik
Calcite | Level 5

Hi,

 

When I am importing a CSV file, it is changing some of the variables name from say e.g 01/01/00 to var1. Please note that my variables are all dates. Although some of the variables names are imported as _01/01/00 which is fine but the rest as var xyz. Any suggestions on how to handle this issue?

8 REPLIES 8
Amir
PROC Star

Hi,

 

Please share the code you are using for importing the CSV and a representative sample of the data, including the first record.

 

Regards,

Amir.

Amalik
Calcite | Level 5

Please find below,

proc import datafile="C:\Users\amalik\Desktop\dailydata\HAVE.csv"
        out=have1
        dbms=csv
        replace;
		GETNAMES=YES;
		run;
;Untitled.png
ballardw
Super User

@Amalik wrote:

Please find below,

proc import datafile="C:\Users\amalik\Desktop\dailydata\HAVE.csv"
        out=have1
        dbms=csv
        replace;
		GETNAMES=YES;
		run;
;Untitled.png

One column per date is likely going to present other difficulties later in processing this data.

Also getting variable names like VAR3634 indicates that you have thousands of variables. While SAS will handle such it may well be worth considering what kind of code you will have to write to handle thousands of variable names.

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

How are you importing the csv file?  I guess you are using proc import yes.  That is not the best idea as proc import is a guessing procedure.  Now 01/01/00 is not a valid variable name hence why sas adds a underscore to the name.  However, it is likely not guessing all your variables correctly.  To handle this properly (well, I will get onto "properly" shortly) take the code that is generated - you will find it in the log after running the proc import, then put this in a code window and change the names to what they should be.  Then run that datastep import.  Datastep import allows full control over the file coming in, how the data is to be read, and what the output should look like - this is what proc import guesses at, but as you know your data better, you should code this.

 

Now, lets get onto to "properly" importing the data.  Having variable names like 01/01/00 is a really bad idea for a number of reasons.  Firstly it is not easily readable by software - which is your issue here.  Secondly it is very hard to write good code which can process a non-fixed datastructure (i.e. more dates could be added, or removed etc.).  It is also not good to put partial data anywhere, 01/01/00, could mean any number of things - dates are an abstract concept.  Finally, data as columns will really cause you headaches.  How will you program with these possibly hundreds or thousands of columns?  Yes, you will write endless macro code to try to work out variable names and loop over these, wasting your time, and storage/processing time.  A simple change to the data from:

... 01/0/1/00  02/0/1/00  03/01/00...

... a               b               c...

...

 

To:

...   Date_var    result

...   01Jan2000 a

...   02Jan2000 b

...   03Jan2000 c

...

Will result in  a far simpler import, far simpler programming to work with it, and most likely will be a smaller file to work with.

That is my suggestion on "how to handle this issue".  In fact it is my suggestion whenever talking about data from Excel.

Ksharp
Super User

add system option before running PROC IMPORT.

 

options validvarname=any ;

proc import ..........

 

and refer to these special variable name via '01/12/2018'n  

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Doesn't that just ignore the initial problem, and then increase the future coding even more - i.e. having to refer to '01/12/18'n each time, rather than a proper SAS name?  Just thinking, no lists or shortcuts then, possible unmatched quotes, impossible to read code etc.

Ksharp
Super User

Yeah. You are right . But you are standing on English language side .

If your mother language is not English, like me (from China), make a variable name as a Chinese name that would make code more readable due to most Chinese can barely read English .

Kurt_Bremser
Super User

@Amalik wrote:

Hi,

 

When I am importing a CSV file, it is changing some of the variables name from say e.g 01/01/00 to var1. Please note that my variables are all dates. Although some of the variables names are imported as _01/01/00 which is fine but the rest as var xyz. Any suggestions on how to handle this issue?


That's because 01/01/00 is not a a valid SAS name. Read the SAS documentation about that (google "SAS valid names").

Keeping data (dates) in structure (variable names) is to be avoided. After import, transpose your dataset and use the _name_ field for converting into valid SAS dates:

data have;
input '_11/27/09'n '_11/28/09'n '_11/29/09'n;
cards;
-2.17 0 0
-2.26 0 0
-2.64 0 0
;
run;

proc transpose data=have out=int;
var _numeric_;
run;

data want;
format date yymmddd10.;
set int;
date = input(substr(_name_,2),mmddyy9.);
drop _name_;
run;

Add by-group processing to the transpose step to create separate rows instead of COL1-Coln.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4174 views
  • 5 likes
  • 6 in conversation