BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello:

 

I imported CSV file into SAS 9.3.  I have 2311 columns,  I found out that SAS renamed my column names after column 1771.   For example, my previous column 1771 name is C1, it changed to var 1771.   Then it changed to var 1172, var 1773 etc.  How to keep my oringinal names?  Thanks.

 

proc import out=test

datafile="C:\My Documents\myfiles\test.csv"

dbms=csv replace;

getnames=yes;

datarow=2;

run;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Since I have to deal with some data sources that will change file layouts without documentation occasionally I have a few processes that may help. The first does not require SAS (Gasp!!).

Open the first CSV using a spreadsheet program. Copy the cells of the header row and paste-transpose into a COLUMN in a spreadsheet.

Do the same with the second header row pasting into another column in the same spreadsheet. The columns are for readibility, you only have to make 2 rows wide enough to display 50 or so characters instead of 2000+. In a third column put a formula to see if the other two columns have the same value.

Unfortunately this may only find the first mismatch if a variable has been added, deleted or moved. Sometimes you can see that a minor alignment works.

 

Why do I go through this nonsense? I 1) identify "new" names to ask the data source about 2) identify "removed" values to ask about and 3) can modify the datastep I program I saved from reading the first CSV file to account for those changes. This is important if I need to combine the data. I am only changing a few items instead of writing from scratch.

 

Something else to consider is look for common terms in your csv header. Your snippet of column heading looks entirely too much like an entire question such as "if lngth cm greater than 75". If you have 500 variables that all start with "If_" meaning that the header had "if " (there is a space there) then the "if" bit may not actually provide much information. So in the csv search and replace "if " with nothing. Other repeated strings may be replaced by a standardized shorter version, such as "Ln" instead of "lngth".

For example I have one data source that had something like 40 fields with "Primary caregiver" and another 40 with "Secondary Caregiver". Replacing those with PCG and SCG shortened a bunch of variable names.

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

Can you show us a screen capture of the appropriate part of the CSV file, showing columns 1770, 1771, etc.?

--
Paige Miller
ballardw
Super User

@ybz12003 wrote:

Hello:

 

I imported CSV file into SAS 9.3.  I have 2311 columns,  I found out that SAS renamed my column names after column 1771.   For example, my previous column 1771 name is C1, it changed to var 1771.   Then it changed to var 1172, var 1773 etc.  How to keep my oringinal names?  Thanks.

 


If you are ever dealing with multiple files of the same file layout then you should only use Proc Import one time. The procedure generates data step code that you can copy from the log and save in the editor. And then modify as needed such as compare the variable types against the file description you should have and also assign consistent lengths, again using that file description, for character variables.

Then you change the the Infile information and the name of the output data set when reading another another file.

 

Note that the behavior you mentions makes me suspect that the content of the header row is not the same so you need to check that very carefully between the two CSV files.

ybz12003
Rhodochrosite | Level 12

The partial columns comparing pre-import and post-import are shown in the attachment.


Pre vs post.png
Reeza
Super User

Check if the column names are unique. Perhaps those are duplicate? 

Also, what does the log say? 

ballardw
Super User

And compared with the previous CSV file?

 

At least make the columns wide enough to show the entire field in the CSV. It is obvious that they are cut off.

 

One reason that SAS will name a variable VAR123 is that the text in the column is identical for the first 32 characters with another field.

Since you show two partial columns that are both "If_wght_k" or  "If_lngth_c" or "if_hc_cm_" it seems that could well be the case here.

 

 

Do you have a file description as to what should be in the file, the widths and column types? I would never let a procedure guess, which is what proc import does, characteristics for 2000+ fields.

ybz12003
Rhodochrosite | Level 12

You are right,  Some column names is over 50 characters.  How to set width long enough to contain the variable name?

Reeza
Super User

@ybz12003 wrote:

You are right,  Some column names is over 50 characters.  How to set width long enough to contain the variable name?


You can't. Variable names are maximum of 32 characters. You're going to have to manually correct these names to the correct value.


MikeZdeb
Rhodochrosite | Level 12

Hi. How about changing the long variable names into labels and renaming your variables somthing simple like VAR1-VAR3. I used the attached CSV file. Yes, I assumed everything is numeric but you can tweak that INPUT statement in the last data step or add a LENGTH statement prior to the INPUT statement to designate character variables (or add an INFORMAT statement iif some of the variables are dates).

 

data names;
infile 'z:\test.csv' obs=1 dsd;
input (lab1-lab3) (:$100.);
run;

 

filename labels temp;

 

data _null_;
file labels;
set names;
array lab(3);
do j=1 to 3;
   put "var" j "='" lab(j) +(-1) "'";
end;
run;

 

data want;
infile 'z:\test.csv' firstobs=2 dsd;
input var1-var3;
label
%include labels;
;
run;

 

DATA SET: want

Alphabetic List of Variables and Attributes

# Variable Type Len Label

1 var1     Num   8  variable name 1 that is longer than 32 characters
2 var2     Num   8  variable name 2
3 var3     Num   8  variable name 2 that is longer than 32 characters

 

ybz12003
Rhodochrosite | Level 12

Thanks for your suggestion.  That was a great idea.  Unfornaturely, I am not only importing the CSV test file.  After the importing, I would like to compare the test file variables with the standard  SAS file variables to see any change on the naming.  If I use the label, I can't figure what the difference between those two files.

MikeZdeb
Rhodochrosite | Level 12

Not quite sure what all that means. The label names are the old variable names and will be exactly the same. Labels can be up to 200 characters. Will that accomodate your current variable names?

 

ps Long variable names (even 32 characters) are not easy to deal with ... as in you have to type in that long name every time you use a variable in a PROC or data step.

ballardw
Super User

Since I have to deal with some data sources that will change file layouts without documentation occasionally I have a few processes that may help. The first does not require SAS (Gasp!!).

Open the first CSV using a spreadsheet program. Copy the cells of the header row and paste-transpose into a COLUMN in a spreadsheet.

Do the same with the second header row pasting into another column in the same spreadsheet. The columns are for readibility, you only have to make 2 rows wide enough to display 50 or so characters instead of 2000+. In a third column put a formula to see if the other two columns have the same value.

Unfortunately this may only find the first mismatch if a variable has been added, deleted or moved. Sometimes you can see that a minor alignment works.

 

Why do I go through this nonsense? I 1) identify "new" names to ask the data source about 2) identify "removed" values to ask about and 3) can modify the datastep I program I saved from reading the first CSV file to account for those changes. This is important if I need to combine the data. I am only changing a few items instead of writing from scratch.

 

Something else to consider is look for common terms in your csv header. Your snippet of column heading looks entirely too much like an entire question such as "if lngth cm greater than 75". If you have 500 variables that all start with "If_" meaning that the header had "if " (there is a space there) then the "if" bit may not actually provide much information. So in the csv search and replace "if " with nothing. Other repeated strings may be replaced by a standardized shorter version, such as "Ln" instead of "lngth".

For example I have one data source that had something like 40 fields with "Primary caregiver" and another 40 with "Secondary Caregiver". Replacing those with PCG and SCG shortened a bunch of variable names.

ybz12003
Rhodochrosite | Level 12

Thanks for all of your great advice.  As a beginner,  the fast and easy way for me was use Excel transpose function to check the difference.  It's very similar to Ballardw's suggestion.   I posted the question here is trying to seek if another way to do in SAS.    

Reeza
Super User

Your question wasn't about differences. If it was, someone would have suggested PROC COMPARE which lists variables in datasets automatically. 

 

I also have a simple routine for comparing datasets and identifying variables in each dataset.

 

https://gist.github.com/statgeek/3b57ae085d9f7a36a2d95c15f04e72e6

ybz12003
Rhodochrosite | Level 12
Thanks for your codes, Mike. Unfortunately, the codes are too advanced for me to understand.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 7357 views
  • 7 likes
  • 5 in conversation