DATA Step, Macro, Functions and more

Variables name?

Accepted Solution Solved
Reply
Super Contributor
Posts: 297
Accepted Solution

Variables name?

Hello:

 

I imported CSV file into SAS 9.3.  I have 2311 columns,  I found out that SAS renamed my column names after column 1771.   For example, my previous column 1771 name is C1, it changed to var 1771.   Then it changed to var 1172, var 1773 etc.  How to keep my oringinal names?  Thanks.

 

proc import out=test

datafile="C:\My Documents\myfiles\test.csv"

dbms=csv replace;

getnames=yes;

datarow=2;

run;

quit;


Accepted Solutions
Solution
‎05-31-2017 08:28 PM
Super User
Posts: 10,460

Re: Variables name?

Since I have to deal with some data sources that will change file layouts without documentation occasionally I have a few processes that may help. The first does not require SAS (Gasp!!).

Open the first CSV using a spreadsheet program. Copy the cells of the header row and paste-transpose into a COLUMN in a spreadsheet.

Do the same with the second header row pasting into another column in the same spreadsheet. The columns are for readibility, you only have to make 2 rows wide enough to display 50 or so characters instead of 2000+. In a third column put a formula to see if the other two columns have the same value.

Unfortunately this may only find the first mismatch if a variable has been added, deleted or moved. Sometimes you can see that a minor alignment works.

 

Why do I go through this nonsense? I 1) identify "new" names to ask the data source about 2) identify "removed" values to ask about and 3) can modify the datastep I program I saved from reading the first CSV file to account for those changes. This is important if I need to combine the data. I am only changing a few items instead of writing from scratch.

 

Something else to consider is look for common terms in your csv header. Your snippet of column heading looks entirely too much like an entire question such as "if lngth cm greater than 75". If you have 500 variables that all start with "If_" meaning that the header had "if " (there is a space there) then the "if" bit may not actually provide much information. So in the csv search and replace "if " with nothing. Other repeated strings may be replaced by a standardized shorter version, such as "Ln" instead of "lngth".

For example I have one data source that had something like 40 fields with "Primary caregiver" and another 40 with "Secondary Caregiver". Replacing those with PCG and SCG shortened a bunch of variable names.

View solution in original post


All Replies
Trusted Advisor
Posts: 1,606

Re: Variables name?

[ Edited ]

Can you show us a screen capture of the appropriate part of the CSV file, showing columns 1770, 1771, etc.?

Super User
Posts: 10,460

Re: Variables name?


ybz12003 wrote:

Hello:

 

I imported CSV file into SAS 9.3.  I have 2311 columns,  I found out that SAS renamed my column names after column 1771.   For example, my previous column 1771 name is C1, it changed to var 1771.   Then it changed to var 1172, var 1773 etc.  How to keep my oringinal names?  Thanks.

 


If you are ever dealing with multiple files of the same file layout then you should only use Proc Import one time. The procedure generates data step code that you can copy from the log and save in the editor. And then modify as needed such as compare the variable types against the file description you should have and also assign consistent lengths, again using that file description, for character variables.

Then you change the the Infile information and the name of the output data set when reading another another file.

 

Note that the behavior you mentions makes me suspect that the content of the header row is not the same so you need to check that very carefully between the two CSV files.

Super Contributor
Posts: 297

Re: Variables name?

The partial columns comparing pre-import and post-import are shown in the attachment.


Pre vs post.png
Super User
Posts: 17,737

Re: Variables name?

Check if the column names are unique. Perhaps those are duplicate? 

Also, what does the log say? 

Super User
Posts: 10,460

Re: Variables name?

And compared with the previous CSV file?

 

At least make the columns wide enough to show the entire field in the CSV. It is obvious that they are cut off.

 

One reason that SAS will name a variable VAR123 is that the text in the column is identical for the first 32 characters with another field.

Since you show two partial columns that are both "If_wght_k" or  "If_lngth_c" or "if_hc_cm_" it seems that could well be the case here.

 

 

Do you have a file description as to what should be in the file, the widths and column types? I would never let a procedure guess, which is what proc import does, characteristics for 2000+ fields.

Super Contributor
Posts: 297

Re: Variables name?

You are right,  Some column names is over 50 characters.  How to set width long enough to contain the variable name?

Super User
Posts: 17,737

Re: Variables name?


ybz12003 wrote:

You are right,  Some column names is over 50 characters.  How to set width long enough to contain the variable name?


You can't. Variable names are maximum of 32 characters. You're going to have to manually correct these names to the correct value.


Valued Guide
Posts: 765

Re: Variables name?

Hi. How about changing the long variable names into labels and renaming your variables somthing simple like VAR1-VAR3. I used the attached CSV file. Yes, I assumed everything is numeric but you can tweak that INPUT statement in the last data step or add a LENGTH statement prior to the INPUT statement to designate character variables (or add an INFORMAT statement iif some of the variables are dates).

 

data names;
infile 'z:\test.csv' obs=1 dsd;
input (lab1-lab3) (:$100.);
run;

 

filename labels temp;

 

data _null_;
file labels;
set names;
array lab(3);
do j=1 to 3;
   put "var" j "='" lab(j) +(-1) "'";
end;
run;

 

data want;
infile 'z:\test.csv' firstobs=2 dsd;
input var1-var3;
label
%include labels;
;
run;

 

DATA SET: want

Alphabetic List of Variables and Attributes

# Variable Type Len Label

1 var1     Num   8  variable name 1 that is longer than 32 characters
2 var2     Num   8  variable name 2
3 var3     Num   8  variable name 2 that is longer than 32 characters

 

Attachment
Super Contributor
Posts: 297

Re: Variables name?

Thanks for your suggestion.  That was a great idea.  Unfornaturely, I am not only importing the CSV test file.  After the importing, I would like to compare the test file variables with the standard  SAS file variables to see any change on the naming.  If I use the label, I can't figure what the difference between those two files.

Valued Guide
Posts: 765

Re: Variables name?

[ Edited ]

Not quite sure what all that means. The label names are the old variable names and will be exactly the same. Labels can be up to 200 characters. Will that accomodate your current variable names?

 

ps Long variable names (even 32 characters) are not easy to deal with ... as in you have to type in that long name every time you use a variable in a PROC or data step.

Solution
‎05-31-2017 08:28 PM
Super User
Posts: 10,460

Re: Variables name?

Since I have to deal with some data sources that will change file layouts without documentation occasionally I have a few processes that may help. The first does not require SAS (Gasp!!).

Open the first CSV using a spreadsheet program. Copy the cells of the header row and paste-transpose into a COLUMN in a spreadsheet.

Do the same with the second header row pasting into another column in the same spreadsheet. The columns are for readibility, you only have to make 2 rows wide enough to display 50 or so characters instead of 2000+. In a third column put a formula to see if the other two columns have the same value.

Unfortunately this may only find the first mismatch if a variable has been added, deleted or moved. Sometimes you can see that a minor alignment works.

 

Why do I go through this nonsense? I 1) identify "new" names to ask the data source about 2) identify "removed" values to ask about and 3) can modify the datastep I program I saved from reading the first CSV file to account for those changes. This is important if I need to combine the data. I am only changing a few items instead of writing from scratch.

 

Something else to consider is look for common terms in your csv header. Your snippet of column heading looks entirely too much like an entire question such as "if lngth cm greater than 75". If you have 500 variables that all start with "If_" meaning that the header had "if " (there is a space there) then the "if" bit may not actually provide much information. So in the csv search and replace "if " with nothing. Other repeated strings may be replaced by a standardized shorter version, such as "Ln" instead of "lngth".

For example I have one data source that had something like 40 fields with "Primary caregiver" and another 40 with "Secondary Caregiver". Replacing those with PCG and SCG shortened a bunch of variable names.

Super Contributor
Posts: 297

Re: Variables name?

Thanks for all of your great advice.  As a beginner,  the fast and easy way for me was use Excel transpose function to check the difference.  It's very similar to Ballardw's suggestion.   I posted the question here is trying to seek if another way to do in SAS.    

Super User
Posts: 17,737

Re: Variables name?

Your question wasn't about differences. If it was, someone would have suggested PROC COMPARE which lists variables in datasets automatically. 

 

I also have a simple routine for comparing datasets and identifying variables in each dataset.

 

https://gist.github.com/statgeek/3b57ae085d9f7a36a2d95c15f04e72e6

Super Contributor
Posts: 297

Re: Variables name?

Thanks for your codes, Mike. Unfortunately, the codes are too advanced for me to understand.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 190 views
  • 7 likes
  • 5 in conversation