BookmarkSubscribeRSS Feed
abak
Obsidian | Level 7

Hi,

 

I have a huge data set that was given to me in 'inline cards' of all things from a different program that exported it to SAS infile cards format. The problem is that the column names got generic non-descript names and the labels were applied specific names that help me parse the data as needed. 

 

I've googled around but can't find a solution that works (one macro online written in SAS documentation didn't work) to change the labels into column names. 

 

Can anyone help me understand how to change variable labels into variable names. 

 

Thank you!

14 REPLIES 14
ballardw
Super User

Example of the code may help.

 

If the column labels run more than 10 or so characters you might reconsider as it doesn't take very long repeatedly coding with "Somesortoflongishvariablename" to get tired.

You might be better off with search and replace in the editor to create actual meaningful short(er) variable names.

 

If the issue is that you will be getting multiple files like this with same layout then write your code to read an external file and save the date portion of the supplied file to an external file and read that.

abak
Obsidian | Level 7
The code is way too big to attach.

Is there a way to take variable labels and change them into variable names, in general? The data set has 10k columns and about 5k rows give or take.

I have a dataset with approx. 10k columns. I can't 'hand' re-write the columns.
SuryaKiran
Meteorite | Level 14

Hi,

 

Dictionary.columns will hold both variable names and their label names. You can construct a rename statement out of it. Make sure that SAS accepts the names. 

 

proc sql noprint;
select cats(name,'=', case when label is not null then nliteral(label ) 
							else name end) into: rename separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CARS';
quit;

data cars;
set cars(rename=(&rename));
run;

I used NLITERAL() to specify as SAS name literal. 

 

 

Thanks,
Suryakiran
Astounding
PROC Star

This is an excellent approach, but not for a data set with 10K columns.  The size of a single macro variable is limited to 64K characters, and couldn't possibly hold the number of characters you will need for renaming 10K variables.

PGStats
Opal | Level 21

An alternative, along tha same lines would be:

 

filename rename temp;

data _null_;
set sashelp.vcolumn( where=(libname='SASHELP' and memname='CARS'));
length line $64;
if not missing(label) then do;
    line = cats(name, '=', nliteral(label));
    file rename;
    put line;
    end;
run;;

option validvarname=any;

data cars;
set sashelp.cars;
rename
%include rename;
;
run;
PG
Reeza
Super User

You can if you have systematic naming, but if its naming such as 'Var26' goes to IncomeVar and var27 goes to Expenditures, there's no way to know that.

 

Labels also do not follow SAS naming conventions, so it's not as simple as replacing the name with the label, which is actually a relatively trivial task. Issues you can run into, not an exhaustive list:

  • names longer than 32 chars
  • spaces in names
  • punctuation in names
  • labels that are the same for different variables

How do you want to deal with these type of issues?

 

Please post a small representative sample of what you're trying to do, say 5 obs and 10 variables and we can likely help from there. 

Instructions on attaching data as data lines is here:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 


@abak wrote:
The code is way too big to attach.

Is there a way to take variable labels and change them into variable names, in general? The data set has 10k columns and about 5k rows give or take.

I have a dataset with approx. 10k columns. I can't 'hand' re-write the columns.

 

Reeza
Super User
https://gist.github.com/statgeek/f18931085f6a0009185c

Here's a quick example of automatically assigning labels to a data set, where the data set has the name and label. You can create such a data set using PROC CONTENTS and then modify the code to do a rename rather than apply a label. But you do have to consider all the issues I mentioned above as well.
error_prone
Barite | Level 11
It is very unlikely that the labels can be used as names without pre-processing them. Such a step includes the removing all chars not allowed in names, making sure that the name starts with letter or underscore and that length of the fixed label name is <= 32 chars. Reducing length must be done very, very carefully, because you must be sure not to generate duplicate names. Good luck!

In a perfect world, you would, of course, not have a dataset with 10k variables, at all.
abak
Obsidian | Level 7
Agreed. Thank you everyone for your help. I'm going to try these out today a bit, though I'm working with a DB person to see if I can get a better copy of the data in a SAS format/SAS importable format.

Unfortunately my job routinely has to use such extremely wide datasets.
Ksharp
Super User
data have;
 set sashelp.class;
 label sex='M_F' age='Year' Name='NN' weight='WW' height='HH';
run;

options validvarname=any;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data _null_;
 set temp end=last;
 if _n_=1 then call execute('proc datasets library=work nodetails nolist;modify have;rename ');
 call execute(cats(_name_,'=',nliteral(_label_)));
 if last then call execute(';quit;');
run;
LeonidBatkhan
Lapis Lazuli | Level 10

I have written a SAS macro that solves an opposite problem of creating column labels from column names. The key there was to replace "_" in column name into " " and apply proper case as this:

 

lbl = propcase(translate(name,' ','_'));

If your labels are no longer than 32 characters than it's easy to reverse this solution by replacing " " with "_" in column names, as:

 

name = translate(lbl,'_',' '));

 

If your labels contain other special characters besides " ", such as ":", "-" you might need to translate them into some other characters that a valid in SAS column names.

 

For detail information and complete code see my blog post Automating SAS variable labels creation .

 

Tom
Super User Tom
Super User

Not sure how replacing the names with the labels will help you any.  Are the names that generic that they are unusable?

 

If so why did they create generic variable names to begin with?  Perhaps you are lucky and the reason the code is using generic names and labels is because they had names longer than 8 characters and they needed to get it to work with SAS version 6 or earlier so they made generic variable names and put the real name in to the label.

 

You might have better luck converting from wide to long format.  Do you have unique ID variable (or some small set of variables that uniquely identify each observations)?  If not you can add one easily.

 

If you run code like this you will get one dataset with 10K*5K observations that will have both the NAME, LABEL and the VALUE.  

proc transpose data=have out=tall(rename=(col1=VALUE)) ;
  by id ;
  var _all_;
run;

Note that if you have any character variables then all variable (including the numeric variables) will be converted to character.  If you have a mix of numeric and character variables you might want to make two datasets instead.

proc transpose data=have out=tall_numbers(rename=(col1=NUM_VALUE)) ;
  by id ;
  var _numeric_;
run;
proc transpose data=have out=tall_chars(rename=(col1=CHAR_VALUE)) ;
  by id ;
  var _character_;
run;

Now if you want you can reverse the transpose and just tell it to use the _LABEL_ variable instead of the _NAME_ variable when naming the variables.

proc transpose data=tall_numeric out=want ;
  by id;
  id _label_;
  var num_value;
run;
LeonidBatkhan
Lapis Lazuli | Level 10

I am not replacing names with labels, I am creating labels out of names for reporting purposes. For example, when you extract a table from a database using one of the SAS' Access Engines, SAS will create a SAS table with column names inherited from the database table, and labels assigned the same values as column names. For example: CUSTOMER_NAME, CUSTOMER_ADDRESS. If you use such a table in reports (e.g. Visual Analytics report) SAS will use these column labels as descriptors of data elements, which is quite ugly. Having created labels as 'Customer Name' and 'Customer Address' will look much nicer and professional, as they will break at the space characters. If you load multiple tables in SAS VA having an automatic way of assigning nice labels saves a lot of time. Please read the blog post that describes this in detail: Automating SAS variable labels creation .

Reeza
Super User
It probably doesn't make senes to post this on a 2 year old thread. Just a thought to consider, only the original participants will see it.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 16384 views
  • 12 likes
  • 10 in conversation