Help using Base SAS procedures

Duplicate variable names after PROC SUMMARY?

Contributor
Posts: 24

I have a dataset (let's call it "OLDDATA") with two variables, "Var1" and "Var2". I am trying to create a new dataset ("NEWDATA") with Var1, Var2, and a third variable, Var3. Var3 is calculated by summing all values of Var1 and multiplying each value of Var2 by that total. I was told that the way to do it is as follows:

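/* Step 1: sum Var1 over all observations into a one-row dataset. */
proc summary data=sasdata.olddata;
  var var1;
  output out=column1_summary sum=total_var1;
run;

/* Step 2: read the one-row total once; its value is retained for every observation. */
data sasdata.newdata;
  set sasdata.olddata;
  if _n_ = 1 then set column1_summary;
  var3 = var2 * total_var1;
run;
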
The code runs without errors and produces the Var3 variable I expect. However, the resulting dataset, NEWDATA, has two columns named Var1, one from OLDDATA and the other populated exclusively by the output of PROC SUMMARY, i.e., that same value for every observation. I feel that having duplicate column names would not be helpful in later calculations. What is the syntax for suppressing the second Var1 column, or giving it some other name to avoid confusion?


Accepted Solutions
Solution
‎11-11-2015 11:30 AM
Super User
Posts: 7,432

Re: Duplicate variable names after PROC SUMMARY?

No dataset contains two variables with the same name; SAS does not allow it. What you have is two variables with the same label.

Run PROC CONTENTS and compare each variable's name and label: the labels will be the same, the names will not. If you still think there are duplicate names, post the PROC CONTENTS output.
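
The check suggested above can be sketched as follows, assuming the dataset names from the original post. PROC CONTENTS with OUT= writes one row per variable to a dataset, where the name and label are stored in the columns NAME and LABEL. WORK.NEWDATA_VARS is a made-up name used only for this illustration.

```sas
/* List every variable in NEWDATA in creation order,
   with names and labels side by side. */
proc contents data=sasdata.newdata varnum;
run;

/* Optional: capture the same metadata in a dataset for a compact listing.
   WORK.NEWDATA_VARS is a hypothetical name chosen for this sketch. */
proc contents data=sasdata.newdata
              out=work.newdata_vars(keep=varnum name label)
              noprint;
run;

proc sort data=work.newdata_vars;
  by varnum;
run;

proc print data=work.newdata_vars noobs;
run;
```

If the diagnosis is right, the listing should show distinct values in NAME (var1 and total_var1) that share the same LABEL.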


All Replies
Contributor
Posts: 24

Re: Duplicate variable names after PROC SUMMARY?

I knew the system wouldn't allow duplicate names, so that, too, surprised me. Is it safe, then, to perform a calculation that calls my original Var1 by name--the system won't mix it up with the other variable that it labeled Var1?
Super User
Posts: 7,432

Re: Duplicate variable names after PROC SUMMARY?

I can't say for certain without running the code, but from memory the original variable remains and new variables are created with an additional prefix/suffix (you can probably alter that through options as well). Just open the dataset and go to View->Column Names (assuming you use SAS, not UE or VA or something), and you will see what each variable is called.
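
The suffix behavior recalled above appears to correspond to the AUTONAME option of the OUTPUT statement; that is an interpretation, not something the reply states. In the original code the output name is given explicitly (sum=total_var1), so no automatic suffix is involved there. A minimal sketch of the automatic naming, with WORK.AUTO_SUMMARY as a hypothetical dataset name:

```sas
proc summary data=sasdata.olddata;
  var var1;
  /* No name follows SUM=, so AUTONAME builds one from the
     analysis variable and the statistic: var1_Sum. */
  output out=work.auto_summary sum= / autoname;
run;
```

Without AUTONAME, an unnamed SUM= would reuse the analysis variable's own name (var1) in the output dataset, which is one reason to name the statistic explicitly, as the original code does.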

Contributor
Posts: 24

Re: Duplicate variable names after PROC SUMMARY?

You are right. I ran PROC CONTENTS on the resulting dataset and saw that, while labels are the same, names are not. Am I right, then, that it does not matter if labels are the same as long as names are different?

Super User
Posts: 7,432

Re: Duplicate variable names after PROC SUMMARY?

Yes, that is correct.  Column names have to be unique within a dataset.  Column labels can be anything you want.
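
For the original question of how to avoid the confusing second "Var1" column, one option is to give the total its own label, or to drop it from the final dataset. The sketch below reuses the names from the original post; the label text is an arbitrary example, and the explanation that the label is carried over from Var1 by PROC SUMMARY is an assumption consistent with what PROC CONTENTS showed in this thread.

```sas
proc summary data=sasdata.olddata;
  var var1;
  /* KEEP= leaves out the automatic _TYPE_ and _FREQ_ variables. */
  output out=column1_summary(keep=total_var1) sum=total_var1;
run;

data sasdata.newdata;
  set sasdata.olddata;
  if _n_ = 1 then set column1_summary;
  var3 = var2 * total_var1;
  /* Replace the label carried over from Var1 with a distinct one. */
  label total_var1 = 'Sum of Var1 over all observations';
  /* To leave the total out of NEWDATA entirely, use instead:
     drop total_var1; */
run;
```

The OUTPUT statement also has a NOINHERIT option (output ... / noinherit;), which should stop the statistic from picking up the analysis variable's label in the first place; it is worth confirming the result with PROC CONTENTS.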
