Data Step Merge - variable declared as both character and numeric

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

Data Step Merge - variable declared as both character and numeric

[ Edited ]

Hello everyone,

 

I tried merging two data sets by gvkey but I kept getting an error stating that the variable has been defined as both character and numeric. I did some search and was finally able to merge them. I'm sharing my codes here and I would be very grateful if someone would tell me if I did the right thing. I used these codes:

data data_set_1;

set data_set_1 (rename=(gvkey=gvkey_Character)) ;

gvkey = put(gvkey_Character, 1.) ;

drop gvkey_Character ;

run ;

data data_set_2;

set data_set_2 (rename=(gvkey=gvkey_Character)) ;

gvkey = put(gvkey_Character, 1.) ;

drop gvkey_Character ;

run ;

proc sort data=data_set_1;

by gvkey;

run;

proc sort data=data_set_2;

by gvkey;

run;

data merged;

merge data_set_1 data_set_2;

by GVKEY;

run;

 

Thank you very much.


Accepted Solutions
Solution
‎02-11-2017 07:36 AM
Grand Advisor
Posts: 17,396

Re: Expert Opinion

The correct methodology is to ensure both variables are the same type, either character or numeric. In general, it's easier if ID's are character. However, since you got a warning, you shouldn't have to change both datasets only one. Your code changes both.

 

Another concern is that you've assigned it to a single character. As long as GVKEY never has more than a single digit this is fine. Otherwise you should use a different format, perhaps Best12

Yours:
gvkey = put(gvkey_Character, 1.) ;
Probably what you need: gvkey = put(gvkey_Character, best12.) ;

 

View solution in original post


All Replies
Esteemed Advisor
Posts: 6,692

Re: Expert Opinion

To exactly determine the reason for your initial problem, we would need to see the variable list from proc contents for both datasets, or the log of your code execution, as the NOTEs would reveal the implicit type conversion(s) in your data steps.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 33

Re: Expert Opinion

Thank you sir, for the reply.

Below is the note I get from the log execution. I don't if this was what you meant. If meant I'm sorry; I started using SAS only recently.

Thank you once again.

 

data theo.tenure;
19 set theo.tenure;
20 run;

NOTE: There were 2134412 observations read from the data set THEO.TENURE.
NOTE: The data set THEO.TENURE has 2134412 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 4.63 seconds
cpu time 0.71 seconds


21 data theo.debt;
22 set theo.debt;

NOTE: There were 506856 observations read from the data set THEO.DEBT.
NOTE: The data set THEO.DEBT has 506856 observations and 22 variables.
NOTE: DATA statement used (Total process time):
real time 6.95 seconds
cpu time 0.61 seconds


23 proc sort data=theo.tenure;
24 by gvkey;
25 run;

NOTE: There were 2134412 observations read from the data set THEO.TENURE.
NOTE: The data set THEO.TENURE has 2134412 observations and 13 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 6.24 seconds
cpu time 1.98 seconds


26 proc sort data=theo.debt;
27 by GVKEY;
28 run;

NOTE: There were 506856 observations read from the data set THEO.DEBT.
NOTE: The data set THEO.DEBT has 506856 observations and 22 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 7.49 seconds
cpu time 1.18 seconds


29 data merged;
30 merge theo.tenure theo.debt;
ERROR: Variable gvkey has been defined as both character and numeric.
31 by GVKEY;
32 run;

 

 

Esteemed Advisor
Posts: 6,692

Re: Expert Opinion

Sorry, I meant the log of your conversion data steps.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 33

Re: Expert Opinion

I'm so sorry,sir. I don't really get what you mean by my conversion data steps. So, I have provided the proc contents of the two variable here.

Thank you.

 

 

 

SAS Output

 

The SAS System

The CONTENTS Procedure

 

Data Set Name Observations Member Type Variables Engine Indexes Created Observation Length Last Modified Deleted Observations Protection Compressed Data Set Type Sorted Label   Data Representation   Encoding  
THEO.DEBT506856
DATA22
V90
02/11/2017 05:31:48424
02/11/2017 05:31:480
 NO
 NO
  
WINDOWS_64 
wlatin1 Western (Windows) 

Engine/Host Dependent Information Data Set Page Size Number of Data Set Pages First Data Page Max Obs per Page Obs in First Data Page Number of Data Set Repairs ExtendObsCounter Filename Release Created Host Created
65536
3292
1
154
146
0
YES
C:\Users\toshiba-pc\Desktop\sas_stuff\debt.sas7bdat
9.0401M3
X64_8HOME

Alphabetic List of Variables and Attributes # Variable Type Len Format Informat 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 4
CurrencyChar11$11.$11.
GVKEYNum8BEST12.BEST32.
USDChar22$22.$22.
VAR5Char16$16.$16.
VAR6Char28$28.$28.
VAR7Char34$34.$34.
VAR8Char21$21.$21.
VAR9Char18$18.$18.
VAR10Char38$38.$38.
VAR11Char16$16.$16.
VAR12Char19$19.$19.
VAR13Char16$16.$16.
VAR14Char15$15.$15.
VAR15Char15$15.$15.
VAR16Char15$15.$15.
VAR17Char15$15.$15.
VAR18Char15$15.$15.
VAR19Char30$30.$30.
VAR20Char25$25.$25.
VAR21Char1$1.$1.
VAR22Char18$18.$18.
_Char22$22.$22.

 


The SAS System

The CONTENTS Procedure

 

Data Set Name Observations Member Type Variables Engine Indexes Created Observation Length Last Modified Deleted Observations Protection Compressed Data Set Type Sorted Label   Data Representation   Encoding  
THEO.TENURE2134412
DATA13
V90
02/11/2017 04:48:5396
02/11/2017 04:48:530
 NO
 YES
  
WINDOWS_64 
wlatin1 Western (Windows) 

Engine/Host Dependent Information Data Set Page Size Number of Data Set Pages First Data Page Max Obs per Page Obs in First Data Page Number of Data Set Repairs ExtendObsCounter Filename Release Created Host Created
65536
3135
1
681
651
0
YES
C:\Users\toshiba-pc\Desktop\sas_stuff\tenure.sas7bdat
9.0401M3
X64_8HOME

Alphabetic List of Variables and Attributes # Variable Type Len Format Informat Label 1 8 10 12 4 3 7 6 9 11 2 5 13
COMPANYIDNum811.11.This is a unique indentifier denoting a company inside many tables of Capital IQ data feeds
ENDYEARNum811.11.End year of this record
FSTYEARNum8   
LnTenureNum8   
PERIODENDDATENum4YYMMDDN8.DATETIME22.3Period End Date
PERSONIDNum811.11.Unique identifier for a Person in the Capital IQ database
PROFUNCTIONIDNum811.11.This is the identifier of a given proFunction. A proFunction is a standardized way to describe titles because titles can be different across firms.
PROIDNum811.11.Unique identifier for a professional . A professional is the instance of a person being associated to a company (or educational institution, etc.)
STARTYEARNum811.11.Start year of this record
TENURENum8   
YEARNum411.11.Fiscal Year
YEARBORNNum811.11.Year the professional was born
gvkeyChar1   

Sort Information Sortedby Validated Character Set
gvkey
YES
ANSI
 
Grand Advisor
Posts: 17,396

Re: Expert Opinion

You realize the first two steps do nothing?

 

While writing code as a newbie, comment it. Before each step, write a statement of what you expect to be occuring in the data step. 

It'll help you organize your code and future you when you need to read it again in 6 months.

Grand Advisor
Posts: 17,396

Re: Expert Opinion

Are you sure your data is correct? GVKEY being a single character seems weird to me, It's usually a longer character string. 

Your code is now referencing Tenure and Debt but your conversion code using dataset 1/dataset 2. We can't know how they map to each other.

 

Your code should be:

 

1. Convert numeric to character for GVKEY (data step).

2. Sort by relevant fields, once for each data step

3. Merge data sets.

 

 

 

Contributor
Posts: 33

Re: Expert Opinion

Thank you, sir for your reply.

 Sir, dataset 1 and 2 are tenure and debt, respectively. I just used them as examples. As to the codes I used, as I said before, I searched online and saw a solution to a similar problem so I edited the codes and used it. I think the gvkey appearing as a single character in my codes reflect my limited knowledge of SAS and that's why I wanted to make sure I was doing the right thing.

 

Once again, thank you.

Grand Advisor
Posts: 17,396

Re: Expert Opinion

Not a sir. 

You need to re-import or fix your original data first - not sure how you imported it but clearly GVKEY was not imported properly. 

 

Remember to review your data after each step. Open the dataset and visually examine it. You should do this regardless of the software you use, but as a step in your analytical process.

 

Its also better if you make sure when importing that it's character in both to start off with, rather than mess around with trying to change it after. 

Solution
‎02-11-2017 07:36 AM
Grand Advisor
Posts: 17,396

Re: Expert Opinion

The correct methodology is to ensure both variables are the same type, either character or numeric. In general, it's easier if ID's are character. However, since you got a warning, you shouldn't have to change both datasets only one. Your code changes both.

 

Another concern is that you've assigned it to a single character. As long as GVKEY never has more than a single digit this is fine. Otherwise you should use a different format, perhaps Best12

Yours:
gvkey = put(gvkey_Character, 1.) ;
Probably what you need: gvkey = put(gvkey_Character, best12.) ;

 

Respected Advisor
Posts: 4,993

Re: Data Step Merge - variable declared as both character and numeric

Notice from the proc contents results that you posted:  GVKEY is numeric in your first data set, but character in your second data set.  To make them match, you could convert GVKEY from numeric to character, but only do that for your first data set.

 

Looking at your program, you could get rid of the second DATA step entirely.  The remaining steps in combination should work.

Contributor
Posts: 33

Re: Data Step Merge - variable declared as both character and numeric

Thank you everybody for time and patience.

I'm new to SAS and research in general and as has been pointed out to me, I think the mistake is coming from me. I'm sorry.

 

I think the correct identifier to use for merging my data is COMPANYID and not GVKEY. I have gone back and I'm compiling my data again. I'll tell you what happens.

 

Once again, thank you for your time and patience. I really appreciate it.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 188 views
  • 1 like
  • 4 in conversation