BookmarkSubscribeRSS Feed
Mystik
Obsidian | Level 7

Hello

I am trying to merger 2 datasets(tab delimited) each 11,234,890;

 

data jdid_all; 

merge jdid_1 jdid_2;

by id:

run;

 

however, i get error messages;

Variable set1, set3, set4, set6, set8, set9 has been defined as both as character and numeric.

Can someone please help 

 

Emij.

7 REPLIES 7
r_behata
Barite | Level 11

HI @Mystik Apparently there is a mismatch of  datatypes between the variables with the same names  between the two datasets that you are trying to merge. If you are ok with the common variables in the jdid_2 in your output dataset then you need to do an explicit datatype conversion to overcome this error.  else you need to rename the common variables in one of the dataset you are trying to merge. 

 

Hope this answers your question.

Mystik
Obsidian | Level 7
yes there's a mismatch
Data 1 Data2
Numeric Character
Numeric Character
Character Numeric
Numeric Character
Numeric Character
Numeric Character

How do i overcome this mismatch with the appropriate sas code?
ballardw
Super User

@Mystik wrote:

Hello

I am trying to merger 2 datasets(tab delimited) each 11,234,890;

 

data jdid_all; 

merge jdid_1 jdid_2;

by id:

run;

 

however, i get error messages;

Variable set1, set3, set4, set6, set8, set9 has been defined as both as character and numeric.

Can someone please help 

 

Emij.


You have two basic choices an both of them involve YOU deciding which variables should actually be numeric and which character.

One choice is to go back to how you created those data sets in the first place. If that involved Proc Import then that is the likely reason this occurred if external files were in the "same" layout. Proc Import guesses what type and other characteristics variables should have and depending on the content of files will make different guesses leading to this issue. Depending on the source of the file this may involve writing a data step or modifying a data step based on Proc Import generated code.

 

The other is for you to use a data step to create a new variable of the correct type. A generic approach is to something like this:

 

data want;

    set have (rename=(somevar=oldvar) );

    /* to create a numeric version of the variable somevar now temporarily named OLDVAR*/

   /* use an input. The informat below is just an example. If the value should be date or time

      it gets much more interesting as you should know the format , we can't without details*/

   somevar = input(oldvar, best26.);

/* if want a character variable then it would be and the format is likely moderately important

   if you want similar looking values*/

   somevar = put(oldvar, someformat. -L);

run;

Mystik
Obsidian | Level 7

The variables are; numbers (BEST12), data (MMDDYY10), period ($10.), blank ($1.)

ballardw
Super User

@Mystik wrote:

The variables are; numbers (BEST12), data (MMDDYY10), period ($10.), blank ($1.)


And you want us to do what with this?

without variable names, which formats/values and in which sets this doesn't help and we couldn't provide any specific code at all.

 

The blank ($1.)  is frequently going to mean that you need to re-read the data source in a different manner as that happens when using import and the first few records are missing any value for the variable. Then later records get truncated to one character even if they have 100's of characters in the source file.

Mystik
Obsidian | Level 7

Okay. Is there a way convert a variable to accept both character and numeric. This is because my variable set1, has '150', '569.8999', '.' , ' ' as well as '17/04/2011' as inputs in set1 column? 

ballardw
Super User

@Mystik wrote:

Okay. Is there a way convert a variable to accept both character and numeric. This is because my variable set1, has '150', '569.8999', '.' , ' ' as well as '17/04/2011' as inputs in set1 column? 


What is your variable set1 supposed to represent? Are you going to do arithmetic with the values? If so then your field should be numeric. Which brings up what numeric value you want for '17/04/2011'.

If you are not doing arithmetic then character should be fine and the earlier response using PUT is the way to convert numeric to text.

 

 

I would say that any variable that has a value that is almost certainly a date and other sort of random numeric values means that something was done poorly at a much earlier step as dates should be separate variables.

 

I would also be tempted to go back to the very source of the data and make sure that the columns actually align correctly with purpose and variable names. Perhaps Set1 is one data set means something else in the other. In which case they probably should not have the same variable name if you are going to combine the data.

Consider, I have two data sets. In the first set Name means the name of a person. In the second set Name is the name of city.

If I combine the two data sets with a set statement  I could something like:

John Smith

Mary Jones

Boston, Ma

Chicago, Il

 

Which makes little sense. Perhaps something similar has happened with your data that is attempting to mix values that shouldn't even though they have the same variable name.

 

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3391 views
  • 2 likes
  • 3 in conversation