DATA Step, Macro, Functions and more

how to merge data sets with different coding system

Accepted Solution Solved
Reply
Super Contributor
Posts: 345
Accepted Solution

how to merge data sets with different coding system

I have two datasets, and I would like to merge with one code. but the coding system is different. the dataset1's code is three numbers, say, 123, the dataset2's code is four numbers, say, 0123

 

so I would like first create a new variable

icode2=input (substr(code,2,3), 3.);

 

but it generates . missing values for all obs

 

i also tried

icode2= substr (code, 2,3);

 

it generates blank for all obs

would you please let me know how to fix it

thanks

 


Accepted Solutions
Solution
‎05-04-2016 12:04 PM
Super User
Posts: 19,865

Re: how to merge data sets with different coding system

data one;
input var1;
cards;
123
456
789
234
;
run;

data two;
input var2 $;
cards;
0123
0456
0789
234
;
run;

data two_formatted;
set two;
var1=input(var2, best12.);
run;

proc sort data=one; by var1; run;
proc sort data=two_formatted; by var1; run;

data merged;
merge one two_formatted;
by var1;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 4,932

Re: how to merge data sets with different coding system

Translate both strings to numbers

 

input(code, best.) = input(code2, best.)

PG
Super Contributor
Posts: 345

Re: how to merge data sets with different coding system

Thank you. But I do not get that.

Can you explain in more detail? Step by step, if there are more than two steps?

Super User
Posts: 19,865

Re: how to merge data sets with different coding system

1. What are the types of each variable? They need to be the same, so is one numeric and one character, or are they formatted to appear as character with the leading zero. Can you post the proc contents output for each of the variables? The goal is to make them the same, either numeric or character - character is probably better since you don't have to worry about rounding issues. 

 

 

Super Contributor
Posts: 345

Re: how to merge data sets with different coding system

both are numerical variables

num 8

 

I am doing the first step. I want to create a new variable, named as icode2

icode2= substr(code1, 2,3)

 

database1 has code 1, it is like 0123,0231,0234

database2 has code 2, like 123, 231, 234

 

so I thought of substr to do this, to create a new varaible icode2, then merge it based on

code2=icode2

Super User
Posts: 19,865

Re: how to merge data sets with different coding system


wenling wrote:

both are numerical variables

num 8

 



If both are numerical, how does it have leading zeroes? Is there a format attached?  

 

If it's numeric you shouldn't need to do anything to merge.  You can remove the format if it makes it easier for you to understand.  

 

SUBSTR() is a character function, it works on character strings, but you've stated both are numeric so it's not valid to use SUBSTR(). 

 

 

Super Contributor
Posts: 345

Re: how to merge data sets with different coding system

I am sorry I have not made it clear.

 

The two exsisted datasets have their already generated code system.

 

The dataset1  code is 0123, 0125, 0156

 

correspoindingly, the dataset 2 code is 123, 125, 156

 

they are just same, but becuase of different versions of coding system

i really just need to delete the dataset1  first zero so that I can merge, I need a multiple match merge

 

so the dataset 1 is a reference dataset with 100 obs, it explains all those meanings of 0123, 0125, 156  and their realted different cateogries

the dataset2 is a  dataset with over 10,000 obs. then the expect merged datasets with have all those obs with different explaination

i need to remove 0 from the dataset1 so that I can merge

Super User
Posts: 19,865

Re: how to merge data sets with different coding system


wenling wrote:

I am sorry I have not made it clear.

 

The two exsisted datasets have their already generated code system.

 

The dataset1  code is 0123, 0125, 0156

 

 


You've stated the variable is numeric. Is it numeric with a format? A numeric variable CANNOT have a leading 0, unless it's formatted. However, when merging the underlying value is used not the formatted value. 

 

Post a proc contents to clarify this, because what you're saying contradicts itself. 

 

 

Respected Advisor
Posts: 4,932

Re: how to merge data sets with different coding system

If both are numerical, as you state, and just look different because of their format, then they already match. No conversion needed. Data step MERGE and SAS/SQL joins match values based on internal representation. 

PG
Super Contributor
Posts: 345

Re: how to merge data sets with different coding system

They are different. 

For example, one would be 0235, the other in dataset2 should be 235,

what is an option to get rid of 0

That is why I tried to substr 

but as you said, substr is for character variable

so can you provide a sample code, an option to remove the first zero?

Super User
Posts: 11,343

Re: how to merge data sets with different coding system

Show us the PROC CONTENTS results on both datasets.

Super User
Posts: 19,865

Re: how to merge data sets with different coding system

What's the format on that variable? You say numeric so I'm guessing z4? Remove the leading zero by stripping the format. Then they'll 'look' the same. They're the same regardless, based on the information you've provided. 

 

format var2 8.;
Super Contributor
Posts: 345

Re: how to merge data sets with different coding system

I still do not get that, how to remove the leading zero, these two variables are both numerical

 

Super User
Posts: 19,865

Re: how to merge data sets with different coding system

Did you run the code? 

 

Post your proc contents - I have no suggestions without this.  

 

Also, try the merge - does it work? 

Super Contributor
Posts: 345

Re: how to merge data sets with different coding system

the dataset has three numbers, the variable is a charc varil, length is 3,

the dataset has the variable with four numbers (including the leading zero), is a numberical variable, length is 8 (so I cannot use substr for this one to remove zero, but I do not know how to use

sorry for my previous mistake that I thought both were numerical

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 366 views
  • 1 like
  • 5 in conversation