I have two datasets, and I would like to merge with one code. but the coding system is different. the dataset1's code is three numbers, say, 123, the dataset2's code is four numbers, say, 0123
so I would like first create a new variable
icode2=input (substr(code,2,3), 3.);
but it generates . missing values for all obs
i also tried
icode2= substr (code, 2,3);
it generates blank for all obs
would you please let me know how to fix it
thanks
data one;
input var1;
cards;
123
456
789
234
;
run;
data two;
input var2 $;
cards;
0123
0456
0789
234
;
run;
data two_formatted;
set two;
var1=input(var2, best12.);
run;
proc sort data=one; by var1; run;
proc sort data=two_formatted; by var1; run;
data merged;
merge one two_formatted;
by var1;
run;
Translate both strings to numbers
input(code, best.) = input(code2, best.)
Thank you. But I do not get that.
Can you explain in more detail? Step by step, if there are more than two steps?
1. What are the types of each variable? They need to be the same, so is one numeric and one character, or are they formatted to appear as character with the leading zero. Can you post the proc contents output for each of the variables? The goal is to make them the same, either numeric or character - character is probably better since you don't have to worry about rounding issues.
both are numerical variables
num 8
I am doing the first step. I want to create a new variable, named as icode2
icode2= substr(code1, 2,3)
database1 has code 1, it is like 0123,0231,0234
database2 has code 2, like 123, 231, 234
so I thought of substr to do this, to create a new varaible icode2, then merge it based on
code2=icode2
@Bal23 wrote:
both are numerical variables
num 8
If both are numerical, how does it have leading zeroes? Is there a format attached?
If it's numeric you shouldn't need to do anything to merge. You can remove the format if it makes it easier for you to understand.
SUBSTR() is a character function, it works on character strings, but you've stated both are numeric so it's not valid to use SUBSTR().
I am sorry I have not made it clear.
The two exsisted datasets have their already generated code system.
The dataset1 code is 0123, 0125, 0156
correspoindingly, the dataset 2 code is 123, 125, 156
they are just same, but becuase of different versions of coding system
i really just need to delete the dataset1 first zero so that I can merge, I need a multiple match merge
so the dataset 1 is a reference dataset with 100 obs, it explains all those meanings of 0123, 0125, 156 and their realted different cateogries
the dataset2 is a dataset with over 10,000 obs. then the expect merged datasets with have all those obs with different explaination
i need to remove 0 from the dataset1 so that I can merge
@Bal23 wrote:
I am sorry I have not made it clear.
The two exsisted datasets have their already generated code system.
The dataset1 code is 0123, 0125, 0156
You've stated the variable is numeric. Is it numeric with a format? A numeric variable CANNOT have a leading 0, unless it's formatted. However, when merging the underlying value is used not the formatted value.
Post a proc contents to clarify this, because what you're saying contradicts itself.
If both are numerical, as you state, and just look different because of their format, then they already match. No conversion needed. Data step MERGE and SAS/SQL joins match values based on internal representation.
They are different.
For example, one would be 0235, the other in dataset2 should be 235,
what is an option to get rid of 0
That is why I tried to substr
but as you said, substr is for character variable
so can you provide a sample code, an option to remove the first zero?
Show us the PROC CONTENTS results on both datasets.
What's the format on that variable? You say numeric so I'm guessing z4? Remove the leading zero by stripping the format. Then they'll 'look' the same. They're the same regardless, based on the information you've provided.
format var2 8.;
I still do not get that, how to remove the leading zero, these two variables are both numerical
Did you run the code?
Post your proc contents - I have no suggestions without this.
Also, try the merge - does it work?
the dataset has three numbers, the variable is a charc varil, length is 3,
the dataset has the variable with four numbers (including the leading zero), is a numberical variable, length is 8 (so I cannot use substr for this one to remove zero, but I do not know how to use
sorry for my previous mistake that I thought both were numerical
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.