BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Bal23
Lapis Lazuli | Level 10

I have two datasets, and I would like to merge with one code. but the coding system is different. the dataset1's code is three numbers, say, 123, the dataset2's code is four numbers, say, 0123

 

so I would like first create a new variable

icode2=input (substr(code,2,3), 3.);

 

but it generates . missing values for all obs

 

i also tried

icode2= substr (code, 2,3);

 

it generates blank for all obs

would you please let me know how to fix it

thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

data one;
input var1;
cards;
123
456
789
234
;
run;

data two;
input var2 $;
cards;
0123
0456
0789
234
;
run;

data two_formatted;
set two;
var1=input(var2, best12.);
run;

proc sort data=one; by var1; run;
proc sort data=two_formatted; by var1; run;

data merged;
merge one two_formatted;
by var1;
run;

View solution in original post

20 REPLIES 20
PGStats
Opal | Level 21

Translate both strings to numbers

 

input(code, best.) = input(code2, best.)

PG
Bal23
Lapis Lazuli | Level 10

Thank you. But I do not get that.

Can you explain in more detail? Step by step, if there are more than two steps?

Reeza
Super User

1. What are the types of each variable? They need to be the same, so is one numeric and one character, or are they formatted to appear as character with the leading zero. Can you post the proc contents output for each of the variables? The goal is to make them the same, either numeric or character - character is probably better since you don't have to worry about rounding issues. 

 

 

Bal23
Lapis Lazuli | Level 10

both are numerical variables

num 8

 

I am doing the first step. I want to create a new variable, named as icode2

icode2= substr(code1, 2,3)

 

database1 has code 1, it is like 0123,0231,0234

database2 has code 2, like 123, 231, 234

 

so I thought of substr to do this, to create a new varaible icode2, then merge it based on

code2=icode2

Reeza
Super User

@Bal23 wrote:

both are numerical variables

num 8

 



If both are numerical, how does it have leading zeroes? Is there a format attached?  

 

If it's numeric you shouldn't need to do anything to merge.  You can remove the format if it makes it easier for you to understand.  

 

SUBSTR() is a character function, it works on character strings, but you've stated both are numeric so it's not valid to use SUBSTR(). 

 

 

Bal23
Lapis Lazuli | Level 10

I am sorry I have not made it clear.

 

The two exsisted datasets have their already generated code system.

 

The dataset1  code is 0123, 0125, 0156

 

correspoindingly, the dataset 2 code is 123, 125, 156

 

they are just same, but becuase of different versions of coding system

i really just need to delete the dataset1  first zero so that I can merge, I need a multiple match merge

 

so the dataset 1 is a reference dataset with 100 obs, it explains all those meanings of 0123, 0125, 156  and their realted different cateogries

the dataset2 is a  dataset with over 10,000 obs. then the expect merged datasets with have all those obs with different explaination

i need to remove 0 from the dataset1 so that I can merge

Reeza
Super User

@Bal23 wrote:

I am sorry I have not made it clear.

 

The two exsisted datasets have their already generated code system.

 

The dataset1  code is 0123, 0125, 0156

 

 


You've stated the variable is numeric. Is it numeric with a format? A numeric variable CANNOT have a leading 0, unless it's formatted. However, when merging the underlying value is used not the formatted value. 

 

Post a proc contents to clarify this, because what you're saying contradicts itself. 

 

 

PGStats
Opal | Level 21

If both are numerical, as you state, and just look different because of their format, then they already match. No conversion needed. Data step MERGE and SAS/SQL joins match values based on internal representation. 

PG
Bal23
Lapis Lazuli | Level 10

They are different. 

For example, one would be 0235, the other in dataset2 should be 235,

what is an option to get rid of 0

That is why I tried to substr 

but as you said, substr is for character variable

so can you provide a sample code, an option to remove the first zero?

ballardw
Super User

Show us the PROC CONTENTS results on both datasets.

Reeza
Super User

What's the format on that variable? You say numeric so I'm guessing z4? Remove the leading zero by stripping the format. Then they'll 'look' the same. They're the same regardless, based on the information you've provided. 

 

format var2 8.;
Bal23
Lapis Lazuli | Level 10

I still do not get that, how to remove the leading zero, these two variables are both numerical

 

Reeza
Super User

Did you run the code? 

 

Post your proc contents - I have no suggestions without this.  

 

Also, try the merge - does it work? 

Bal23
Lapis Lazuli | Level 10

the dataset has three numbers, the variable is a charc varil, length is 3,

the dataset has the variable with four numbers (including the leading zero), is a numberical variable, length is 8 (so I cannot use substr for this one to remove zero, but I do not know how to use

sorry for my previous mistake that I thought both were numerical

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1296 views
  • 1 like
  • 5 in conversation