DATA Step, Macro, Functions and more

Merging one-to-many creates missing values

Accepted Solution Solved
Reply
Contributor
Posts: 27
Accepted Solution

Merging one-to-many creates missing values

Thank everybody for the previous help. 
I have one more question. 
 

I am trying to merge two datasets: 

 
data dataSet1;
 type1 something1
 type1 something2
 type1 something3 
 type2 something1
 type2 something2
 type2 something3 
 type3 something1
 type3 something2
 type3 something3 
 
data types;
  input myText textType;
  datalines;
   a  type1 
   b  type2 
   c type3
run;
 
I would like to merge these two to get
 
 type1 something1 a
 type1 something2 a
 type1 something3 a 
 type2 something1 b
 type2 something2 b
 type2 something3 b 
 type3 something1 c
 type3 something2 c
 type3 something3 c
 
After this I can get rid of Type variable completely. So, basically, I would like to replace type1 by the corresponding 'a', type2 by the corresponding 'b' etc.
 
The simple code: 
 
 data dataset1;
  merge dataset1 types;
  BY textType;
RUN;  
 
creates a strange result: 
 
 type1 something1 a
 type1 something2 
 type1 something3  
 type2 something1 b
 type2 something2 
 type2 something3  
 type3 something1 c
 type3 something2 
 type3 something3 
 
(so, textType appears only  first time and all the rest are missing values. What is wrong here? What is the most efficient way to do it?  
 
Thank you very much
 
Julia

Accepted Solutions
Solution
‎01-25-2016 11:56 PM
Super User
Posts: 5,082

Re: Merging one-to-many creates missing values

It's a combination of two factors.  First, it's likely you tested and re-tested, trying to get this to work.  And second, you are reusing the name DATASET1 as both the input and the output.

 

All of that is legal.  However, it likely created the situation where DATASET1 already contains MYTEXT (possibly with all missing values at some point along the way).  Get rid of it.

 

data dataset1;
  merge dataset1 (drop=MYTEXT) types;
  BY textType;
RUN;  
 
That should fix the problem.

View solution in original post


All Replies
Super User
Posts: 10,500

Re: Merging one-to-many creates missing values

[ Edited ]

Order of sets on the MERGE statement makes a difference:

data combined;
   merge types dataset1;
   by texttype;
run;

But Many to many is not going to work as desired many types.

 

 

What you are doing is a JOIN. Merge is doing exactly what the documentation says it will.

Another way that is more flexible is something like this. Also the JOIN using SQL does not require sorting the data first.

proc sql;
   create table combined as 
   select dataset1.*, types.mytext
   from dataset1 left join types
      on dataset1.texttype = types.texttype;
quit;

BTW you dataset code for TYPES has an error as MYTEXT is considered numeric as you haven't defined it as character and will generate missing valures for MYTEXT and error messages of invalid data.

 

Contributor
Posts: 27

Re: Merging one-to-many creates missing values

Thank you for the explanation! Got it!
Contributor
Posts: 38

Re: Merging one-to-many creates missing values

 data dataset1;
  merge dataset1 types;
  BY textType something;
RUN;  

You need to specify the key of the somethingX field.

Super Contributor
Posts: 490

Re: Merging one-to-many creates missing values

Why!!??

Super Contributor
Posts: 490

Re: Merging one-to-many creates missing values

[ Edited ]

Your code is fine and should work...

 

data dataSet1;
input textType $ 1-5  textSome $ 7-17 ;
cards;
type1 something1
type1 something2
type1 something3 
type2 something1
type2 something2
type2 something3 
type3 something1
type3 something2
type3 something3 
 ;run;
 
data types;
  input myText $ textType $3-7;
  datalines;
a type1 
b type2 
c type3
   ;
run;
proc sort data=dataset1; by textType; run;
proc sort data=types; by textType; run;

data dataset1;
  merge dataset1 types;
  BY textType;
  drop textType;
RUN;  

Try run  the code above, you did not show your full code.

Solution
‎01-25-2016 11:56 PM
Super User
Posts: 5,082

Re: Merging one-to-many creates missing values

It's a combination of two factors.  First, it's likely you tested and re-tested, trying to get this to work.  And second, you are reusing the name DATASET1 as both the input and the output.

 

All of that is legal.  However, it likely created the situation where DATASET1 already contains MYTEXT (possibly with all missing values at some point along the way).  Get rid of it.

 

data dataset1;
  merge dataset1 (drop=MYTEXT) types;
  BY textType;
RUN;  
 
That should fix the problem.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 338 views
  • 1 like
  • 5 in conversation