Contributor
Posts: 33

# Merging rows by ID which isn't unique

Hi,

I have the following problem. What I have right now is:

ID | V1 | V2 | V3 |
----------------------
1  | x  |    |    |
1  | x  |    |    |
1  | x  |    |    |
1  |    | x  |    |
1  |    | x  |    |
1  |    | x  |    |
1  |    |    | x  |
1  |    |    | x  |
1  |    |    | x  |
2  | x  |    |    |
2  |    | x  |    |
2  |    |    | x  |
3  | x  |    |    |
3  | x  |    |    |
3  |    | x  |    |
3  |    | x  |    |
3  |    |    | x  |
3  |    |    | x  |
4  | x  |    |    |
4  | x  |    |    |
4  |    | x  |    |
4  |    | x  |    |
4  |    |    | x  |
4  |    |    | x  |

and other variants.

What I would like to have:

ID | V1 | V2 | V3 |
----------------------
1  | x  | x  | x  |
1  | x  | x  | x  |
1  | x  | x  | x  |
2  | x  | x  | x  |
3  | x  | x  | x  |
3  | x  | x  | x  |
4  | x  | x  | x  |
4  | x  | x  | x  |

I'd be grateful for any help!

New Contributor
Posts: 2

## Re: Merging rows by ID which isn't unique


You can try the following code. I hope it helps.

Assuming your dataset is called "a" and is sorted by ID:

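/* Put the non-missing values of each variable into its own dataset. */
data V1;
  set a;
  where V1 is not missing;
  keep ID V1;
run;

data V2;
  set a;
  where V2 is not missing;
  keep ID V2;
run;

data V3;
  set a;
  where V3 is not missing;
  keep ID V3;
run;

/* Merge the pieces back together; BY requires each one to be sorted by ID. */
data a;
  merge V1 V2 V3;
  by ID;
run;

/* Delete the intermediate datasets from WORK. */
proc datasets nolist;
  delete V1 V2 V3;
run;
quit;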

-------------------------------------------------------------------------------------

If your dataset has more variables that follow this pattern, create more V<n> datasets in the same way and add them to the MERGE statement in the final data step.

PROC DATASETS is used only to delete the intermediate working datasets.
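
For a larger number of variables, writing one data step per variable gets tedious. The following is only a sketch of how the same idea could be generated with a macro loop. It assumes the variables are literally named V1 through Vn and that the input is already sorted by ID. The macro name `collapse_by_id` and its parameters `in`, `out` and `n` are hypothetical names chosen for illustration. It filters each variable with dataset options on the MERGE statement, so no intermediate datasets have to be created or deleted.

```sas
/* Hypothetical helper: merge the non-missing values of V1..Vn by ID.   */
/* Assumes &in is sorted by ID and the variables are named V1, V2, ...  */
%macro collapse_by_id(in=a, out=want, n=3);
  %local i;
  data &out;
    merge
    %do i = 1 %to &n;
      /* One filtered view of the input per variable */
      &in(keep=ID V&i where=(not missing(V&i)))
    %end;
    ;
    by ID;
  run;
%mend collapse_by_id;

/* Example call for the three variables in the question */
%collapse_by_id(in=a, out=want, n=3)
```

Writing the result to a separate dataset (here `want`) instead of overwriting `a` keeps the original data available for checking the output.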

Respected Advisor
Posts: 4,802

## Re: Merging rows by ID which isn't unique

...or this way:

```
data want;
  merge
    have(keep=id v1 where=(not missing(v1)))
    have(keep=id v2 where=(not missing(v2)))
    have(keep=id v3 where=(not missing(v3)))
  ;
  by id;
run;
```

The code assumes that, as in your sample data, each id always has the same number of rows with a populated v1, v2 and v3.

If this is not the case with your real data, please explain what the desired data set should look like in that case.
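
One way to test that assumption before merging is to count the populated values per id and list any id where the counts differ. This is a minimal sketch, assuming the dataset is called `have` as in the code above. The output table name `unbalanced` and the column names `n_v1`, `n_v2` and `n_v3` are made up for illustration.

```sas
/* List every id whose v1, v2 and v3 are not populated equally often. */
/* In PROC SQL, count(column) counts non-missing values only.         */
proc sql;
  create table unbalanced as
  select id,
         count(v1) as n_v1,
         count(v2) as n_v2,
         count(v3) as n_v3
  from have
  group by id
  having count(v1) ne count(v2)
      or count(v2) ne count(v3);
quit;
```

If `unbalanced` has no rows, the merge should behave as in the sample data. If it does have rows, keep in mind that a match-merge retains the last value from the shorter input for the remaining rows of that BY group, which may or may not be what you want.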
