Issue Merging Two Datasets

Mgarret · Posted 01-07-2012 02:17 PM

Hi all--

I am trying to merge two datasets. When the datasets are merged, the variables need to go in a certain order. In dataset Have2 I have series of questions whose variables all start with the same prefix, like Q15_1 Q15_2 Q15_3... or Q55_1 Q55_2 Q55_3 Q55_4 Q55_5... The _# suffix on the end of the variable name can change. To take this into account, I'm using the wildcard ":" to generate lists of variables with these prefixes.

Please see my code below.

Here is my issue: variables that start with Q55 or Q15 and are also contained in Have1 are being pulled into Want, which they should not be. These variables should be pulled in only from Have2. Again, the variables brought in from Have1 and Have2 must fall into a certain order in Want.

data Want;
    merge Have1 Have2;
    keep
        CASE_NO       /* from Have1 */
        Agency_Name   /* from Have1 */
        Program_Name  /* from Have1 */
        Q55:          /* from Have2 */
        CONNX_CaseID  /* from Have1 */
        Q15:          /* from Have2 */
        Case_Name     /* from Have1 */
        Case_Age      /* from Have1 */
    ;
    by CASE_NO;
run;

Any help is greatly appreciated. Thank you.

Linlin · Posted 01-07-2012 02:27 PM

change your code to:

data Want;
    merge Have1(keep=case_: Agency_Name Program_Name CONNX_CaseID)
          Have2(keep=case_no q55: q15:);
    by CASE_NO;
run;

Linlin

Mgarret · Posted 01-07-2012 02:38 PM

Ok. I see... and then use another Data step to organize the variables in their correct order, right?

art297 · Posted 01-07-2012 02:47 PM

Why? As long as you aren't doing any data manipulation other than the merge, why not include a retain statement within the same datastep as the merge .. right after the initial data statement?

Mgarret · Posted 01-07-2012 02:50 PM

Ok. Thanks. I am not too familiar with the retain statement, so it might be easier for me to just make another datastep.

art297 · Posted 01-07-2012 03:58 PM

You will still need to use a retain, length, or other statement that can affect variable order. The only time a separate datastep is needed is when you are doing other things like computes, if-then computes, etc.

Its use is simply the word retain, followed by a space, followed by all of the variables you want to put at the left most side of the record, in the order that you want them to appear, separated by spaces, and ending the statement with a semicolon.

The only peculiarity, but this goes with any other statement you might use to reorder your data, is that the statement must appear BEFORE the set (or in your case merge) statement.
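As a rough sketch of the above, assuming Have2 holds Q55_1-Q55_2 and Q15_1-Q15_2 (the real suffix counts may differ, so adjust the list), the retain could sit in the same step as the merge like this. The Q variables are spelled out because a prefix list such as Q55: may not resolve before the merge statement has defined those variables:

```sas
data Want;
    /* Names listed here, BEFORE the merge, take the leftmost positions
       in this order. The Q55_* / Q15_* names are assumed for illustration. */
    retain CASE_NO Agency_Name Program_Name
           Q55_1 Q55_2
           CONNX_CaseID
           Q15_1 Q15_2
           Case_Name Case_Age;
    /* Drop the Q55/Q15 variables from Have1 so they come only from Have2 */
    merge Have1(drop=q55: q15:) Have2;
    by CASE_NO;
    keep CASE_NO Agency_Name Program_Name Q55: CONNX_CaseID Q15: Case_Name Case_Age;
run;
```

Since every variable here is read from an input dataset, the retain should only affect position, not values.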

LinusH · Posted 01-08-2012 12:25 PM

Why do you need the variables in a certain order? Just because it's nice?

Any time you query the data you can specify the desired variable order, or you can put a view (or information map) on top of the table.
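For instance, a view along these lines would present the columns in a chosen order without rewriting the table. This is only a sketch: the view name Want_Ordered is made up, and the Q55_* / Q15_* columns listed are assumed to exist in Want:

```sas
proc sql;
    /* The SELECT list determines the column order seen through the view */
    create view Want_Ordered as
    select CASE_NO, Agency_Name, Program_Name,
           Q55_1, Q55_2,
           CONNX_CaseID,
           Q15_1, Q15_2,
           Case_Name, Case_Age
    from Want;
quit;
```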

/Linus

Data never sleeps

art297 · Posted 01-07-2012 02:29 PM

Why not simply drop them in your merge statement? I.e.,

data Want;
    merge Have1(drop=q55: q15:) Have2;
    by CASE_NO;
run;