BookmarkSubscribeRSS Feed
ph7see
Calcite | Level 5

I have two datasets.

 

Dataset A:

id2000 vhouse2000 id2001 vhouse2001

1           100             3           500

2           200             4           600

3           300             

                                7           700

5           400             8           800

6           500

 

Dataset B:

id2000 vhouse2000 vstock2000 id2001 vhouse2001 vstock2001

1                                10               3                               50 

2                                20               4                               60

3                                30               5                               70

4                                40               6                               80  

5                                50               7                               90

6                                60               8                               100

 

I want to merge two datasets like this:

Merged dataset:

id2000 vhouse2000 vstock2000 id2001 vhouse2001 vstock2001

1           100                  10               3          500               50 

2           200                  20               4          600               60

3           300                  30               5                               70

4                                   40               6                               80  

5           400                  50               7          700               90              

6           500                  60               8          800               100

 

Please help me solve this problem. Thanks!

7 REPLIES 7
ballardw
Super User

@ph7see wrote:

I have two datasets.

 

Dataset A:

id2000 vhouse2000 id2001 vhouse2001

1           100             3           500

2           200             4           600

3           300             

                                7           700

5           400             8           800

6           500

 

Dataset B:

id2000 vhouse2000 vstock2000 id2001 vhouse2001 vstock2001

1                                10               3                               50 

2                                20               4                               60

3                                30               5                               70

4                                40               6                               80  

5                                50               7                               90

6                                60               8                               100

 

I want to merge two datasets like this:

Merged dataset:

id2000 vhouse2000 vstock2000 id2001 vhouse2001 vstock2001

1           100                  10               3          500               50 

2           200                  20               4          600               60

3           300                  30               5                               70

4                                   40               6                               80  

5           400                  50               7          700               90              

6           500                  60               8          800               100

 

Please help me solve this problem. Thanks!


From what you show this has a chance of working:

data merged;
   merge data_b data_a (drop=Id2000);
run;

You don't say anything about which "by variables" you might want to use from your subject line. If one of them is ID2000 you can't because it is MISSING. If you do not have a by variable merge will match the first row of the sets, then the second and so forth.

If your datasets are not in a matching order and you want to use any of those ID variables with missing values you need to fix the missing values before attempting to merge as you would need to sort them and the missing values would make the order almost certainly incorrect for a simple merge.

 

The order of datasets on the MERGE statement is important because the variables with the same names will have the values of the LAST ordered data set on the merge. Which is why I drop the ID2000 variable, the missing value would overwrite the one from data set b.

 

BTW, having what appear to be Year values in variable names for multiple similar named variables, vhouse2000 and vhouse2001 for example, often indicates a poor data structure. It would generally be better in the long run to have a separate variable with the Year and then just plain id, vhouse, vstock,

ph7see
Calcite | Level 5

Thank you so much for your help.

Id2000 variables in both data A and data B should be matched and update Vhouse2000 variable in data B based on data A. 

Similarly, Vhouse2001 variable needs to be updated from data A. 

 

The problem is that even though id2000=1 and id2001=3 are in the first low together, they are different persons.  

 

I ran your code to merge two datasets. 

The result looks like this:

id2000 vhouse2000 vstock2000 id2001 vhouse2001 vstock2001

1           100                  10               3          500               50 

2           200                  20               4          600               60

3           300                  30                                                70

4                                   40               7          700               80  

5           400                  50               8          800               90              

6           500                  60                                                100

 

Everything for year2000 (id2000, vhouse2000, and vstock2000) is good. However, two observations in id2001 are deleted (id2001 = 5 and 6) and as a result, the locations of id2001 =7 and 8 are wrong. 

 

ballardw
Super User

@ph7see wrote:

Thank you so much for your help.

Id2000 variables in both data A and data B should be matched and update Vhouse2000 variable in data B based on data A. 

Can't be done. Your data A did not have a value of 4 for Id2000. So there is not a way to match on the value of a variable that is missing. A variable to match on would have to be present to get expected results because to match BY value the values are going to need to be sorted and missing will not sort in the order you would want from the given example data.

 

Or fix your posted example.

Your data is so poor I wasn't interested in writing a data step from scratch to recreate it, so the code is not tested and was a close I could get some of the values.

 

 

Better yet provide data step that replicates your data set. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

ph7see
Calcite | Level 5

Thank you for suggestions. I will I think I forgot writing a value of 4 in the column of id2000. Thank you for pointing out. 

 

Data one;
input id2000 vhouse2000 id2001 vhouse2001;
datalines;
1 100 3 500
2 200 4 600
3 300 . .
4 . 7 700
5 400 8 800
6 500 . .
;
run;

Data two;
input id2000 vhouse2000 vstock2000 id2001 vhouse2001 vstock2001;
datalines;
1 . 10 3 . 50
2 . 20 4 . 60
3 . 30 5 . 70
4 . 40 6 . 80
5 . 50 7 . 90
6 . 60 8 . 100
;
run;

data merged;
merge two one (drop=id2000);
run;

 

ballardw
Super User

For this exact data:

data want;
   update two one (drop=id2001);
   by id2000;
run;

Update is not the same as merge. The default behavior is that only values present in the update data set (the second on the Update statement) replace values in the base data set (the first one on the update statement).

In your example two has the values of id2001 you appear to want in the output so drop them from set One to avoid overwriting the data.

 

Caution: This approach will not work if data set Two has repeats of values for Id2000. Update is intended to change values for only a master data set with one record for each value of the BY variable(s).

 

If you get third data set with 2002 values I pray for you as this kind of structure gets harder to work with when there are more similar values.

ph7see
Calcite | Level 5

I appreciate your advice. The original data's structure is poor like you mentioned. I will try to change the formatting. Thanks again.

Kurt_Bremser
Super User

Before you proceed any further, transpose the datasets so that the years have their own column alongside id, vhouse and vstock. Then it's a simple join or merge by id and year.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1737 views
  • 0 likes
  • 3 in conversation