Merging

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Merging

[ Edited ]

I want to merge table1 and table2 on Country_one and Time (for table1) and Country and Count (for table2).

When the table2 variables are missings, the merge have to be set on : country_two and Time (for table1)
and always Country and Count (for table2).

 

All the table1 's obs have to be in the final merged table [in= tb1 ... if tb1].

 

How to handle it as smartly as possible ?

Have a look at my attachment to see my different tables.

 

Thanks a lot !

Regards

data table1;
input (id coutry_one country_two time) ($) ;
cards;
xa1 al al 20t
xb5 bk al 5t
xc6 kz kz 10t
xd9 ca ca 1t
xe7 al al 20t
xf8 kz kz 5t
xg8 al al 5t
xf1 ca ca 10t
xi2 kz kz 5t
xk7 bk ca 1t
xf3 bk kz 10t
xf6 ca ca 20t
xz3 kz bk 5t
xw1 al bk 1t
xs9 ca bk 10t
xv1 ss al 5t
xv2 st ca 10t
xv3 sr kz 5t
;run;

data table2;
input (coutry count) ($) var1-var15;
cards; 
al 20t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
bk 5t . . . . . 6 7 8 9 10 11 12 13 14 15
bk 1t . . . . . 6 7 8 9 10 11 12 13 14 15
bk 10t . . . . . 6 7 8 9 10 11 12 13 14 15 
kz 10t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
ca 1t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
al 20t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
kz 5t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
al 5t 1 2 3 4 5 6 7 8 9 10 . . . 14 15
ca 10t 1 2 3 4 5 6 7 8 9 10 . . . 14 15
kz 5t 1 2 3 4 5 6 7 8 9 10 . . . 14 15 
ca 20t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
al 8t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
kz 7t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
al 6t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
kz 6t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
ca 6t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
al 8t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
kz 7t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
kz 13t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
ca 12t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
al 18t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
kz 17t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
ss 5t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
st 10t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
sr 5t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
;run;

Accepted Solutions
Solution
‎05-11-2017 07:39 AM
PROC Star
Posts: 7,432

Re: Merging

You haven't answered my question, thus I have no idea what you really want to accomplish.

 

Your ask indicated that you wanted to know how to merge two files, always include all of the records from table 1 but, if no match was found for Country1, try to match on Country2.

 

That part is realtively easy. However, in your want table, you leave off one record, namely xw1 al bk 1t.

 

Also, your want table also suggests that you want to go beyond whether a match does or doesn't exist, but to use different times if a matching time has any missing values for var1-var15. Without specifying your rules, we can only guess what you are trying to do.

 

In the following, I've changed your time and data values so that I could see what my suggested code was doing. As such, I've recoded some of your times. Specifically, for time, I changed 5 to 2, 10 to 3, and 20 to 4. I also changed your Table2 var values to reflect the time value. Thus, for time 1, I used the values 1.1, 2.1 .. 15.1 and did the same for the entire table.

 

My suggestion is for you to review the following code, decide if it is doing what you want and, if not, let us know what rules you are trying to satisfy. Since no one has suggested a solution, I'd have to guess that no one on the forum yet understands exactly what you are trying to accomplish.

data table1;
  input (id country_one country_two time) ($) ;
  cards;
xa1 al al 4t
xb5 bk al 2t
xc6 kz kz 3t
xd9 ca ca 1t
xe7 al al 4t
xf8 kz kz 2t
xg8 al al 2t
xf1 ca ca 3t
xi2 kz kz 2t
xk7 bk ca 1t
xf3 bk kz 3t
xf6 ca ca 4t
xz3 kz bk 2t
xw1 al bk 1t
xs9 ca bk 3t
xv1 ss al 2t
xv2 st ca 3t
xv3 sr kz 2t
;
run;

data table2;
  input (country count) ($) var1-var15;
  cards; 
al 4t 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 10.4 11.4 12.4 13.4 14.4 15.4
bk 2t . . . . . 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2
bk 1t . . . . . 6.1 7.1 8.1 9.1 10.1 11.1 12.1 13.1 14.1 15.1
bk 3t . . . . . 6.3 7.3 8.3 9.3 10.3 11.3 12.3 13.3 14.3 15.3 
kz 3t 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3 12.3 13.3 14.3 15.3
ca 1t 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1 11.1 12.1 13.1 14.1 15.1
al 4t 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 10.4 11.4 12.4 13.4 14.4 15.4
kz 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2 
al 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 . . . 14.2 15.2
ca 3t 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 . . . 14.3 15.3
kz 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 . . . 14.2 15.2 
ca 4t 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 10.4 11.4 12.4 13.4 14.4 15.4 
al 8t 1.8 2.8 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8 11.8 12.8 13.8 14.8 15.8
kz 7t 1.7 2.7 3.7 4.7 5.7 6.7 7.7 8.7 9.7 10.7 11.7 12.7 13.7 14.7 15.7
al 6t 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 10.6 11.6 12.6 13.6 14.6 15.6
kz 6t 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 10.6 11.6 12.6 13.6 14.6 15.6
ca 6t 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 10.6 11.6 12.6 13.6 14.6 15.6
al 8t 1.8 2.8 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8 11.8 12.8 13.8 14.8 15.8
kz 7t 1.7 2.7 3.7 4.7 5.7 6.7 7.7 8.7 9.7 10.7 11.7 12.7 13.7 14.7 15.7 
kz 13t 1.13 2.13 3.13 4.13 5.13 6.13 7.13 8.13 9.13 10.13 11.13 12.13 13.13 14.13 15.13
ca 12t 1.12 2.12 3.12 4.12 5.12 6.12 7.12 8.12 9.12 10.12 11.12 12.12 13.12 14.12 15.12
al 18t 1.18 2.18 3.18 4.18 5.18 6.18 7.18 8.18 9.18 10.18 11.18 12.18 13.18 14.18 15.18
kz 17t 1.17 2.17 3.17 4.17 5.17 6.17 7.17 8.17 9.17 10.17 11.17 12.17 13.17 14.17 15.17 
ss 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2 
st 3t 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3 12.3 13.3 14.3 15.3 
sr 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2 
;
run;

data table2a miss2;
  set table2;
  countn=input(compress(count,,'kd'),8.);
  if nmiss(of var1-var15) eq 0 then output table2a;
  else output miss2;
run;

proc sort data=table2a;
  by country countn;
run;

data table2b;
  set table2a;
  by country;
  if first.country;
  call missing(count);
run;

data miss2;
  set miss2 table2b;
run;

proc sort data=miss2;
  by country;
run;

data miss2;
  set miss2;
  by country;
  if first.country and missing(count) then delete;
run;

data miss2a;
  update miss2 (obs=0) miss2;
  by country;
run;

data table1s;
  set table1;
  country=country_one;
run;

proc sort data=table1s out=table1s;
  by country time;
run;

data table1as;
  set table1;
  country=country_two;
  if country ne country_one;
run;

proc sort data=table1as out=table1as;
  by country time;
run;

data table2s;
  set table2a miss2a;
run;

proc sort data=table2s (rename=(count=time)) out=table2s nodupkey;
  by country time;
run;

data merged;
  merge table1s (in=in1)
        table2s (in=in2);
  by country time;
  if in1;
run;

data merged2;
  merge table1as (in=in1a)
        table2s (in=in2);
  by country time;
  if in1a;
run;

data merged;
  set merged merged2;
run;

proc sort data=merged;
  by id;
run;

data want;
  update merged (obs=0) merged;
  by id;
run;

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
Super User
Posts: 5,360

Re: Merging

There's no way to predict what you are searching for as the final result.  You will have to spell it out.

Contributor
Posts: 62

Re: Merging

Final data wanted, in the attachement file.

Thank you !

Regards

Attachment
Super User
Posts: 19,063

Re: Merging

Please consider embedding it directly into the post. Attachments are harder to work with, and there are often restrictions on downloading files.

PROC Star
Posts: 7,432

Re: Merging

In your final output where does the following line come from:

xb5 bk al 5t     bk 5t     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Art, CEO, AnalystFinder.com

 

Valued Guide
Posts: 947

Re: Merging

You have a few instances of duplicated keys in table2.  For instance you have two records with country=kz and count=5t.  One of those records has valid values for all vars var1-var15, and the other doesn't.  Which one should be used in the merging? 

PROC Star
Posts: 7,432

Re: Merging

I can understand the desired result for the one record I asked about if you want to use the update statement after doing merges.

However, that wouldn't explain:

xf1 ca ca 10t    ca 10t    1 2 3 4 5 6 7 8 9 10 . . . 14 15      obs replace by ===> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Please explain where the var1-var15 data comes from for that id.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 62

Re: Merging

Response to mkeintz:

 

It is true, in the table2, there are these two obs.

In the second, wisch have missing values, and only on missing var poitns, the merging is done on the condition:

Country_two = Country and Time = Count.

 

Country_two and Time   ====>        are from table1

Country and Count         ====>        are from table2

 

As i use here the same values for var1 to var15 (1 2 3  ... 15) , it will conduct to a duplicate obs in the final result. Don't consider that as an issue.

Thanks !

 

Again, the difficulty i face is to handle smartly, a merging on 2 different conditions.

The first for the non missing var values in table2 : Country_one = Country and Time = Count

The second for the missing var values in table2 : Country_two = Country and Time = Count

 

Country_one, Country_two and Time      ====>  coming from table1

Country and Count                                  ====>  coming from table2

 

Thanks a lot !

Regards

Contributor
Posts: 62

Re: Merging

Response to Art297:

 

Tom (thank to him) has edited my post in order to answer to your ask.

Thanks a lot !

Regards 

PROC Star
Posts: 7,432

Re: Merging

I don't see an answer to my question. Neither the match with country_one or country_two contain match with a record that contains data for var1 thru var15 inclusive, but you show values of 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

 

For that one record, please show where (from which Table2 record(s) those values are obtained.

 

Art, CEO, AnalystFinder.com

 

Solution
‎05-11-2017 07:39 AM
PROC Star
Posts: 7,432

Re: Merging

You haven't answered my question, thus I have no idea what you really want to accomplish.

 

Your ask indicated that you wanted to know how to merge two files, always include all of the records from table 1 but, if no match was found for Country1, try to match on Country2.

 

That part is realtively easy. However, in your want table, you leave off one record, namely xw1 al bk 1t.

 

Also, your want table also suggests that you want to go beyond whether a match does or doesn't exist, but to use different times if a matching time has any missing values for var1-var15. Without specifying your rules, we can only guess what you are trying to do.

 

In the following, I've changed your time and data values so that I could see what my suggested code was doing. As such, I've recoded some of your times. Specifically, for time, I changed 5 to 2, 10 to 3, and 20 to 4. I also changed your Table2 var values to reflect the time value. Thus, for time 1, I used the values 1.1, 2.1 .. 15.1 and did the same for the entire table.

 

My suggestion is for you to review the following code, decide if it is doing what you want and, if not, let us know what rules you are trying to satisfy. Since no one has suggested a solution, I'd have to guess that no one on the forum yet understands exactly what you are trying to accomplish.

data table1;
  input (id country_one country_two time) ($) ;
  cards;
xa1 al al 4t
xb5 bk al 2t
xc6 kz kz 3t
xd9 ca ca 1t
xe7 al al 4t
xf8 kz kz 2t
xg8 al al 2t
xf1 ca ca 3t
xi2 kz kz 2t
xk7 bk ca 1t
xf3 bk kz 3t
xf6 ca ca 4t
xz3 kz bk 2t
xw1 al bk 1t
xs9 ca bk 3t
xv1 ss al 2t
xv2 st ca 3t
xv3 sr kz 2t
;
run;

data table2;
  input (country count) ($) var1-var15;
  cards; 
al 4t 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 10.4 11.4 12.4 13.4 14.4 15.4
bk 2t . . . . . 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2
bk 1t . . . . . 6.1 7.1 8.1 9.1 10.1 11.1 12.1 13.1 14.1 15.1
bk 3t . . . . . 6.3 7.3 8.3 9.3 10.3 11.3 12.3 13.3 14.3 15.3 
kz 3t 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3 12.3 13.3 14.3 15.3
ca 1t 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1 11.1 12.1 13.1 14.1 15.1
al 4t 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 10.4 11.4 12.4 13.4 14.4 15.4
kz 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2 
al 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 . . . 14.2 15.2
ca 3t 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 . . . 14.3 15.3
kz 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 . . . 14.2 15.2 
ca 4t 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 10.4 11.4 12.4 13.4 14.4 15.4 
al 8t 1.8 2.8 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8 11.8 12.8 13.8 14.8 15.8
kz 7t 1.7 2.7 3.7 4.7 5.7 6.7 7.7 8.7 9.7 10.7 11.7 12.7 13.7 14.7 15.7
al 6t 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 10.6 11.6 12.6 13.6 14.6 15.6
kz 6t 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 10.6 11.6 12.6 13.6 14.6 15.6
ca 6t 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 10.6 11.6 12.6 13.6 14.6 15.6
al 8t 1.8 2.8 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8 11.8 12.8 13.8 14.8 15.8
kz 7t 1.7 2.7 3.7 4.7 5.7 6.7 7.7 8.7 9.7 10.7 11.7 12.7 13.7 14.7 15.7 
kz 13t 1.13 2.13 3.13 4.13 5.13 6.13 7.13 8.13 9.13 10.13 11.13 12.13 13.13 14.13 15.13
ca 12t 1.12 2.12 3.12 4.12 5.12 6.12 7.12 8.12 9.12 10.12 11.12 12.12 13.12 14.12 15.12
al 18t 1.18 2.18 3.18 4.18 5.18 6.18 7.18 8.18 9.18 10.18 11.18 12.18 13.18 14.18 15.18
kz 17t 1.17 2.17 3.17 4.17 5.17 6.17 7.17 8.17 9.17 10.17 11.17 12.17 13.17 14.17 15.17 
ss 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2 
st 3t 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3 12.3 13.3 14.3 15.3 
sr 2t 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2 11.2 12.2 13.2 14.2 15.2 
;
run;

data table2a miss2;
  set table2;
  countn=input(compress(count,,'kd'),8.);
  if nmiss(of var1-var15) eq 0 then output table2a;
  else output miss2;
run;

proc sort data=table2a;
  by country countn;
run;

data table2b;
  set table2a;
  by country;
  if first.country;
  call missing(count);
run;

data miss2;
  set miss2 table2b;
run;

proc sort data=miss2;
  by country;
run;

data miss2;
  set miss2;
  by country;
  if first.country and missing(count) then delete;
run;

data miss2a;
  update miss2 (obs=0) miss2;
  by country;
run;

data table1s;
  set table1;
  country=country_one;
run;

proc sort data=table1s out=table1s;
  by country time;
run;

data table1as;
  set table1;
  country=country_two;
  if country ne country_one;
run;

proc sort data=table1as out=table1as;
  by country time;
run;

data table2s;
  set table2a miss2a;
run;

proc sort data=table2s (rename=(count=time)) out=table2s nodupkey;
  by country time;
run;

data merged;
  merge table1s (in=in1)
        table2s (in=in2);
  by country time;
  if in1;
run;

data merged2;
  merge table1as (in=in1a)
        table2s (in=in2);
  by country time;
  if in1a;
run;

data merged;
  set merged merged2;
run;

proc sort data=merged;
  by id;
run;

data want;
  update merged (obs=0) merged;
  by id;
run;

Art, CEO, AnalystFinder.com

Contributor
Posts: 62

Re: Merging

First af all, i apologize for my long absence and i would like to think you (Mr Art297) for your precious help to us.

You are right, your question was not responded. Here I answer to it, but as it is so late where i live, i will look deeply to your 

last post tomorrow. Thank you again.

 

First, it looks for the first merging condition and find a same
[bk 5t] in both table : Country_one and Counry are equal to (bk)
and Time and Count are equal to 5t.
Thus,it obtain the value of var6 to var15 as i noted [6 7 8 9 10 11 12 13 14 15] take from the table2.
As the values of var1 to var5 are missing in this obs, now it use
the second merging condition, that is Country_two = Counry and Time = Count.
For this condition, i find [al 5t] for table1 and the same and unique [al 5t] in table2. In this obs, i take the remaining values for var1 to var6.

 

Thanks and Regards

PROC Star
Posts: 7,432

Re: Merging

Now you still have to explain what do to if there are still missing values after the match is done using both Country values.

 

You also have to explain why one record from Table1 was totally left out of the example want table.

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 5,360

Re: Merging

It's still a little difficult to figure, but this must at least come close.  Assuming you already have your TABLE1 and TABLE2 ...

 

data table1_key1 table1_key2;

set table1;

key = country1 || time;

output table1_key1;

key = country2 || time;

output table2_key;

run;

proc sort data=table1_key1;

   by key;

run;

proc sort data=table1_key2;

   by key;

run;

 

data table2_key;

   set table2;

   key = country || count;

run;

proc sort data=table2_key;

   by key;

run;

 

That gives you enough data to find the matches based on either key:

 

data want1;

merge table1_key1 (in=keepme) table2_key;

by key;

if keepme;

run;

data want2;

merge table1_key2 (in=keepme) table2_key;

by key;

if keepme;

run;

 

Now the question is how to put them back together again.  Here is a simple way that might need some tweaking.

 

proc sort data=want1;

by id country1 country2 time;

run;

 

proc sort data=want2;

by id country1 country2 time;

run;

 

data want;

update want2 want1;

by id country1 country2 time;

drop key;

run;

 

I think this does most of what you asked for.  The piece I'm not sure about:  which values of COUNTRY and COUNT belong in the final data set when some of the numbers may have come from different observations.

 

Contributor
Posts: 62

Re: Merging

True, there is no corresponding obs for
[xw1 al bk 1t] (from table1) in my table2 example.
It is a non valontary omission When I was trying to get
something that resembled my data.

 

Its corresponding obs would have to be a [al 1t]
in table2. And it result to the absence of [xw1 al bk 1t]
in my final wanted table.

 

But nevertheless, I very much adhere to what you have proposed
[Art297 - Astounding], and this allows me to really obtain what I seek.

 

Thanking you
Regards

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 248 views
  • 0 likes
  • 5 in conversation