Join 2 tables by prioritizing the data

gandikk · Posted 10-30-2024 05:34 AM

I need to join the below 2 tables on Key. in the first table I have key1 and In the second table, I have key2, Rank and Contact data.

In the final table, I need Rank1 data in contact1 and if Rank1 is not there, I need to check next Rank for contact1 data. If Rank1 data comes to contact1, then I shouldn't consider Rank1 for contact2 and contact3. Is there a way to do using proc sql?

Table A:

Key1
1234
3456

Table B

Key2	Rank	contact
1234	1	owner
1234	2	co-owner
1234	3	CEO
3456	2	co-owner
3456	3	CEO

Final Table:

key	contact1	contact2	contact3
1234	owner	co-owner	CEO
3456	co-owner	CEO

SAS@EMMAUS

PaigeMiller · Posted 10-30-2024 05:55 AM

First, I don't think you can do this entirely in SQL (without multiple joins which could be extremely slow on real-world large data sets)

Second, since I believe strongly that you are better off with long data sets rather than wide data sets (Maxim 19), here is the solution that produces a long data set from this data. (Which is also easy in SQL)

data want;
     merge tableA tableB(rename=(key2=key1));
     by key1;
run;

--
Paige Miller

LinusH · Posted 10-30-2024 08:03 AM

I agree with @PaigeMiller here.

The only (?) reason to do it in SQL if your (large) data resides in an external databbase, so you can push the query there.

Then you need to join with table B three times, and using Rank in the join criterias.

Data never sleeps

PaigeMiller · Posted 10-30-2024 08:04 AM

@LinusH wrote:

I agree with @PaigeMiller here.

The only (?) reason to do it in SQL if your (large) data resides in an external databbase, so you can push the query there.

Then you need to join with table B three times, and using Rank in the join criterias.

That assumes there are only 3 values of rank, which may or may not be known beforehand. Anyway, if you create a long data set, you could do a PROC TRANSPOSE to get a wide data set and then you don't need to know beforehand how many values of rank there are.

--
Paige Miller

ballardw · Posted 10-30-2024 10:25 AM

Example of @PaigeMiller's suggestion.

Note the data in the form of working data steps (hint);

data work.a;
   input Key $;
datalines;
1234
3456
;

Data work.B;
  input Key $ 	Rank 	contact $;
datalines;
1234 	1 	owner
1234 	2 	co-owner
1234 	3 	CEO
3456 	2 	co-owner
3456 	3 	CEO
;

data work.merged;
   merge work.a work.b;
   by key;
run;

proc transpose data=work.merged out=work.trans (drop=_name_)
    prefix=contact;
   by key;
   var contact;
run;

If your data isn't sorted, or just in case, I would suggest sorting Work.A (or whatever you data set name is) by the Key. Then Work.B by the Key and Rank.

If your data sets do have differently named key variables a data set option rename would address that:

data work.merged;
   merge work.a work.b(rename=(key2=key));
   by key;
run;

thomas_aquas · Posted 11-01-2024 09:59 AM

You can achieve the desired result using a combination of the ROW_NUMBER() function and conditional aggregation. Here’s a SQL query that accomplishes your goal:

WITH RankedContacts AS (
    SELECT 
        Key2 AS [Key],
        Contact,
        ROW_NUMBER() OVER (PARTITION BY Key2 ORDER BY Rank) AS RowNum
    FROM 
        TableB
)
SELECT 
    [Key],
    MAX(CASE WHEN RowNum = 1 THEN Contact END) AS contact1,
    MAX(CASE WHEN RowNum = 2 THEN Contact END) AS contact2,
    MAX(CASE WHEN RowNum = 3 THEN Contact END) AS contact3
FROM 
    RankedContacts
GROUP BY 
    [Key];

Result:

This query first ranks the contacts based on their rank for each key. Then, it uses conditional aggregation to assign the appropriate contacts to contact1, contact2, and contact3.

A more extensive discussion of this issue can be found here: row number in sql server

Kurt_Bremser · Posted 11-01-2024 10:28 AM

This, of course, will only work in a remote database that understands this particular syntax. It will not work in SAS.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 11-01-2024 10:32 AM

Note that PROC SQL does not support windowing functions nor WITH clauses so the only way to run something like that would be via explicit pass thru to some external database system.

That syntax seems to be SQL SERVER specific, but could probably be adopted for most other external databases that support SQL windowing functions and WITH clauses.

Kurt_Bremser · Posted 11-01-2024 10:26 AM

For further analysis, a long layout is better. The wide layout can easily achieved in PROC REPORT for reporting purposes:

data a;
input Key1 $;
datalines;
1234
3456
;

data b;
infile datalines dsd dlm="09"x;
input Key2 $ Rank contact $;
datalines;
1234	1	owner
1234	2	co-owner
1234	3	CEO
3456	2	co-owner
3456	3	CEO
;

data want;
merge
  a
  b (rename=(key2=key1))
;
by key1;
if first.key1
then n = 1;
else n + 1;
cont = cats("contact",n);
run;

proc report data=want;
column key1 (contact rank),cont;
define key1 / "" group;
define contact / "" display;
define rank / noprint;
define cont / "" across;
run;

Using rank with noprint is necessary because REPORT wants at least one ANALYSIS variable under the ACROSS, and this needs to be numeric.

Result:

 	contact1	contact2	contact3
1234	owner	co-owner	CEO
3456	co-owner	CEO

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 11-01-2024 10:49 AM

I don't see why you would need to combine the two tables. Add the first table does not add any information that is not already in the second table. Can you explain why they need to be joined?

Just sort the data by KEY (which ever name for it you want to use) and RANK and then transpose it.

Your table appears to already be sorted.

data have;
  input Key2 Rank contact $ ;
cards;
1234 1 owner
1234 2 co-owner
1234 3 CEO
3456 2 co-owner
3456 3 CEO
;

proc transpose data=have out=want(drop=_name_) prefix=contact;
  by key2;
  var contact;
run;

Result

Obs    Key2    contact1    contact2    contact3

 1     1234    owner       co-owner      CEO
 2     3456    co-owner    CEO

Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Re: Join 2 tables by prioritizing the data

Register Today!

SAS Training: Just a Click Away