I need to join the below 2 tables on Key. in the first table I have key1 and In the second table, I have key2, Rank and Contact data.
In the final table, I need Rank1 data in contact1 and if Rank1 is not there, I need to check next Rank for contact1 data. If Rank1 data comes to contact1, then I shouldn't consider Rank1 for contact2 and contact3. Is there a way to do using proc sql?
Table A:
Key1 |
---|
1234 |
3456 |
Table B
Key2 | Rank | contact |
---|---|---|
1234 | 1 | owner |
1234 | 2 | co-owner |
1234 | 3 | CEO |
3456 | 2 | co-owner |
3456 | 3 | CEO |
Final Table:
key | contact1 | contact2 | contact3 |
---|---|---|---|
1234 | owner | co-owner | CEO |
3456 | co-owner | CEO |
First, I don't think you can do this entirely in SQL (without multiple joins which could be extremely slow on real-world large data sets)
Second, since I believe strongly that you are better off with long data sets rather than wide data sets (Maxim 19), here is the solution that produces a long data set from this data. (Which is also easy in SQL)
data want;
merge tableA tableB(rename=(key2=key1));
by key1;
run;
I agree with @PaigeMiller here.
The only (?) reason to do it in SQL if your (large) data resides in an external databbase, so you can push the query there.
Then you need to join with table B three times, and using Rank in the join criterias.
@LinusH wrote:
I agree with @PaigeMiller here.
The only (?) reason to do it in SQL if your (large) data resides in an external databbase, so you can push the query there.
Then you need to join with table B three times, and using Rank in the join criterias.
That assumes there are only 3 values of rank, which may or may not be known beforehand. Anyway, if you create a long data set, you could do a PROC TRANSPOSE to get a wide data set and then you don't need to know beforehand how many values of rank there are.
Example of @PaigeMiller's suggestion.
Note the data in the form of working data steps (hint);
data work.a; input Key $; datalines; 1234 3456 ; Data work.B; input Key $ Rank contact $; datalines; 1234 1 owner 1234 2 co-owner 1234 3 CEO 3456 2 co-owner 3456 3 CEO ; data work.merged; merge work.a work.b; by key; run; proc transpose data=work.merged out=work.trans (drop=_name_) prefix=contact; by key; var contact; run;
If your data isn't sorted, or just in case, I would suggest sorting Work.A (or whatever you data set name is) by the Key. Then Work.B by the Key and Rank.
If your data sets do have differently named key variables a data set option rename would address that:
data work.merged; merge work.a work.b(rename=(key2=key)); by key; run;
WITH RankedContacts AS (
SELECT
Key2 AS [Key],
Contact,
ROW_NUMBER() OVER (PARTITION BY Key2 ORDER BY Rank) AS RowNum
FROM
TableB
)
SELECT
[Key],
MAX(CASE WHEN RowNum = 1 THEN Contact END) AS contact1,
MAX(CASE WHEN RowNum = 2 THEN Contact END) AS contact2,
MAX(CASE WHEN RowNum = 3 THEN Contact END) AS contact3
FROM
RankedContacts
GROUP BY
[Key];
Result:
This query first ranks the contacts based on their rank for each key. Then, it uses conditional aggregation to assign the appropriate contacts to contact1, contact2, and contact3.
A more extensive discussion of this issue can be found here: row number in sql server
This, of course, will only work in a remote database that understands this particular syntax. It will not work in SAS.
Note that PROC SQL does not support windowing functions nor WITH clauses so the only way to run something like that would be via explicit pass thru to some external database system.
That syntax seems to be SQL SERVER specific, but could probably be adopted for most other external databases that support SQL windowing functions and WITH clauses.
For further analysis, a long layout is better. The wide layout can easily achieved in PROC REPORT for reporting purposes:
data a;
input Key1 $;
datalines;
1234
3456
;
data b;
infile datalines dsd dlm="09"x;
input Key2 $ Rank contact $;
datalines;
1234 1 owner
1234 2 co-owner
1234 3 CEO
3456 2 co-owner
3456 3 CEO
;
data want;
merge
a
b (rename=(key2=key1))
;
by key1;
if first.key1
then n = 1;
else n + 1;
cont = cats("contact",n);
run;
proc report data=want;
column key1 (contact rank),cont;
define key1 / "" group;
define contact / "" display;
define rank / noprint;
define cont / "" across;
run;
Using rank with noprint is necessary because REPORT wants at least one ANALYSIS variable under the ACROSS, and this needs to be numeric.
Result:
contact1 contact2 contact3 1234 owner co-owner CEO 3456 co-owner CEO
I don't see why you would need to combine the two tables. Add the first table does not add any information that is not already in the second table. Can you explain why they need to be joined?
Just sort the data by KEY (which ever name for it you want to use) and RANK and then transpose it.
Your table appears to already be sorted.
data have;
input Key2 Rank contact $ ;
cards;
1234 1 owner
1234 2 co-owner
1234 3 CEO
3456 2 co-owner
3456 3 CEO
;
proc transpose data=have out=want(drop=_name_) prefix=contact;
by key2;
var contact;
run;
Result
Obs Key2 contact1 contact2 contact3 1 1234 owner co-owner CEO 2 3456 co-owner CEO
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Save $200 when you sign up by March 14!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.