BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ziba
Calcite | Level 5

hey all, 

I want to merge two data sets that have a common variable called CIK, but in data set 2 I have CIK and CIK_1. However, the CIK in data set 1 could be CIK or CIK_1 which is not given separately like in data set 2. I was wondering if I can first merge them by CIK, and if there is no matching CIK, then compare CIK in data set one to CIK_1 in data set 2. 

below is the simplified data sets 1 and 2. 

 

 data set 1 (SORTED BY cik)

Company name     CIK 

A                             123456

B                             654123

 

Data set 2 (sorted by CIK)

Company name     CIK              CIK_1

A                            456789          123456

B                                                   654123

 

I want the final merged data set  looks like this:

Company name               CIK 

A                                   123456

B                                    654123

thanks in advance.

Ziba 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

proc sql;
create table want(drop=_:) as
select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;

 

Corrected version below:

 

proc sql;
create table want(drop=_:) as
select a.* , b.* ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;

 

Also, I took the sample as char values, if cik is a num num value, use coalesce instead of coalescec

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20

straight forward sql

 


data one;
input Companyname  $   CIK $;
cards;
A                             123456
B                             654123
;
data two;
input Companyname  $   CIK     $         CIK_1 $;
cards;
A                            456789          123456
B                              .             654123
;

proc sql;
create table want as
select a.companyname,coalescec(a.cik ,CIK_1) as CIK
from one a left join two b
on a.cik=b.cik or a.cik=CIK_1;
quit;
Ziba
Calcite | Level 5

hey 

I am sorry I forgot to mention in data set 1 i have more than two variables, and I need to grab them all (I need to grab all the variables in data set 2 as well). So I guess I need to use * to select all the variables?

I don`t know how to use sql and I am not sure why you have a and b in the code? should I create a and b in the first place or I dont have to?

novinosrin
Tourmaline | Level 20

Can you please expand your sample a little more with some blah blah, and i will copy/paste here and modify the sql

 

The reason is to avoid going back and forth

Ziba
Calcite | Level 5

In data set 1, I have variables up to 24 variables, and it includes company name and CIK. 

 

in data set 2, I have variables up to 35, and it includes  company name( it could be in the same format as data set 1, but I dont want them to merge into one column, so I used Company name_UCLA to differentiate), CIK, CIK_1.

so I think the desired merged data set should have 58 variables. 

Should I provide you with all the variables?

 

novinosrin
Tourmaline | Level 20

Ok, try this and let me know.

 

Just a slight adjustment to get all vars from both

 


proc sql;
create table want(drop=_:) as
select a.*,b.*,coalescec(a._cik ,CIK_1) as CIK
from one(rename=(cik=_cik)) a left join two(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
novinosrin
Tourmaline | Level 20

Just play around, and seriously if you are not a lazy person like me you can ofcourse type select a.var1,a.var2,..a.varn and so on.

Ziba
Calcite | Level 5

 

proc sql;
create table want(drop=_:) as
select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;

I tried this code after I changed the names one and two into my sas data sets name. 

 

But I have a syntax error pointing the star sign next to b:


433 proc sql;
434 create table want(drop=_:) as
435 select a.* from final.public_sort,   b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
-
                                                           22
                                                           76
ERROR 22-322: Expecting a name.

ERROR 76-322: Syntax error, statement will be ignored.

 

I am sure the code is correct, but Im not getting the results. Could you tell where I did it wrong?

novinosrin
Tourmaline | Level 20

proc sql;
create table want(drop=_:) as
select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;

 

Corrected version below:

 

proc sql;
create table want(drop=_:) as
select a.* , b.* ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;

 

Also, I took the sample as char values, if cik is a num num value, use coalesce instead of coalescec

FreelanceReinh
Jade | Level 19

@novinosrin wrote:

 

Also, I took the sample as char values, if cik is a num num value, use coalesce instead of coalescec


Just a side note: In PROC SQL you can always use the (older, ANSI SQL) function COALESCE, which works with character arguments as well as with numeric arguments (not with mixed types, though). As an additional advantage over the DATA step function COALESCEC (introduced in SAS 9, together with the DATA step function COALESCE) it assigns a suitable length to its results, not the default length of 200, which is mostly too long -- and when it's too short, it can even truncate the result (in PROC SQL without notice).

Ziba
Calcite | Level 5

The code worked, I appreciated your help. Thank you very much, 

 

Patrick
Opal | Level 21

@Ziba

I believe it would help if you could post SAS datastep code which creates sample data. You don't need to create all variables but just enough so the problem to be solved becomes clear.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1176 views
  • 2 likes
  • 4 in conversation