hey all,
I want to merge two data sets that have a common variable called CIK, but in data set 2 I have CIK and CIK_1. However, the CIK in data set 1 could be CIK or CIK_1 which is not given separately like in data set 2. I was wondering if I can first merge them by CIK, and if there is no matching CIK, then compare CIK in data set one to CIK_1 in data set 2.
below is the simplified data sets 1 and 2.
data set 1 (SORTED BY cik)
Company name CIK
A 123456
B 654123
Data set 2 (sorted by CIK)
Company name CIK CIK_1
A 456789 123456
B 654123
I want the final merged data set looks like this:
Company name CIK
A 123456
B 654123
thanks in advance.
Ziba
proc sql;
create table want(drop=_:) as
select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
Corrected version below:
proc sql;
create table want(drop=_:) as
select a.* , b.* ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
Also, I took the sample as char values, if cik is a num num value, use coalesce instead of coalescec
straight forward sql
data one;
input Companyname $ CIK $;
cards;
A 123456
B 654123
;
data two;
input Companyname $ CIK $ CIK_1 $;
cards;
A 456789 123456
B . 654123
;
proc sql;
create table want as
select a.companyname,coalescec(a.cik ,CIK_1) as CIK
from one a left join two b
on a.cik=b.cik or a.cik=CIK_1;
quit;
hey
I am sorry I forgot to mention in data set 1 i have more than two variables, and I need to grab them all (I need to grab all the variables in data set 2 as well). So I guess I need to use * to select all the variables?
I don`t know how to use sql and I am not sure why you have a and b in the code? should I create a and b in the first place or I dont have to?
Can you please expand your sample a little more with some blah blah, and i will copy/paste here and modify the sql
The reason is to avoid going back and forth
In data set 1, I have variables up to 24 variables, and it includes company name and CIK.
in data set 2, I have variables up to 35, and it includes company name( it could be in the same format as data set 1, but I dont want them to merge into one column, so I used Company name_UCLA to differentiate), CIK, CIK_1.
so I think the desired merged data set should have 58 variables.
Should I provide you with all the variables?
Ok, try this and let me know.
Just a slight adjustment to get all vars from both
proc sql;
create table want(drop=_:) as
select a.*,b.*,coalescec(a._cik ,CIK_1) as CIK
from one(rename=(cik=_cik)) a left join two(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
Just play around, and seriously if you are not a lazy person like me you can ofcourse type select a.var1,a.var2,..a.varn and so on.
proc sql;
create table want(drop=_:) as
select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
I tried this code after I changed the names one and two into my sas data sets name.
But I have a syntax error pointing the star sign next to b:
433 proc sql;
434 create table want(drop=_:) as
435 select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
-
22
76
ERROR 22-322: Expecting a name.
ERROR 76-322: Syntax error, statement will be ignored.
I am sure the code is correct, but Im not getting the results. Could you tell where I did it wrong?
proc sql;
create table want(drop=_:) as
select a.* from final.public_sort, b.* from final.ucla_sort ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
Corrected version below:
proc sql;
create table want(drop=_:) as
select a.* , b.* ,coalescec(a._cik ,CIK_1) as CIK
from final.public_sort(rename=(cik=_cik)) a left join final.ucla_sort(rename=(cik=__cik)) b
on a._cik=b.__cik or a._cik=CIK_1;
quit;
Also, I took the sample as char values, if cik is a num num value, use coalesce instead of coalescec
@novinosrin wrote:
Also, I took the sample as char values, if cik is a num num value, use coalesce instead of coalescec
Just a side note: In PROC SQL you can always use the (older, ANSI SQL) function COALESCE, which works with character arguments as well as with numeric arguments (not with mixed types, though). As an additional advantage over the DATA step function COALESCEC (introduced in SAS 9, together with the DATA step function COALESCE) it assigns a suitable length to its results, not the default length of 200, which is mostly too long -- and when it's too short, it can even truncate the result (in PROC SQL without notice).
The code worked, I appreciated your help. Thank you very much,
I believe it would help if you could post SAS datastep code which creates sample data. You don't need to create all variables but just enough so the problem to be solved becomes clear.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.