proc sql;
create table want as
select lab as a
park as b
from a b
where ((a.first_name=b.first_name and a.last_name=b.last_name)+
(a.first_name=b.first_name and a.birthdate=b.DOB)+
(a.last_name=b.last_name and b.DOB)) >1 ;
quit;
Hello,
I have a dataset A with 50 variables with 2000 rows, dataset B with 10 variables with 300 rows. Both dataset have same name, birthdate and sex variables. As long as these two dataset have same (name and birthdate) or same (sex and name) or same (sex and birthday) have same value, I link those data together first. Thank you.
Like this?
proc sql;
select A.*
from A,B
where (A.NAME=B.NAME and A.DATE = B.DATE)
or (A.NAME=B.NAME and A.SEX = B.SEX )
or (A.DATE=B.DATE and A.SEX = B.SEX );
I use VERY similar code, some how the result dataset only comes with dataset A variables, no dataset B variables.
A dataset can only one variable of a given name. So if you do select a.NAME,b.NAME you don't get two name variables, just one. In particular PROC SQL will pick the first one and ignore the second one. You need to give them unique names.
SAS evaluates boolean expressions as 1 or 0. So just sum them.
where sum( a.name=b.name, a.bday=b.bday, a.sex=b.sex ) > 1
I still only get set 1 variables.
select a.name as nameA
, b.name as nameB
...
Error: Summary functions are restricted to the SELECT and HAVING clauses only
Try using the + sign instead of the sum function.
SQL is pickier than a data step,
Otherwise, try my syntax.
@ybz12003 wrote:
Error: Summary functions are restricted to the SELECT and HAVING clauses only
Don't use SQL aggregate functions. Use the SAS sum(,) function like in the code I posted. Or just type the + operators instead.
@ybz12003 wrote:
Error: Summary functions are restricted to the SELECT and HAVING clauses only
Since your example code at the start of this thread did not include any summary functions then perhaps it is time to show the actual code you are using when generating errors.
I would advise joining by all three conditions first as that is the most reliable. The next most reliable would be birth date and name followed by sex and name as you have described. SQL is a good choice for this:
proc sql;
create table want as
select A.name
,A.birthdate
,A.sex
,B.name
,B.birthdate
,B.sex
,'1 Name, Birthdate, Sex' as jointype length = 20
from A
left join B
on A.name = B.name
and A.birthdate = B.birthdate
and A.sex = B.sex
union all
select A.name
,A.birthdate
,A.sex
,B.name
,B.birthdate
,B.sex
,'2 Name, Birthdate' as jointype length = 20
from A
left join B
on A.name = B.name
and A.birthdate = B.birthdate
union all
select A.name
,A.birthdate
,A.sex
,B.name
,B.birthdate
,B.sex
,'3 Name, Sex' as jointype length = 20
from A
left join B
on A.name = B.name
and A.sex = B.sex
order by Name, jointype
;
quit;
You will end up with multiple join types for each name so then just choose the one with the lowest number.
sum didn't help,
I have been trying use sum or +, I still only get dataset 1 variables. Not including dataset 2 variables.
Could someone let me know where my code went wrong, I keep getting dataset a variables. That's it. Where is the set b?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.