Solved: Finding unmatched variables (name/dob/enroll)?

ybz12003 · Posted 07-30-2020 03:26 PM

Hello,

I have dataset list below.

data have;
	infile datalines dsd;
	format ID 2. first_name $10. last_name $10. birthdate mmddyy10. enroll $1.;
 	input ID : 2. first_name : $10. last_name : $10. birthdate : mmddyy10. enroll : $1.;
datalines;
1,  John, Doe,   01/01/2020, Y,
12,  John, Doe,   01/01/2020, N,
2,  Joe,  Smith, 12/01/2019, Y,
22,  Joe,  Smith, 12/01/2019, Y,
27,  Joe,  Smith, 12/01/2019, Y,
32,  Kevin,  Mao, 02/06/2000, N,
36,  Kevin,  Yao, 02/06/2000, Y,
24,  Beky,   Lee, 08/23/2005, N,
41,  Beky,   Lee, 08/21/2005, N,
48,  Beky,   Lee, 08/23/2005, N,
49,  Beky,   Lee, 08/23/2005, Y,
15,	Ashley, Taylor, 09/18/2000, N,
54,	Ashley, Taylor, 09/18/2000, N,
65,	Ashley, Taylor, 09/18/2000, N,
;
run;

I would like to list all the unmatching names/dob/enroll as the result shown below. Please advice how to approach it, thanks.

data want;
	infile datalines dsd;
	format ID 2. first_name $10. last_name $10. birthdate mmddyy10. enroll $1.;
 	input ID : 2. first_name : $10. last_name : $10. birthdate : mmddyy10. enroll : $1.;
datalines;
1,  John, Doe,   01/01/2020, Y,
12,  John, Doe,   01/01/2020, N,
27,  Joe,  Smith, 12/01/2019, Y,
32,  Kevin,  Mao, 02/06/2000, N,
36,  Kevin,  Yao, 02/06/2000, Y,
24,  Beky,   Lee, 08/23/2005, N,
41,  Beky,   Lee, 08/21/2005, N,
48,  Beky,   Lee, 08/23/2005, N,
49,  Beky,   Lee, 08/23/2005, Y,
;
run;

novinosrin · Posted 07-30-2020 04:49 PM

Hi @ybz12003 Assuming I understand your requirement, the following should do?-


data have;
	infile datalines dsd;
	format ID 2. first_name $10. last_name $10. birthdate mmddyy10. enroll $1.;
 	input ID : 2. first_name : $10. last_name : $10. birthdate : mmddyy10. enroll : $1.;
datalines;
1,  John, Doe,   01/01/2020, Y,
12,  John, Doe,   01/01/2020, N,
2,  Joe,  Smith, 12/01/2019, Y,
22,  Joe,  Smith, 12/01/2019, Y,
27,  Joe,  Smith, 12/01/2019, Y,
32,  Kevin,  Mao, 02/06/2000, N,
36,  Kevin,  Yao, 02/06/2000, Y,
24,  Beky,   Lee, 08/23/2005, N,
41,  Beky,   Lee, 08/21/2005, N,
48,  Beky,   Lee, 08/23/2005, N,
49,  Beky,   Lee, 08/23/2005, Y,
15,	Ashley, Taylor, 09/18/2000, N,
54,	Ashley, Taylor, 09/18/2000, N,
65,	Ashley, Taylor, 09/18/2000, N,
;
run;


proc sql;
create table want as
select *
from have
group by first_name,last_name,birthdate,enroll
having count(*)=1;
quit;

View solution in original post

PaigeMiller · Posted 07-30-2020 03:30 PM

I would like to list all the unmatching names/dob/enroll as the result shown below.

This is a really brief description, could you please elaborate and explain the logic you want to follow?

Also:

Why is Joe Smith in the output data set?

Why is this in the output data set?

48,  Beky,   Lee, 08/23/2005, N,

--
Paige Miller

mazouz · Posted 07-30-2020 03:37 PM

PTOC SQL;
CREATE TABLE want AS
SELECT COUNT(*) AS nbr_doublon, firstname, lastname, birthdate,enroll
FROM have
GROUP BY firstname, lastname, birthdate,enroll
HAVING COUNT(*) =1;
QUIT;

ALTER TABLE want
DROP COLUMN nbr_doublon;

ybz12003 · Posted 07-30-2020 03:55 PM

Your result dataset is missing ID which I would like to include the final dataset.

ybz12003 · Posted 07-30-2020 03:58 PM

Sorry, my mistake. The '27, Joe, Smith, 12/01/2019, Y,' should not be in the final dataset.

mazouz · Posted 07-30-2020 04:11 PM

Try this, I don't have sas now to try the code

PROC SQL;
CREATE TABLE want AS
SELECT ID, firstname, lastname, birthdate,enroll,COUNT(firstname, lastname, birthdate,enroll) AS nbr_doublon
FROM have
GROUP BY firstname, lastname, birthdate,enroll
HAVING nbr_doublon =1;
QUIT;

ALTER TABLE want
DROP COLUMN nbr_doublon;

ybz12003 · Posted 07-30-2020 04:27 PM

Not working, see below.

WARNING: Function COUNT requires at most 3 argument(s). The extra one(s) will be ignored.
ERROR: Function COUNT requires a character expression as argument 3.

novinosrin · Posted 07-30-2020 04:49 PM

Hi @ybz12003 Assuming I understand your requirement, the following should do?-


data have;
	infile datalines dsd;
	format ID 2. first_name $10. last_name $10. birthdate mmddyy10. enroll $1.;
 	input ID : 2. first_name : $10. last_name : $10. birthdate : mmddyy10. enroll : $1.;
datalines;
1,  John, Doe,   01/01/2020, Y,
12,  John, Doe,   01/01/2020, N,
2,  Joe,  Smith, 12/01/2019, Y,
22,  Joe,  Smith, 12/01/2019, Y,
27,  Joe,  Smith, 12/01/2019, Y,
32,  Kevin,  Mao, 02/06/2000, N,
36,  Kevin,  Yao, 02/06/2000, Y,
24,  Beky,   Lee, 08/23/2005, N,
41,  Beky,   Lee, 08/21/2005, N,
48,  Beky,   Lee, 08/23/2005, N,
49,  Beky,   Lee, 08/23/2005, Y,
15,	Ashley, Taylor, 09/18/2000, N,
54,	Ashley, Taylor, 09/18/2000, N,
65,	Ashley, Taylor, 09/18/2000, N,
;
run;


proc sql;
create table want as
select *
from have
group by first_name,last_name,birthdate,enroll
having count(*)=1;
quit;

ybz12003 · Posted 07-30-2020 05:19 PM

Although the code is working, the 'Becky' only have two. My final 'want' dataset have four 'Becky's. I would like to have all of the IDs for 'Becky.'

mazouz · Posted 07-30-2020 06:50 PM

proc sql;
create table want1 as
select *
from have
group by first_name,last_name,birthdate,enroll
having count(*)=1 ;
quit;

proc sql;
create table want2 as
select *
from have
group by first_name,last_name,birthdate,enroll
having count(*)>1 ;
quit;

proc sql;
create table want as
select *
from want1 t1
left join want2 t2
on (t1.first_name=t2.first_name and t1.last_name=t2.last_name and t1.birthdate=t2.birthdate);
quit;

proc delete data=want1 want2 (gennum=all);
run;

PaigeMiller · Posted 07-31-2020 07:31 AM

Hello, @ybz12003, you've provided extremely brief information about the logic you are using, and we can't really understand the logic you want to use. Please take some time and describe in words, and in detail, what your logic is that takes you from the input data to the output data. These terribly brief statements of yours simply are not moving us forward.

--
Paige Miller

Ksharp · Posted 07-31-2020 08:25 AM

How do you know

32,  Kevin,  Mao, 02/06/2000, N,
36,  Kevin,  Yao, 02/06/2000, Y,

is the same person , but have different last name ?

Assuming the first name is the same for the same person .

data have;
	infile datalines dsd;
	format ID 2. first_name $10. last_name $10. birthdate mmddyy10. enroll $1.;
 	input ID : 2. first_name : $10. last_name : $10. birthdate : mmddyy10. enroll : $1.;
datalines;
1,  John, Doe,   01/01/2020, Y,
12,  John, Doe,   01/01/2020, N,
2,  Joe,  Smith, 12/01/2019, Y,
22,  Joe,  Smith, 12/01/2019, Y,
27,  Joe,  Smith, 12/01/2019, Y,
32,  Kevin,  Mao, 02/06/2000, N,
36,  Kevin,  Yao, 02/06/2000, Y,
24,  Beky,   Lee, 08/23/2005, N,
41,  Beky,   Lee, 08/21/2005, N,
48,  Beky,   Lee, 08/23/2005, N,
49,  Beky,   Lee, 08/23/2005, Y,
15,	Ashley, Taylor, 09/18/2000, N,
54,	Ashley, Taylor, 09/18/2000, N,
65,	Ashley, Taylor, 09/18/2000, N,
;
run;
proc sql;
create table want as
select *
 from have 
  group by first_name
   having count(distinct catx('|', last_name,birthdate,enroll )) ne 1;
quit;

ybz12003 · Posted 07-31-2020 10:27 AM

Sorry for the confusion, I am looking for the IDs whose any first_name/last_name/dob/enroll is different. Therefore, when 'Joe Smith' and 'Ashley Taylor' have the same name, DOB and enroll, I don't need them. All the rest are the IDs I am looking for.

mazouz · Posted 07-31-2020 10:40 AM

proc sql;
create table want1 as
select *
from have
group by first_name,last_name,birthdate,enroll
having count(*)=1 ;
quit;

proc sql;
create table want2 as
select *
from have
group by first_name,last_name,birthdate,enroll
having count(*)>1 ;
quit;

proc sql;
create table want as
select *
from want1 t1
left join want2 t2
on (t1.first_name=t2.first_name and t1.last_name=t2.last_name );
quit;

proc delete data=want1 want2 (gennum=all);
run;

Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

Re: Finding unmatched variables (name/dob/enroll)?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away