DATA Step, Macro, Functions and more

How to identify a discrepancy from multiple records of a single subject?

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

How to identify a discrepancy from multiple records of a single subject?

Hi,

 

I am trying to find out a discrepancy in the data. I am expecting to see either Test2 or Test2A has to report 'Yes' for a unique subject but not both. If both (Test2 and Test2A) reported as 'Yes' then I want to provide Discreapancy as 'Yes'. 

 

Here is the data what I have:

SUBJECTTESTYN
1Test1Yes
1Test2Yes
1Test2AYes
1Test3Yes
2Test1Yes
2Test2Yes
2Test2ANo
2Test3Yes
2Test4Yes
3Test1Yes
3Test2Yes
3Test2AYes
3Test3Yes

 

I would like to see the output like below:

SUBJECTTESTYNDiscrepancy 
1Test1Yes 
1Test2YesYes
1Test2AYesYes
1Test3Yes 
2Test1Yes 
2Test2Yes 
2Test2ANo 
2Test3Yes 
2Test4Yes 
3Test1Yes 
3Test2YesYes
3Test2AYesYes
3Test3Yes 

 

I am new to SAS programming, can you please provide any suggestions how to get the output?

 

Thanks in advance!

 

Pakala


Accepted Solutions
Solution
‎09-20-2017 06:39 PM
PROC Star
Posts: 1,308

Re: How to identify a discrepancy from multiple records of a single subject?

Posted in reply to Astounding

data have;

input SUBJECT   TEST $     YN $;

datalines;

1    Test1 Yes

1    Test2 Yes

1    Test2A     Yes

1    Test3 Yes

2    Test1 Yes

2    Test2 Yes

2    Test2A     No

2    Test3 Yes

2    Test4 Yes

3    Test1 Yes

3    Test2 Yes

3    Test2A     Yes

3    Test3 Yes

;

 

 

 

data want;

merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));

retain discrepancy;

if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;

discrepancy="Yes";

output;

end;

else if test='Test2A' and discrepancy="Yes" then do;

discrepancy="Yes";

output;

call missing(discrepancy);

end;

else output;

drop _:;

run;

View solution in original post


All Replies
Super User
Posts: 6,537

Re: How to identify a discrepancy from multiple records of a single subject?

Since you're new to SAS programming, here are a few items to consider.  First, there is not a really simple solution.  Second, the exact spelling of variable values matters.  These are all different to SAS:  Yes, yes, YES.  So I'll be perhaps too careful and apply the UPCASE function in making comparisons.

 

This solution assumes that your data set is already sorted BY SUBJECT:

 

data want;

n_yesses=0;

do until (last.subject);

   set have;

   by subject;

   if upcase(test) in ("TEST2", "TEST2A") and upcase(YN) = "YES" then n_yesses + 1;

end;

do until (last.subject);

   set have;

   by subject;

   if upcase(test) in ("TEST2", "TEST2A") and n_yesses = 2 then Discrepancy="Yes";

   output;

end;

drop n_yesses;

run;

 

The top loop examines all observations for a SUBJECT, and counts the number of yesses.  Then the bottom loop works with the exact same observations, assigns DISCREPANCY, and outputs the results.

Solution
‎09-20-2017 06:39 PM
PROC Star
Posts: 1,308

Re: How to identify a discrepancy from multiple records of a single subject?

Posted in reply to Astounding

data have;

input SUBJECT   TEST $     YN $;

datalines;

1    Test1 Yes

1    Test2 Yes

1    Test2A     Yes

1    Test3 Yes

2    Test1 Yes

2    Test2 Yes

2    Test2A     No

2    Test3 Yes

2    Test4 Yes

3    Test1 Yes

3    Test2 Yes

3    Test2A     Yes

3    Test3 Yes

;

 

 

 

data want;

merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));

retain discrepancy;

if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;

discrepancy="Yes";

output;

end;

else if test='Test2A' and discrepancy="Yes" then do;

discrepancy="Yes";

output;

call missing(discrepancy);

end;

else output;

drop _:;

run;

Super Contributor
Super Contributor
Posts: 260

Re: How to identify a discrepancy from multiple records of a single subject?

[ Edited ]

Use @Astounding's or @novinosrin''s solution because they are better, but I had to go for a pure SQL solution because that is what I do.

 

data test_scores;
   input 
	@1 SUBJECT $1.
	@3 TEST $7. 
	@10 YN $3.;
datalines;
1 Test1  Yes
1 Test2  Yes
1 Test2A Yes
1 Test3  Yes
2 Test1  Yes
2 Test2  Yes
2 Test2A No
2 Test3  Yes
2 Test4  Yes
3 Test1  Yes
3 Test2  Yes
3 Test2A Yes
3 Test3  Yes
;
run;


*we can identify the problem cases like this;
proc sql;
	create table problems as
	select a.*, 'Yes' as Discrepancy from 
	(
	select subject, test, yn
	from test_scores
	where test = 'Test2' and YN = 'Yes'
 	union
	select subject, test, yn
	from test_scores
	where test = 'Test2A' and YN = 'Yes'
	) as a
	group by subject
	having count(subject) > 1;
quit;

*and mark them in the whole dataset like this;
proc sql;
	create table all_with_marked_cases as
	select a.subject, a.test, a.yn, b.discrepancy 
	from test_scores a left join problems b
	on a.subject = b.subject and a.test = b.test and a.yn = b.yn;
quit; 

Gives us

The SAS System			
			
SUBJECT	TEST	YN	Discrepancy
1	Test1	Yes	
1	Test2	Yes	Yes
1	Test2A	Yes	Yes
1	Test3	Yes	
2	Test1	Yes	
2	Test2	Yes	
2	Test2A	No	
2	Test3	Yes	
2	Test4	Yes	
3	Test1	Yes	
3	Test2	Yes	Yes
3	Test2A	Yes	Yes
3	Test3	Yes	

You may now return to your regularly scheduled programming.

 

 

New Contributor
Posts: 2

Re: How to identify a discrepancy from multiple records of a single subject?

Thank you @Astounding@novinosrin and @HB!

 

Regards,

Pakala

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 163 views
  • 3 likes
  • 4 in conversation