Solved: Re: How to identify a discrepancy from multiple records of a single su...

pakala60 · Posted 09-20-2017 04:44 PM

Hi,

I am trying to find out a discrepancy in the data. I am expecting to see either Test2 or Test2A has to report 'Yes' for a unique subject but not both. If both (Test2 and Test2A) reported as 'Yes' then I want to provide Discreapancy as 'Yes'.

Here is the data what I have:

SUBJECT	TEST	YN
1	Test1	Yes
1	Test2	Yes
1	Test2A	Yes
1	Test3	Yes
2	Test1	Yes
2	Test2	Yes
2	Test2A	No
2	Test3	Yes
2	Test4	Yes
3	Test1	Yes
3	Test2	Yes
3	Test2A	Yes
3	Test3	Yes

I would like to see the output like below:

SUBJECT	TEST	YN	Discrepancy
1	Test1	Yes
1	Test2	Yes	Yes
1	Test2A	Yes	Yes
1	Test3	Yes
2	Test1	Yes
2	Test2	Yes
2	Test2A	No
2	Test3	Yes
2	Test4	Yes
3	Test1	Yes
3	Test2	Yes	Yes
3	Test2A	Yes	Yes
3	Test3	Yes

I am new to SAS programming, can you please provide any suggestions how to get the output?

Thanks in advance!

Pakala

novinosrin · Posted 09-20-2017 05:14 PM

data have;

input SUBJECT TEST $ YN $;

datalines;

1 Test1 Yes

1 Test2 Yes

1 Test2A Yes

1 Test3 Yes

2 Test1 Yes

2 Test2 Yes

2 Test2A No

2 Test3 Yes

2 Test4 Yes

3 Test1 Yes

3 Test2 Yes

3 Test2A Yes

3 Test3 Yes

;

data want;

merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));

retain discrepancy;

if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;

discrepancy="Yes";

output;

end;

else if test='Test2A' and discrepancy="Yes" then do;

discrepancy="Yes";

output;

call missing(discrepancy);

end;

else output;

drop _:;

run;

View solution in original post

Astounding · Posted 09-20-2017 04:58 PM

Since you're new to SAS programming, here are a few items to consider. First, there is not a really simple solution. Second, the exact spelling of variable values matters. These are all different to SAS: Yes, yes, YES. So I'll be perhaps too careful and apply the UPCASE function in making comparisons.

This solution assumes that your data set is already sorted BY SUBJECT:

data want;

n_yesses=0;

do until (last.subject);

set have;

by subject;

if upcase(test) in ("TEST2", "TEST2A") and upcase(YN) = "YES" then n_yesses + 1;

end;

do until (last.subject);

set have;

by subject;

if upcase(test) in ("TEST2", "TEST2A") and n_yesses = 2 then Discrepancy="Yes";

output;

end;

drop n_yesses;

run;

The top loop examines all observations for a SUBJECT, and counts the number of yesses. Then the bottom loop works with the exact same observations, assigns DISCREPANCY, and outputs the results.

novinosrin · Posted 09-20-2017 05:14 PM

data have;

input SUBJECT TEST $ YN $;

datalines;

1 Test1 Yes

1 Test2 Yes

1 Test2A Yes

1 Test3 Yes

2 Test1 Yes

2 Test2 Yes

2 Test2A No

2 Test3 Yes

2 Test4 Yes

3 Test1 Yes

3 Test2 Yes

3 Test2A Yes

3 Test3 Yes

;

data want;

merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));

retain discrepancy;

if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;

discrepancy="Yes";

output;

end;

else if test='Test2A' and discrepancy="Yes" then do;

discrepancy="Yes";

output;

call missing(discrepancy);

end;

else output;

drop _:;

run;

HB · Posted 09-20-2017 05:32 PM

Use @Astounding's or @novinosrin''s solution because they are better, but I had to go for a pure SQL solution because that is what I do.

data test_scores;
   input 
	@1 SUBJECT $1.
	@3 TEST $7. 
	@10 YN $3.;
datalines;
1 Test1  Yes
1 Test2  Yes
1 Test2A Yes
1 Test3  Yes
2 Test1  Yes
2 Test2  Yes
2 Test2A No
2 Test3  Yes
2 Test4  Yes
3 Test1  Yes
3 Test2  Yes
3 Test2A Yes
3 Test3  Yes
;
run;


*we can identify the problem cases like this;
proc sql;
	create table problems as
	select a.*, 'Yes' as Discrepancy from 
	(
	select subject, test, yn
	from test_scores
	where test = 'Test2' and YN = 'Yes'
 	union
	select subject, test, yn
	from test_scores
	where test = 'Test2A' and YN = 'Yes'
	) as a
	group by subject
	having count(subject) > 1;
quit;

*and mark them in the whole dataset like this;
proc sql;
	create table all_with_marked_cases as
	select a.subject, a.test, a.yn, b.discrepancy 
	from test_scores a left join problems b
	on a.subject = b.subject and a.test = b.test and a.yn = b.yn;
quit;

Gives us

The SAS System			
			
SUBJECT	TEST	YN	Discrepancy
1	Test1	Yes	
1	Test2	Yes	Yes
1	Test2A	Yes	Yes
1	Test3	Yes	
2	Test1	Yes	
2	Test2	Yes	
2	Test2A	No	
2	Test3	Yes	
2	Test4	Yes	
3	Test1	Yes	
3	Test2	Yes	Yes
3	Test2A	Yes	Yes
3	Test3	Yes

You may now return to your regularly scheduled programming.

pakala60 · Posted 09-20-2017 06:43 PM

Thank you @Astounding @novinosrin and @HB!

Regards,

Pakala

How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Re: How to identify a discrepancy from multiple records of a single subject?

Click image to register for webinar

Classroom Training Available!