BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pakala60
Calcite | Level 5

Hi,

 

I am trying to find out a discrepancy in the data. I am expecting to see either Test2 or Test2A has to report 'Yes' for a unique subject but not both. If both (Test2 and Test2A) reported as 'Yes' then I want to provide Discreapancy as 'Yes'. 

 

Here is the data what I have:

SUBJECTTESTYN
1Test1Yes
1Test2Yes
1Test2AYes
1Test3Yes
2Test1Yes
2Test2Yes
2Test2ANo
2Test3Yes
2Test4Yes
3Test1Yes
3Test2Yes
3Test2AYes
3Test3Yes

 

I would like to see the output like below:

SUBJECTTESTYNDiscrepancy 
1Test1Yes 
1Test2YesYes
1Test2AYesYes
1Test3Yes 
2Test1Yes 
2Test2Yes 
2Test2ANo 
2Test3Yes 
2Test4Yes 
3Test1Yes 
3Test2YesYes
3Test2AYesYes
3Test3Yes 

 

I am new to SAS programming, can you please provide any suggestions how to get the output?

 

Thanks in advance!

 

Pakala

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data have;

input SUBJECT   TEST $     YN $;

datalines;

1    Test1 Yes

1    Test2 Yes

1    Test2A     Yes

1    Test3 Yes

2    Test1 Yes

2    Test2 Yes

2    Test2A     No

2    Test3 Yes

2    Test4 Yes

3    Test1 Yes

3    Test2 Yes

3    Test2A     Yes

3    Test3 Yes

;

 

 

 

data want;

merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));

retain discrepancy;

if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;

discrepancy="Yes";

output;

end;

else if test='Test2A' and discrepancy="Yes" then do;

discrepancy="Yes";

output;

call missing(discrepancy);

end;

else output;

drop _:;

run;

View solution in original post

4 REPLIES 4
Astounding
PROC Star

Since you're new to SAS programming, here are a few items to consider.  First, there is not a really simple solution.  Second, the exact spelling of variable values matters.  These are all different to SAS:  Yes, yes, YES.  So I'll be perhaps too careful and apply the UPCASE function in making comparisons.

 

This solution assumes that your data set is already sorted BY SUBJECT:

 

data want;

n_yesses=0;

do until (last.subject);

   set have;

   by subject;

   if upcase(test) in ("TEST2", "TEST2A") and upcase(YN) = "YES" then n_yesses + 1;

end;

do until (last.subject);

   set have;

   by subject;

   if upcase(test) in ("TEST2", "TEST2A") and n_yesses = 2 then Discrepancy="Yes";

   output;

end;

drop n_yesses;

run;

 

The top loop examines all observations for a SUBJECT, and counts the number of yesses.  Then the bottom loop works with the exact same observations, assigns DISCREPANCY, and outputs the results.

novinosrin
Tourmaline | Level 20

data have;

input SUBJECT   TEST $     YN $;

datalines;

1    Test1 Yes

1    Test2 Yes

1    Test2A     Yes

1    Test3 Yes

2    Test1 Yes

2    Test2 Yes

2    Test2A     No

2    Test3 Yes

2    Test4 Yes

3    Test1 Yes

3    Test2 Yes

3    Test2A     Yes

3    Test3 Yes

;

 

 

 

data want;

merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));

retain discrepancy;

if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;

discrepancy="Yes";

output;

end;

else if test='Test2A' and discrepancy="Yes" then do;

discrepancy="Yes";

output;

call missing(discrepancy);

end;

else output;

drop _:;

run;

HB
Barite | Level 11 HB
Barite | Level 11

Use @Astounding's or @novinosrin''s solution because they are better, but I had to go for a pure SQL solution because that is what I do.

 

data test_scores;
   input 
	@1 SUBJECT $1.
	@3 TEST $7. 
	@10 YN $3.;
datalines;
1 Test1  Yes
1 Test2  Yes
1 Test2A Yes
1 Test3  Yes
2 Test1  Yes
2 Test2  Yes
2 Test2A No
2 Test3  Yes
2 Test4  Yes
3 Test1  Yes
3 Test2  Yes
3 Test2A Yes
3 Test3  Yes
;
run;


*we can identify the problem cases like this;
proc sql;
	create table problems as
	select a.*, 'Yes' as Discrepancy from 
	(
	select subject, test, yn
	from test_scores
	where test = 'Test2' and YN = 'Yes'
 	union
	select subject, test, yn
	from test_scores
	where test = 'Test2A' and YN = 'Yes'
	) as a
	group by subject
	having count(subject) > 1;
quit;

*and mark them in the whole dataset like this;
proc sql;
	create table all_with_marked_cases as
	select a.subject, a.test, a.yn, b.discrepancy 
	from test_scores a left join problems b
	on a.subject = b.subject and a.test = b.test and a.yn = b.yn;
quit; 

Gives us

The SAS System			
			
SUBJECT	TEST	YN	Discrepancy
1	Test1	Yes	
1	Test2	Yes	Yes
1	Test2A	Yes	Yes
1	Test3	Yes	
2	Test1	Yes	
2	Test2	Yes	
2	Test2A	No	
2	Test3	Yes	
2	Test4	Yes	
3	Test1	Yes	
3	Test2	Yes	Yes
3	Test2A	Yes	Yes
3	Test3	Yes	

You may now return to your regularly scheduled programming.

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1158 views
  • 3 likes
  • 4 in conversation