Hi,
I am trying to find out a discrepancy in the data. I am expecting to see either Test2 or Test2A has to report 'Yes' for a unique subject but not both. If both (Test2 and Test2A) reported as 'Yes' then I want to provide Discreapancy as 'Yes'.
Here is the data what I have:
SUBJECT | TEST | YN |
1 | Test1 | Yes |
1 | Test2 | Yes |
1 | Test2A | Yes |
1 | Test3 | Yes |
2 | Test1 | Yes |
2 | Test2 | Yes |
2 | Test2A | No |
2 | Test3 | Yes |
2 | Test4 | Yes |
3 | Test1 | Yes |
3 | Test2 | Yes |
3 | Test2A | Yes |
3 | Test3 | Yes |
I would like to see the output like below:
SUBJECT | TEST | YN | Discrepancy |
1 | Test1 | Yes | |
1 | Test2 | Yes | Yes |
1 | Test2A | Yes | Yes |
1 | Test3 | Yes | |
2 | Test1 | Yes | |
2 | Test2 | Yes | |
2 | Test2A | No | |
2 | Test3 | Yes | |
2 | Test4 | Yes | |
3 | Test1 | Yes | |
3 | Test2 | Yes | Yes |
3 | Test2A | Yes | Yes |
3 | Test3 | Yes |
I am new to SAS programming, can you please provide any suggestions how to get the output?
Thanks in advance!
Pakala
data have;
input SUBJECT TEST $ YN $;
datalines;
1 Test1 Yes
1 Test2 Yes
1 Test2A Yes
1 Test3 Yes
2 Test1 Yes
2 Test2 Yes
2 Test2A No
2 Test3 Yes
2 Test4 Yes
3 Test1 Yes
3 Test2 Yes
3 Test2A Yes
3 Test3 Yes
;
data want;
merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));
retain discrepancy;
if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;
discrepancy="Yes";
output;
end;
else if test='Test2A' and discrepancy="Yes" then do;
discrepancy="Yes";
output;
call missing(discrepancy);
end;
else output;
drop _:;
run;
Since you're new to SAS programming, here are a few items to consider. First, there is not a really simple solution. Second, the exact spelling of variable values matters. These are all different to SAS: Yes, yes, YES. So I'll be perhaps too careful and apply the UPCASE function in making comparisons.
This solution assumes that your data set is already sorted BY SUBJECT:
data want;
n_yesses=0;
do until (last.subject);
set have;
by subject;
if upcase(test) in ("TEST2", "TEST2A") and upcase(YN) = "YES" then n_yesses + 1;
end;
do until (last.subject);
set have;
by subject;
if upcase(test) in ("TEST2", "TEST2A") and n_yesses = 2 then Discrepancy="Yes";
output;
end;
drop n_yesses;
run;
The top loop examines all observations for a SUBJECT, and counts the number of yesses. Then the bottom loop works with the exact same observations, assigns DISCREPANCY, and outputs the results.
data have;
input SUBJECT TEST $ YN $;
datalines;
1 Test1 Yes
1 Test2 Yes
1 Test2A Yes
1 Test3 Yes
2 Test1 Yes
2 Test2 Yes
2 Test2A No
2 Test3 Yes
2 Test4 Yes
3 Test1 Yes
3 Test2 Yes
3 Test2A Yes
3 Test3 Yes
;
data want;
merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));
retain discrepancy;
if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;
discrepancy="Yes";
output;
end;
else if test='Test2A' and discrepancy="Yes" then do;
discrepancy="Yes";
output;
call missing(discrepancy);
end;
else output;
drop _:;
run;
Use @Astounding's or @novinosrin''s solution because they are better, but I had to go for a pure SQL solution because that is what I do.
data test_scores;
input
@1 SUBJECT $1.
@3 TEST $7.
@10 YN $3.;
datalines;
1 Test1 Yes
1 Test2 Yes
1 Test2A Yes
1 Test3 Yes
2 Test1 Yes
2 Test2 Yes
2 Test2A No
2 Test3 Yes
2 Test4 Yes
3 Test1 Yes
3 Test2 Yes
3 Test2A Yes
3 Test3 Yes
;
run;
*we can identify the problem cases like this;
proc sql;
create table problems as
select a.*, 'Yes' as Discrepancy from
(
select subject, test, yn
from test_scores
where test = 'Test2' and YN = 'Yes'
union
select subject, test, yn
from test_scores
where test = 'Test2A' and YN = 'Yes'
) as a
group by subject
having count(subject) > 1;
quit;
*and mark them in the whole dataset like this;
proc sql;
create table all_with_marked_cases as
select a.subject, a.test, a.yn, b.discrepancy
from test_scores a left join problems b
on a.subject = b.subject and a.test = b.test and a.yn = b.yn;
quit;
Gives us
The SAS System SUBJECT TEST YN Discrepancy 1 Test1 Yes 1 Test2 Yes Yes 1 Test2A Yes Yes 1 Test3 Yes 2 Test1 Yes 2 Test2 Yes 2 Test2A No 2 Test3 Yes 2 Test4 Yes 3 Test1 Yes 3 Test2 Yes Yes 3 Test2A Yes Yes 3 Test3 Yes
You may now return to your regularly scheduled programming.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.