Hi,
I am trying to find out a discrepancy in the data. I am expecting to see either Test2 or Test2A has to report 'Yes' for a unique subject but not both. If both (Test2 and Test2A) reported as 'Yes' then I want to provide Discreapancy as 'Yes'.
Here is the data what I have:
SUBJECT | TEST | YN |
1 | Test1 | Yes |
1 | Test2 | Yes |
1 | Test2A | Yes |
1 | Test3 | Yes |
2 | Test1 | Yes |
2 | Test2 | Yes |
2 | Test2A | No |
2 | Test3 | Yes |
2 | Test4 | Yes |
3 | Test1 | Yes |
3 | Test2 | Yes |
3 | Test2A | Yes |
3 | Test3 | Yes |
I would like to see the output like below:
SUBJECT | TEST | YN | Discrepancy |
1 | Test1 | Yes | |
1 | Test2 | Yes | Yes |
1 | Test2A | Yes | Yes |
1 | Test3 | Yes | |
2 | Test1 | Yes | |
2 | Test2 | Yes | |
2 | Test2A | No | |
2 | Test3 | Yes | |
2 | Test4 | Yes | |
3 | Test1 | Yes | |
3 | Test2 | Yes | Yes |
3 | Test2A | Yes | Yes |
3 | Test3 | Yes |
I am new to SAS programming, can you please provide any suggestions how to get the output?
Thanks in advance!
Pakala
data have;
input SUBJECT TEST $ YN $;
datalines;
1 Test1 Yes
1 Test2 Yes
1 Test2A Yes
1 Test3 Yes
2 Test1 Yes
2 Test2 Yes
2 Test2A No
2 Test3 Yes
2 Test4 Yes
3 Test1 Yes
3 Test2 Yes
3 Test2A Yes
3 Test3 Yes
;
data want;
merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));
retain discrepancy;
if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;
discrepancy="Yes";
output;
end;
else if test='Test2A' and discrepancy="Yes" then do;
discrepancy="Yes";
output;
call missing(discrepancy);
end;
else output;
drop _:;
run;
Since you're new to SAS programming, here are a few items to consider. First, there is not a really simple solution. Second, the exact spelling of variable values matters. These are all different to SAS: Yes, yes, YES. So I'll be perhaps too careful and apply the UPCASE function in making comparisons.
This solution assumes that your data set is already sorted BY SUBJECT:
data want;
n_yesses=0;
do until (last.subject);
set have;
by subject;
if upcase(test) in ("TEST2", "TEST2A") and upcase(YN) = "YES" then n_yesses + 1;
end;
do until (last.subject);
set have;
by subject;
if upcase(test) in ("TEST2", "TEST2A") and n_yesses = 2 then Discrepancy="Yes";
output;
end;
drop n_yesses;
run;
The top loop examines all observations for a SUBJECT, and counts the number of yesses. Then the bottom loop works with the exact same observations, assigns DISCREPANCY, and outputs the results.
data have;
input SUBJECT TEST $ YN $;
datalines;
1 Test1 Yes
1 Test2 Yes
1 Test2A Yes
1 Test3 Yes
2 Test1 Yes
2 Test2 Yes
2 Test2A No
2 Test3 Yes
2 Test4 Yes
3 Test1 Yes
3 Test2 Yes
3 Test2A Yes
3 Test3 Yes
;
data want;
merge have have(firstobs=2 rename=(TEST=_TEST yn=_yn));
retain discrepancy;
if test='Test2' and _test='Test2A' and YN='Yes' and _yn='Yes' then do;
discrepancy="Yes";
output;
end;
else if test='Test2A' and discrepancy="Yes" then do;
discrepancy="Yes";
output;
call missing(discrepancy);
end;
else output;
drop _:;
run;
Use @Astounding's or @novinosrin''s solution because they are better, but I had to go for a pure SQL solution because that is what I do.
data test_scores;
input
@1 SUBJECT $1.
@3 TEST $7.
@10 YN $3.;
datalines;
1 Test1 Yes
1 Test2 Yes
1 Test2A Yes
1 Test3 Yes
2 Test1 Yes
2 Test2 Yes
2 Test2A No
2 Test3 Yes
2 Test4 Yes
3 Test1 Yes
3 Test2 Yes
3 Test2A Yes
3 Test3 Yes
;
run;
*we can identify the problem cases like this;
proc sql;
create table problems as
select a.*, 'Yes' as Discrepancy from
(
select subject, test, yn
from test_scores
where test = 'Test2' and YN = 'Yes'
union
select subject, test, yn
from test_scores
where test = 'Test2A' and YN = 'Yes'
) as a
group by subject
having count(subject) > 1;
quit;
*and mark them in the whole dataset like this;
proc sql;
create table all_with_marked_cases as
select a.subject, a.test, a.yn, b.discrepancy
from test_scores a left join problems b
on a.subject = b.subject and a.test = b.test and a.yn = b.yn;
quit;
Gives us
The SAS System SUBJECT TEST YN Discrepancy 1 Test1 Yes 1 Test2 Yes Yes 1 Test2A Yes Yes 1 Test3 Yes 2 Test1 Yes 2 Test2 Yes 2 Test2A No 2 Test3 Yes 2 Test4 Yes 3 Test1 Yes 3 Test2 Yes Yes 3 Test2A Yes Yes 3 Test3 Yes
You may now return to your regularly scheduled programming.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.