I have following dataset with the following 4 variables...subjid,number,cat,and visitnum..... the number of lesion must remain constant from visit to visit.....i should output where The number of target lesions is not constant from visit to visit....the below sample subject have constant number from visit to visit and we should output discrepancies if there any like those highlighted in blue.....
100010001 T02 TARGET LESIONS 0
100010001 T01 TARGET LESIONS 0
100010001 T03 TARGET LESIONS 0
100010001 T02 TARGET LESIONS 1
100010001 T03 TARGET LESIONS 1
100010001 T01 TARGET LESIONS 1
100010001 T02 TARGET LESIONS 2
100010001 T01 TARGET LESIONS 2
100010001 T03 TARGET LESIONS 2
100010002 T02 TARGET LESIONS 0
100010002 T01 TARGET LESIONS 0
100010002 T02 TARGET LESIONS 1
100010002 T01 TARGET LESIONS 1
100010002 T02 TARGET LESIONS 2
100010002 T01 TARGET LESIONS 2
100030002 T03 TARGET LESIONS 0
100030002 T04 TARGET LESIONS 0
100030002 T02 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 1
100030002 T04 TARGET LESIONS 1
100030002 T03 TARGET LESIONS 1
100030002 T02 TARGET LESIONS 1
160020001 T02 TARGET LESIONS 0
160020001 T01 TARGET LESIONS 0
160020001 T02 TARGET LESIONS 1
Did you just say the each NUMBER have the same number of obs in each and every subjid group ?
data have; input subjid : $20. number $ cat & $40. visitnum ; cards; 100010001 T02 TARGET LESIONS 0 100010001 T01 TARGET LESIONS 0 100010001 T03 TARGET LESIONS 0 100010001 T02 TARGET LESIONS 1 100010001 T03 TARGET LESIONS 1 100010001 T01 TARGET LESIONS 1 100010001 T02 TARGET LESIONS 2 100010001 T01 TARGET LESIONS 2 100010001 T03 TARGET LESIONS 2 100010002 T02 TARGET LESIONS 0 100010002 T01 TARGET LESIONS 0 100010002 T02 TARGET LESIONS 1 100010002 T01 TARGET LESIONS 1 100010002 T02 TARGET LESIONS 2 100010002 T01 TARGET LESIONS 2 100030002 T03 TARGET LESIONS 0 100030002 T04 TARGET LESIONS 0 100030002 T02 TARGET LESIONS 0 100030002 T01 TARGET LESIONS 0 100030002 T01 TARGET LESIONS 1 100030002 T04 TARGET LESIONS 1 100030002 T03 TARGET LESIONS 1 100030002 T02 TARGET LESIONS 1 160020001 T02 TARGET LESIONS 0 160020001 T01 TARGET LESIONS 0 160020001 T02 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 0 160020002 T01 TARGET LESIONS 0 160020002 T03 TARGET LESIONS 0 160020002 T03 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 2 ; run; proc sql; create table temp as select * from ( select *,count(*) as n from have group by subjid , visitnum) group by subjid having range(n) ne 0; quit;
Xia Keshan
Message was edited by: xia keshan
Message was edited by: xia keshan
What if you had:
100010002 T02 TARGET LESIONS 0
100010002 T01 TARGET LESIONS 0
100010002 T03 TARGET LESIONS 1
100010002 T01 TARGET LESIONS 1
Would you want to flag that discrepancy?
PG
Thanks Pg....We consider Visitnum=0 as baseline visit.....So what ever number present in baseline should be present in next occuring visits.....if any number not present in visitnum=0 but present in next visit or vice verse....we consider them as discrepany and we should output those.....
Did you just say the each NUMBER have the same number of obs in each and every subjid group ?
data have; input subjid : $20. number $ cat & $40. visitnum ; cards; 100010001 T02 TARGET LESIONS 0 100010001 T01 TARGET LESIONS 0 100010001 T03 TARGET LESIONS 0 100010001 T02 TARGET LESIONS 1 100010001 T03 TARGET LESIONS 1 100010001 T01 TARGET LESIONS 1 100010001 T02 TARGET LESIONS 2 100010001 T01 TARGET LESIONS 2 100010001 T03 TARGET LESIONS 2 100010002 T02 TARGET LESIONS 0 100010002 T01 TARGET LESIONS 0 100010002 T02 TARGET LESIONS 1 100010002 T01 TARGET LESIONS 1 100010002 T02 TARGET LESIONS 2 100010002 T01 TARGET LESIONS 2 100030002 T03 TARGET LESIONS 0 100030002 T04 TARGET LESIONS 0 100030002 T02 TARGET LESIONS 0 100030002 T01 TARGET LESIONS 0 100030002 T01 TARGET LESIONS 1 100030002 T04 TARGET LESIONS 1 100030002 T03 TARGET LESIONS 1 100030002 T02 TARGET LESIONS 1 160020001 T02 TARGET LESIONS 0 160020001 T01 TARGET LESIONS 0 160020001 T02 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 0 160020002 T01 TARGET LESIONS 0 160020002 T03 TARGET LESIONS 0 160020002 T03 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 2 ; run; proc sql; create table temp as select * from ( select *,count(*) as n from have group by subjid , visitnum) group by subjid having range(n) ne 0; quit;
Xia Keshan
Message was edited by: xia keshan
Message was edited by: xia keshan
Thanks Xia.... it necessary that number observation in Visit=0 should be same as the number of obseravtions in subsequemt visit....and should output if there is any difference in obseravtion count....Also it would be great if we can add code where we can output where obeservation count do not match from baseline visit and also if NUMBER variable do not match from baseline visit to next visits......
It is more clear .
data have; input subjid : $20. number $ cat & $40. visitnum ; cards; 100010001 T02 TARGET LESIONS 0 100010001 T01 TARGET LESIONS 0 100010001 T03 TARGET LESIONS 0 100010001 T02 TARGET LESIONS 1 100010001 T03 TARGET LESIONS 1 100010001 T01 TARGET LESIONS 1 100010001 T02 TARGET LESIONS 2 100010001 T01 TARGET LESIONS 2 100010001 T03 TARGET LESIONS 2 100010002 T02 TARGET LESIONS 0 100010002 T01 TARGET LESIONS 0 100010002 T02 TARGET LESIONS 1 100010002 T01 TARGET LESIONS 1 100010002 T02 TARGET LESIONS 2 100010002 T01 TARGET LESIONS 2 100030002 T03 TARGET LESIONS 0 100030002 T04 TARGET LESIONS 0 100030002 T02 TARGET LESIONS 0 100030002 T01 TARGET LESIONS 0 100030002 T01 TARGET LESIONS 1 100030002 T04 TARGET LESIONS 1 100030002 T03 TARGET LESIONS 1 100030002 T02 TARGET LESIONS 1 160020001 T02 TARGET LESIONS 0 160020001 T01 TARGET LESIONS 0 160020001 T02 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 0 160020002 T01 TARGET LESIONS 0 160020002 T03 TARGET LESIONS 0 160020002 T03 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 1 160020002 T02 TARGET LESIONS 2 ; run; data want(drop=n group basic); group=0; do until(last.subjid); set have; by subjid; group+1;n=0; do until(last.visitnum); set have ; by subjid visitnum notsorted; n+1; end; if group=1 then basic=n; do until(last.visitnum); set have ; by subjid visitnum notsorted; if group ne 1 and n ne basic then output; end; end; run;
Xia Keshan
I believe you can also do it using Array(), but Hash seems handy:
data have;
input subjid : $20. number $ cat & $40. visitnum;
cards;
100010001 T02 TARGET LESIONS 0
100010001 T01 TARGET LESIONS 0
100010001 T03 TARGET LESIONS 0
100010001 T02 TARGET LESIONS 1
100010001 T03 TARGET LESIONS 1
100010001 T01 TARGET LESIONS 1
100010001 T02 TARGET LESIONS 2
100010001 T01 TARGET LESIONS 2
100010001 T03 TARGET LESIONS 2
100010002 T02 TARGET LESIONS 0
100010002 T01 TARGET LESIONS 0
100010002 T02 TARGET LESIONS 1
100010002 T01 TARGET LESIONS 1
100010002 T02 TARGET LESIONS 2
100010002 T01 TARGET LESIONS 2
100030002 T03 TARGET LESIONS 0
100030002 T04 TARGET LESIONS 0
100030002 T02 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 1
100030002 T04 TARGET LESIONS 1
100030002 T03 TARGET LESIONS 1
100030002 T02 TARGET LESIONS 1
160020001 T02 TARGET LESIONS 0
160020001 T01 TARGET LESIONS 0
160020001 T02 TARGET LESIONS 1
;
run;
data want;
if _n_=1 then
do;
declare hash h1();
h1.definekey('number');
h1.definedone();
declare hash h2();
h2.definekey('number');
h2.definedone();
end;
do until(last.subjid);
set have;
by subjid;
if visitnum=0 then
rc=h1.replace();
else rc=h2.replace();
end;
rc=h1.equals(hash:
'h2', result: flag);
put flag=;
do until (last.subjid);
set have;
by subjid;
if flag ne 1 then
output;
end;
rc=h1.clear();
rc=h2.clear();
drop rc flag;
run;
To get a detailed list of discrepansies:
data visit;
length subjId $12 cat $16;
input subjid $ number $ cat $& visitnum;
datalines;
100010001 T02 TARGET LESIONS 0
100010001 T01 TARGET LESIONS 0
100010001 T03 TARGET LESIONS 0
100010001 T02 TARGET LESIONS 1
100010001 T03 TARGET LESIONS 1
100010001 T01 TARGET LESIONS 1
100010001 T02 TARGET LESIONS 2
100010001 T01 TARGET LESIONS 2
100010001 T03 TARGET LESIONS 2
100010002 T02 TARGET LESIONS 0
100010002 T01 TARGET LESIONS 0
100010002 T02 TARGET LESIONS 1
100010002 T01 TARGET LESIONS 1
100010002 T02 TARGET LESIONS 2
100010002 T01 TARGET LESIONS 2
100010002 T03 TARGET LESIONS 2
100030002 T03 TARGET LESIONS 0
100030002 T04 TARGET LESIONS 0
100030002 T02 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 1
100030002 T04 TARGET LESIONS 1
100030002 T03 TARGET LESIONS 1
100030002 T02 TARGET LESIONS 1
160020001 T02 TARGET LESIONS 0
160020001 T01 TARGET LESIONS 0
160020001 T02 TARGET LESIONS 1
;
proc sql;
create table visit0 as
select unique a.subjId, b.visitNum, a.number
from visit as a inner join visit as b
on a.subjid = b.subjid
where a.visitnum = 0;
create table newNumbers as
select subjId, visitNum, number
from visit except corr
select * from visit0;
create table missingNumbers as
select subjId, visitNum, number
from visit0 except corr
select * from visit;
title "Discrepansies";
select "New number", * from newNumbers
union all
select "Missing number", * from missingNumbers
order by subjid, visitNum, number;
quit;
PG
I am novice, unlike the greats PG, Xia and Haikuo, hence the long and ugly way, for learning purpse it helps:
data have;/*Raw data copied from Haikuo*/
input subjid : $20. number $ cat & $40. visitnum;
cards;
100010001 T02 TARGET LESIONS 0
100010001 T01 TARGET LESIONS 0
100010001 T03 TARGET LESIONS 0
100010001 T02 TARGET LESIONS 1
100010001 T03 TARGET LESIONS 1
100010001 T01 TARGET LESIONS 1
100010001 T02 TARGET LESIONS 2
100010001 T01 TARGET LESIONS 2
100010001 T03 TARGET LESIONS 2
100010002 T02 TARGET LESIONS 0
100010002 T01 TARGET LESIONS 0
100010002 T02 TARGET LESIONS 1
100010002 T01 TARGET LESIONS 1
100010002 T02 TARGET LESIONS 2
100010002 T01 TARGET LESIONS 2
100030002 T03 TARGET LESIONS 0
100030002 T04 TARGET LESIONS 0
100030002 T02 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 0
100030002 T01 TARGET LESIONS 1
100030002 T04 TARGET LESIONS 1
100030002 T03 TARGET LESIONS 1
100030002 T02 TARGET LESIONS 1
160020001 T02 TARGET LESIONS 0
160020001 T01 TARGET LESIONS 0
160020001 T02 TARGET LESIONS 1
;
run;
data want;
set have;
by subjid visitnum notsorted;
array n(30) _temporary_; *array subscript is arbitrary;
if first.subjid then call missing(of n{*});
if first.visitnum then count=0;
count+1;
if last.visitnum then n(_n_)=count;
if last.subjid then do;
if max(of n
proc sort data= have out=have1;
by subjid;
run;
data final;
merge want have1;
by subjid;
if discrepancy="yes" then output;
drop discrepancy;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.