Hi,
I am working on a lab data with different test types and corresponding specimen dates. I have included a snapshot of my data in the attachment (only tests included, there's dates for each test like date1, date2, date3 etc. for test1, test2, test3 resp).
there are 4 kinds of tests-
EIA IgM, EIA total, WB IgM and WB IgG.
I want to obtain proportion/counts of of observations who had -
1) WB IgG-
2) EIA (total or IgM) followed by WB IgM within 30 days
3)both combinations (1 AND 2)
4)neither (like those with only EIAs, or EIA followed by WBIgG but not within 30 days).
PLEASE SUGGEST!
Please post test data in machine readable form.
For a good response, post some test data, in the form of a datastep (most wont download files or type test data in for you), and an example of what the output should look like. You should be able to do this in multiple ways (proc freq/means/summary, or datastep retain count on sorted dataset, or proc sql).
CaseID | TEST1 | TEST2 | TEST3 | TEST4 | DATE1 | DATE2 | DATE3 | DATE4 |
1 | WB IgG |
|
|
| 2/5/2015 |
|
|
|
2 | WB IgG | EIA total | EIA total |
| 3/4/2015 | 3/4/2015 | 7/13/2015 |
|
3 | EIA IgM | WB IgM | WB IgG |
| 3/4/2016 | 3/4/2016 | 7/3/2016 |
|
4 | EIA total | WB IgG | WB IgG | WB IgG | 2/4/2015 | 2/4/2015 | 7/7/2015 | 12/2/2015 |
5 | WB IgM | EIA total |
|
| 7/11/2015 | 9/11/2015 |
|
|
6 | WB IgM | WB IgG | EIA total |
| 12/12/2015 | 12/12/2015 | 3/12/2016 |
|
7 | WB IgM | WB IgG | EIA total | WB IgM | 10/8/2015 | 10/8/2015 | 11/2/2015 | 11/2/2015 |
And re:output- as outlined in my first post, I am looking at different combinations of tests (used to “confirm” a case’s disease status) for each observations.
If WB IgG (regardless of other tests performed on that ID) say cat=1
If EIA total/EIA IgM and is followed by WB IgM within 30 days, cat=2
If cat=1 is true AND cat=2 is true, cat=3
If neither (only EIA total, EIA IgM, WB IgM but not within 30 days), cat=4.
Output I want will be a basic proc freq of “cat”.
hope this helps understand what i need
Thanks a lot!
Is the case of WB IgM preceding EIA total or EIA IgM to be considered category 4?
This is a poorly worded description of assigning a single category variable as the first line contradicts the second when the
"regardless of other tests" is used in the definition.
If WB IgG (regardless of other tests performed on that ID) say cat=1
If cat=1 is true AND cat=2 is true, cat=3
You need to clarify which combinations of other tests will yield a cat=1
For this one:
If EIA total/EIA IgM and is followed by WB IgM within 30 days, cat=2
What happens if you have both EIA total and EIA IgM but the WB lgm is within 30 days of only one of them?
Converting your data to a long structure help a lot:
data have;
infile datalines dsd;
input CaseID (TEST1 TEST2 TEST3 TEST4) (:$8.) (DATE1 DATE2 DATE3 DATE4)(:mmddyy.);
datalines;
1,WB IgG, , , ,2/5/2015, , ,
2,WB IgG,EIA total,EIA total, ,3/4/2015,3/4/2015,7/13/2015,
3,EIA IgM,WB IgM,WB IgG, ,3/4/2016,3/4/2016,7/3/2016,
4,EIA total,WB IgG,WB IgG,WB IgG,2/4/2015,2/4/2015,7/7/2015,12/2/2015
5,WB IgM,EIA total, , ,7/11/2015,9/11/2015, ,
6,WB IgM,WB IgG,EIA total, ,12/12/2015,12/12/2015,3/12/2016,
7,WB IgM,WB IgG,EIA total,WB IgM,10/8/2015,10/8/2015,11/2/2015,11/2/2015
;
data long;
set have;
array t test:;
array d date:;
do i = 1 to dim(t);
if not missing(t{i}) then do;
test = t{i}; date = d{i}; output;
end;
end;
keep caseId test date;
run;
proc sql;
create table cat1 as
select unique caseId
from long where test="WB IgG";
create table cat2 as
select unique a.caseId
from long as a inner join long as b
on a.caseId=b.caseId and intck("DAY", a.date, b.date) <= 30
where a.test in ("EIA total", "EIA IgM") and b.test = "WB IgM";
create table cat3 as
select * from cat1
intersect
select * from cat2;
create table cat4 as
select unique caseId from long
except
select * from cat1
except
select * from cat2;
create table cat as
select unique
caseId,
caseId in (select * from cat1) as cat1,
caseId in (select * from cat2) as cat2,
caseId in (select * from cat3) as cat3,
caseId in (select * from cat4) as cat4
from long
order by caseId;
select * from cat;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.