Dear all,
I need to create a new count variable to find out number of tests before and after a first positive test. Not sure how to approach this situation. My data set contains ID, Count, Result (1/0, 1 for positive and 0 for negative). Same ID can have multiple observations.
ID Count Result Count_New
1 1 0 2
1 2 0 2
1 3 1
1 4 0 3
1 5 0 3
1 6 1 3
2 1 0
2 2 0
3 1 1 0
4 1 0 2
4 2 0 2
4 3 1
5 1 1 0
5 2 0 2
5 3 1 2
Much easier in Datastep, But I wanted some fun In Proc SQL. Please wait for datastep solutions
data have;
input ID Count Result ;* Count_New;
cards;
1 1 0 2
1 2 0 2
1 3 1
1 4 0 3
1 5 0 3
1 6 1 3
2 1 0
2 2 0
3 1 1 0
4 1 0 2
4 2 0 2
4 3 1
5 1 1 0
5 2 0 2
5 3 1 2
;
proc sql;
create table want(drop=t:) as
select *,count(t1) as Count_New
from
(select *,min(ifn(result=1,count,.)) as t,
case when count<calculated t then 1
when count> calculated t then 0 else . end as t1
from have
group by id)
group id ,t1
order by id,count;
quit;
data have;
input ID Count Result ;* Count_New;
cards;
1 1 0 2
1 2 0 2
1 3 1
1 4 0 3
1 5 0 3
1 6 1 3
2 1 0
2 2 0
3 1 1 0
4 1 0 2
4 2 0 2
4 3 1
5 1 1 0
5 2 0 2
5 3 1 2
;
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
if t then continue;
if result then t=count;
end;
_iorc_=count;
do _n_=1 to _n_;
set have;
if t and count<t then Count_new=t-1;
else if t and count>t then Count_new=_iorc_-t;
else if t then Count_new=0;
output;
end;
drop t;
run;
I've modified your result data set a little bit.
data have;
input ID Count Result ;* Count_New;
cards;
1 1 0 2
1 2 0 2
1 3 1
1 4 0 3
1 5 0 3
1 6 1 3
2 1 0
2 2 0
3 1 1 0
4 1 0 2
4 2 0 2
4 3 1
5 1 1 0
5 2 0 2
5 3 1 2
;
data want (drop=_:);
do until (last.id);
set have;
by id;
if result=1 and count_first_positive=. then count_first_positive=count;
end;
if count_first_positive^=. then count_new=count_first_positive-1;
if count_first_positive^=. then _after_count=count-count_first_positive;
do until (last.id);
set have;
by id;
if count=count_first_positive then count_new=0;
else if count_new=0 then count_new=_after_count;
output;
end;
run;
I've added the variables COUNT_FIRST_POSITIVE to report the COUNT values for the first positive result. This will allow you to look at any record in the resulting data set and know whether it precedes (COUNT<COUNT_FIRST_POSITIVE) or follows (COUNT>COUNT_FIRST_POSITIVE) the event. Otherwise when you look at COUNT_NEW you still don't know whether the current observations precedes or follows.
Maybe something like this:
Data want;
set have;
by id;
if first.id then do;
P_first=_N_;
P_positive=.;
end;
if last.id then
P_last=_N_;
if result=1 and P_positive=. then
P_positive=_N_;
retain P_:;
drop P_:;
if last.id;
Count_New=P_positive-P_first;
do _N_=P_first to P_positive-1;
set have point=_N_;
output;
end;
Count_New=.;
set have point=P_positive;
output;
Count_New=P_last-P_positive;
do _N_=P_positive+1 to P_last;
set have point=_N_;
output;
end;
run;
Dear all,
Sorry for the late response. I have modified my request. I just want to identify tests that are done before 1st positive Result (i.e. all tests before 1st Result = 1) and all tests done after 1st result = 1 as two separate variables.
Thank you so much for all the responses. I really appreciate it.
Here is what I need for the new count variable:
ID Count Result Test_Before Test_After
1 1 0 Yes
1 2 0 Yes
1 3 1
1 4 0 Yes
1 5 0 Yes
1 6 1 Yes
2 1 0
2 2 0
3 1 1
4 1 0 Yes
4 2 0 Yes
4 3 1
5 1 1
5 2 0 Yes
5 3 1 Yes
Make a new retained variable to keep track of whether you have found a result yet.
Then it is easy to get the logic to create your new variables.
data want;
set have;
by id;
retain found ;
if first.id then found=0;
if result and not found then do;
found=1;
test_before=0;
test_after=0;
end;
else do;
test_before=not found;
test_after=found;
end;
drop found;
run;
test_ test_ Obs ID Count Result before after 1 1 1 0 1 0 2 1 2 0 1 0 3 1 3 1 0 0 4 1 4 0 0 1 5 1 5 0 0 1 6 1 6 1 0 1 7 2 1 0 1 0 8 2 2 0 1 0 9 3 1 1 0 0 10 4 1 0 1 0 11 4 2 0 1 0 12 4 3 1 0 0 13 5 1 1 0 0 14 5 2 0 0 1 15 5 3 1 0 1
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.