I am having trouble to figure out how to compute these two Variables (Want: Recovered and starting recovery day)
HEre is a sample Data. (ID=3 Day 1 should be N and not M). Yes/No is coded as 1/0 with a format.
So basically, a subject is considered recovered if they it's a Y from one point to the end of the study (Day 45). So in the example above, ID=1 recovered started on the 3rd day and ID=2 on the 4th). While ID=3 didn't recover as they didn't recover day 5 and 6. THere are different patterns, basically, where subjects can recover and unrecover.
definition of recovery: Yes at one timepoint and Yes at all future time points from that first time point examined (so up to Day 45).
I have been trying to think of an easy way to code this, however, the only way I can think about it check every time frame for Y (e.g. day 44-45, then day 43-45.. then 42-45 etc.)
I am sure there is an easier way to do this, but I can't figure it out. I am hoping someone here can help me.
The next part, What is the starting time point for recovery?
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1 0 0 1 1 1 1
2 0 1 0 1 1 1
3 0 1 1 1 0 0
;
Thank you @Tpham
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1 0 0 1 1 1 1
2 0 1 0 1 1 1
3 0 1 1 1 0 0
;
data want;
set have;
array d(*) day6-day1;
if coalesce(of d(*)) then do;
Recovered='YES';
startday=dim(d)-whichn(0,of d(*))+2;
end;
else Recovered='NO';
run;
Going forward, Kindly post in the form of datastep/plain text always. Thank you for honoring the request
Instead of pic, may i request you to post either in the form of a datastep or plain text to make it interesting for people like to me to easily copy/paste and test
Sorry about that. It's posted.
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1 0 0 1 1 1 1
2 0 1 0 1 1 1
3 0 1 1 1 0 0
;
Thank you @Tpham
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1 0 0 1 1 1 1
2 0 1 0 1 1 1
3 0 1 1 1 0 0
;
data want;
set have;
array d(*) day6-day1;
if coalesce(of d(*)) then do;
Recovered='YES';
startday=dim(d)-whichn(0,of d(*))+2;
end;
else Recovered='NO';
run;
Going forward, Kindly post in the form of datastep/plain text always. Thank you for honoring the request
Thanks for this, I am starting to breakdown to understand the functions you used and looking them up, as I am not familiar with these functions.
I assuming as with other functions, it would ignore missing data. (missing data is ok as long as the pattern stands)
Missing data as in? Can you modify your sample plz
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1 0 0 1 1 1 1
2 0 1 0 1 1 1
3 0 1 1 1 0 0
4 0 0 1 . 1 1
;
Yeah it looks like it does. As ID=4 is missing Day 4 and it returned Day 3 as the start day for recover
Also, can you explain or point me in the right direction on what coalesce(of d(*)) does. I am hoping to learn from this? From looking at your code and reading up on the documentation on this function, it looks like it return the first non-missing value. I am wondering how does the of d(*) indicates the 1?
Hi @Tpham It's actually simple, and I will leave some NOTES shortly once i get home. Right now, I am on the run to catch the CTA Train to get back home. Cya soon!
Hi @Tpham Sorry about the delay:
Here is the logic:
1. First off, I reverse the array grouping from right most to left most.This is to make it convenient to idenitify the last non missing value, reversing it makes it easy to identify latest non missing value.
2. coalesce(of d(*)) results in a boolean expression true(1) if found or false(0 or missing if not found)
3. If found, I basically calculate the position by subtracting from dim i.e total number of elements in array group minus the position at which the last 0 was found, and determing the first position of 1 the follows last 0, that's in essence startingday.
4. Thhe reversal of array is the charm of this solution and everything else is plain math.
5. I've been exposed to many of this kind, so it's easy and I'm sure with time you'll get the speed & intuition.
HTH & Have fun!
Thank you for explaining all this. This was a great learning opportunity!!!
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1 0 0 1 1 1 1
2 0 1 0 1 1 1
3 0 1 1 1 0 0
;
data want(drop=i);
set have;
array days{*} Day1-Day6;
Recovered="Yes";
if Day6=0 then do;
Recovered="No";
startingday=.;
end;
else do i=dim(days) to 1 by -1;
if days[i]=0 then do;
startingday=i+1;
return;
end;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.