BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tpham
Quartz | Level 8

 

 

I am having trouble to figure out how to compute these two Variables (Want: Recovered and starting recovery day)

 

HEre is a sample Data. (ID=3 Day 1 should be N and not M). Yes/No is coded as 1/0 with a format. 

 

image.png

 

 

 

 

 

 

 

 

 

So basically, a subject is considered recovered  if they it's a Y from one point to the end of the study (Day 45). So in the example above, ID=1 recovered started on the 3rd day and ID=2 on the 4th).  While ID=3 didn't recover as  they didn't recover day 5 and 6. THere are different patterns, basically, where subjects can recover and unrecover.

 

definition of recovery:  Yes at one timepoint and Yes at all future time points from that first time point examined (so up to Day 45).

 

I have been trying to think of an easy way to code this, however, the only way I can think about it check every time frame for Y (e.g. day 44-45, then day 43-45.. then 42-45 etc.)

 

I am sure there is an easier way to do this, but I can't figure it out. I am hoping someone here can help me. 

 

The next part, What is the starting time point for recovery? 

 

 

 

 

data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1	0	0	1	1	1	1
2	0	1	0	1	1	1
3	0	1	1	1	0	0
;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Thank you @Tpham

data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1	0	0	1	1	1	1
2	0	1	0	1	1	1
3	0	1	1	1	0	0
;

data want;
set have;
array d(*) day6-day1;
if coalesce(of d(*)) then do;
Recovered='YES';
startday=dim(d)-whichn(0,of d(*))+2;
end;
else Recovered='NO';
run;

 

Going forward, Kindly post in the form of datastep/plain text always. Thank you for honoring the request 

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

Instead of pic, may i request you to post either in the form of a datastep or plain text to make it interesting for people like to me to easily copy/paste and test

Tpham
Quartz | Level 8

Sorry about that. It's posted.

 

data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1	0	0	1	1	1	1
2	0	1	0	1	1	1
3	0	1	1	1	0	0
;

novinosrin
Tourmaline | Level 20

Thank you @Tpham

data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1	0	0	1	1	1	1
2	0	1	0	1	1	1
3	0	1	1	1	0	0
;

data want;
set have;
array d(*) day6-day1;
if coalesce(of d(*)) then do;
Recovered='YES';
startday=dim(d)-whichn(0,of d(*))+2;
end;
else Recovered='NO';
run;

 

Going forward, Kindly post in the form of datastep/plain text always. Thank you for honoring the request 

Tpham
Quartz | Level 8

Thanks for this, I am starting to breakdown to understand the functions you used and looking them up, as I am not familiar with these functions.  

 

I assuming as with other functions, it would ignore missing data. (missing data is ok as long as the pattern stands)

 

novinosrin
Tourmaline | Level 20

Missing data as in? Can you modify your sample plz

Tpham
Quartz | Level 8
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1	0	0	1	1	1	1
2	0	1	0	1	1	1
3	0	1	1	1	0	0
4	0	0	1	.	1	1
;


Yeah it looks like it does. As ID=4 is missing Day 4 and it returned Day 3 as the start day for recover

 

Also, can you explain or point me in the right direction on what coalesce(of d(*)) does. I am hoping to learn from this? From looking at your code and reading up on the documentation on this function, it looks like it return the first non-missing value. I am wondering how does the of d(*) indicates the 1?

novinosrin
Tourmaline | Level 20

Hi @Tpham It's actually simple, and I will leave some NOTES shortly once i get  home. Right now, I am on the run to catch the CTA Train to get back home. Cya soon!

novinosrin
Tourmaline | Level 20

Hi @Tpham  Sorry about the delay:

 

Here is the logic:

1. First off, I reverse the array grouping from right most to left most.This is to make it convenient to idenitify the last non missing value, reversing it makes it easy to  identify latest non missing value.

2. coalesce(of d(*)) results in a boolean expression true(1) if found or false(0 or missing if not found)

3. If found, I basically calculate the position by subtracting from dim i.e total number of elements in array group minus the position at which the last 0 was found, and determing the first position of 1 the follows last 0, that's in essence startingday. 

4. Thhe reversal of array is the charm of this solution and everything else is plain math. 

5. I've been exposed to many of this kind, so it's easy and I'm sure with time you'll get the speed & intuition. 

 

HTH & Have fun!

 

 

 

 

Tpham
Quartz | Level 8

Thank you for explaining all this. This was a great learning opportunity!!!

PeterClemmensen
Tourmaline | Level 20
data have;
input ID Day1 Day2 Day3 Day4 Day5 Day6;
datalines;
1	0	0	1	1	1	1
2	0	1	0	1	1	1
3	0	1	1	1	0	0
;

data want(drop=i);
   set have;
   array days{*} Day1-Day6;
   Recovered="Yes";
   if Day6=0 then do;
      Recovered="No";
      startingday=.;
   end;
   else do i=dim(days) to 1 by -1;
      if days[i]=0 then do;
         startingday=i+1;
         return;
      end;
   end;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1690 views
  • 0 likes
  • 3 in conversation