Adding new columns that indicate the change in value: proc sql

yellowyellowred · Posted 07-27-2022 10:00 PM

I have a dataset

ID	Time	Answer1	Answer2
101	0	1	0
101	3	0	0
102	0	1	0
102	1	1	1

and want the output

ID	Improved_1	Maintained_1	Improved_2	Maintained_2
101	0	0	0	0
102	0	1	1	0

- Input dataset: There are two observations for each unique ID. Each Answer column (Answer1, Answer2) can take the values 0 and 1, and can also be missing.

- Output dataset: "Improved" means the answer changed from 0 to 1. "Maintained" means the answer started as 1 and ended as 1.
If the answer starts as 0 and ends as 0, then Improved = 0 and Maintained = 0.

If the answer starts as 1 and ends as 0, then Improved = 0 and Maintained = 0.

If the answer is missing at either timepoint, then Improved = 0 and Maintained = 0.

There are at most 2 or 3 "Answer" columns, so could someone please provide a solution using proc sql, like using a "case when" statement or something? I just want to use what I'm familiar with.

Thanks

ballardw · Posted 07-27-2022 11:34 PM

Proc SQL is generally not the right approach when the order of the data matters. SQL is designed to work on sets, not sequential records. The data step is designed for sequential processing.

First, you should provide example data in the form of data step code, pasted into a code or text box opened using the </> or "running man" icons that appear above the message window. If we have to make assumptions about variable types and values, our solution code may not match your data.

My take: This assumes that your data set is already sorted by ID and time, and that time increases.

data have;
input ID Time Answer1 Answer2;
datalines;
101 0 1 0
101 3 0 0
102 0 1 0
102 1 1 1
;

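data want;
   set have;
   by ID;
   /* value of each answer on the previous record */
   lanswer1=lag(answer1);
   lanswer2=lag(answer2);
   /* each comparison evaluates to 1 (true) or 0 (false) */
   Improved_1   = (lanswer1=0 and answer1=1);
   Maintained_1 = (lanswer1=1 and answer1=1);
   Improved_2   = (lanswer2=0 and answer2=1);
   Maintained_2 = (lanswer2=1 and answer2=1);

   /* output only the second (last) record of each ID */
   if last.id;
   keep id Improved_: Maintained_:;
run;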

The BY statement creates the automatic variables First.ID and Last.ID, which identify the first and last record of each ID group. Lag gets the value of a variable from the previous record. Note that SAS returns a numeric 1 for a true comparison and 0 for a false one, so placing both of your conditions in the parentheses evaluates the whole thing as one logical comparison. A missing value equals neither 0 nor 1, so both flags come out as 0 when either answer is missing.

The subsetting statement "if last.id;" keeps only the last record of each ID group for output.
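
One caveat with Lag: on the first record of an ID it returns the value from the last record of the previous ID. With exactly two observations per ID and the subsetting IF on last.id this never matters, but if an ID could have only one observation, its flags would be computed against a different ID. A minimal sketch of a guard, assuming the same HAVE data set sorted by ID and time (the output name want_guarded is only illustrative, and the flag names follow the requested output):

```sas
data want_guarded;
   set have;
   by ID;
   /* call LAG on every record so its queue always holds the previous row */
   lanswer1 = lag(answer1);
   lanswer2 = lag(answer2);
   /* on the first record of an ID the lagged values belong to another ID */
   if first.id then call missing(lanswer1, lanswer2);
   Improved_1   = (lanswer1=0 and answer1=1);
   Maintained_1 = (lanswer1=1 and answer1=1);
   Improved_2   = (lanswer2=0 and answer2=1);
   Maintained_2 = (lanswer2=1 and answer2=1);
   /* keep one row per ID */
   if last.id;
   keep id Improved_: Maintained_:;
run;
```

An ID with a single record is then both first and last, so all four flags are 0. Note that Lag stays outside the IF condition on purpose: calling it conditionally would change which values it returns.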

FreelanceReinh · Posted 07-28-2022 04:47 AM

I would also prefer a DATA step for this task. For a PROC SQL approach you would need to distinguish the two observations per ID based on the TIME value (unless you create a sequential number in a preliminary DATA step). Your sample data suggest that perhaps TIME=0 indicates the first and TIME>0 the second observation.

proc sql;
create table want as
select a.id
      /* first (a) and second (b) answers side by side, for checking */
      ,a.Answer1 as preAnswer1, b.Answer1 as postAnswer1
      ,a.Answer2 as preAnswer2, b.Answer2 as postAnswer2
      /* a true comparison yields 1, a false one 0 */
      ,a.Answer1=0 & b.Answer1=1 as Improved_1, a.Answer1=1 & b.Answer1=1 as Maintained_1
      ,a.Answer2=0 & b.Answer2=1 as Improved_2, a.Answer2=1 & b.Answer2=1 as Maintained_2
/* a = first observation (time=0), b = second observation (time>0) */
from have(where=(time=0)) a, have(where=(time>0)) b
where a.id=b.id
order by id;
quit;

By using a full join instead of the inner join, you could extend this code to cases where some IDs have only one observation.
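
Since the question asked about "case when", here is a sketch of the same query written with CASE expressions and a full join. It rests on the same assumption that TIME=0 marks the first observation and TIME>0 the second; the table name want_fulljoin is made up for illustration.

```sas
proc sql;
create table want_fulljoin as
/* take the id from whichever side of the full join is present */
select coalesce(a.id, b.id) as id
      /* a missing answer equals neither 0 nor 1, so it falls to ELSE 0 */
      ,case when a.Answer1=0 and b.Answer1=1 then 1 else 0 end as Improved_1
      ,case when a.Answer1=1 and b.Answer1=1 then 1 else 0 end as Maintained_1
      ,case when a.Answer2=0 and b.Answer2=1 then 1 else 0 end as Improved_2
      ,case when a.Answer2=1 and b.Answer2=1 then 1 else 0 end as Maintained_2
from have(where=(time=0)) a
     full join
     have(where=(time>0)) b
on a.id=b.id
/* sort by the first selected column (id) */
order by 1;
quit;
```

For the sample data this should reproduce the requested output. An ID with only one observation has no match on the other side of the join, so its answers there are missing and all four flags are 0.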
