BookmarkSubscribeRSS Feed
yellowyellowred
Obsidian | Level 7

I have a dataset 

IDTimeAnswer1Answer2
101010
101300
102010
102111

 

 

and want the output

 

IDImproved_1Maintained_1Improved_2Maintained_2
1010000
1020110

 

 

- Input dataset: There are two observations for each unique ID. Each Answer_k column can take on values 0 and 1, and can also be missing. 

 

- Output dataset: "Improved" means if the answer changed from 0 to 1. "Maintained" means if the answer started as 1 and ends as 1.  
  For the case when the answer starts as 0 and ends as 0, then Improved = 0 and Maintained = 0.

 For the case when the answer starts as 1 and ends as 0, then Improved = 0 and Maintained = 0.

For the case when the answer at any timepoint is missing, then Improved = 0 and Maintained = 0.

 

There are at most 2 or 3 "Answer" columns, so could someone please provide a solution using proc sql, like using a "case when" statement or something? I just want to use what I'm familiar with.

Thanks

2 REPLIES 2
ballardw
Super User

Proc SQL is generally not the approach for when order of data is important. SQL is designed to work on sets, not sequential records. The data step is designed for sequential processing.

First thing you should provide example data in the form of data step code and paste that into a code or text box opened using the </> or "running man" icons that appear above the message windows. If we have to make assumptions about variable type and values then we may have solution code that does not match your data.

 

My take: This assumes that your data set is already sorted by ID and time, and that time increases.

data have;
input ID Time Answer1 Answer2;
datalines;
101	0	1	0
101	3	0	0
102	0	1	0
102	1	1	1
;

data want;
   set have;
   by ID;
   lanswer1=lag(answer1);
   lanswer2=lag(answer2);
   improved1 = (lanswer1=0 and answer1=1);
   maintain1 = (lanswer1=1 and answer1=1);
   improved2 = (lanswer2=0 and answer2=1);
   maintain2 = (lanswer2=1 and answer2=1);

   if last.id;
   keep id improved: maintain:;
run;

The BY group allows the use of First and last to identify records. Lag gets the value of a variable from the previous record. Note that SAS will return a numeric 1 for a true comparison and 0 for false. So placing both of your conditions in the parentheses evaluates the whole thing as one logical comparison.

The If last.id only keeps records  that are the last of the ID group for output.

FreelanceReinh
Jade | Level 19

I would also prefer a DATA step for this task. For a PROC SQL approach you would need to distinguish the two observations per ID based on the TIME value (unless you create a sequential number in a preliminary DATA step). Your sample data suggest that perhaps TIME=0 indicates the first and TIME>0 the second observation.

proc sql;
create table want as
select a.id
      ,a.Answer1 as preAnswer1, b.Answer1 as postAnswer1
      ,a.Answer2 as preAnswer2, b.Answer2 as postAnswer2
      ,a.Answer1=0 & b.Answer1=1 as Improved_1, a.Answer1=1 & b.Answer1=1 as Maintained_1
      ,a.Answer2=0 & b.Answer2=1 as Improved_2, a.Answer2=1 & b.Answer2=1 as Maintained_2
from have(where=(time=0)) a, have(where=(time>0)) b
where a.id=b.id
order by id;
quit;

By using a full join instead of the inner join you could extend this code to the case that some IDs have only one observation.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 290 views
  • 0 likes
  • 3 in conversation