DATA Step, Macro, Functions and more

deriving variable by looking between observations

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

deriving variable by looking between observations

[ Edited ]

Hi All,

 

Here is the sample data. I need to derive two variables llt_code and comments. Below is the rules for derivation.

 

Year    Study    Term             LLT

2000   123         nausea          A

2000   456         nausea          B

2001    789         nausea         C

2001    789         nausea         A

 

If TERM is same but has different values in LLT then populate LLT_CODE = “Codes do not match” for all these records. Or, set LLT_CODE to “Match”.

 

Comment: Concatenate unique different LLT_TERM within same MHTERM with ‘; ‘ delimeter.

                  Ex: LLT_CODE =’ Codes do not match’ and comment =’Comparing with A; B; C’

 

Final Data:

 

Year    Study    Term             LLT   LLT_TERM                             comments

2000   123         nausea          A     Codes do not match                Comparing with A; B; C

2000   456         nausea          B     Codes do not match                 Comparing with A; B; C

2001    789         nausea         C     Codes do not match                  Comparing with A; B; C

2001    789         nausea         A     Codes do not match                  Comparing with A; B; C

 

Could someone please help me with the logic.

 

Thanks a Lot.

 


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 10,046

Re: deriving variable by looking between observations

OK. No problem. Just assign a  value to missing value.
Note: the value stand for missing value can't be the same as LLT, here I use 'NA'.


data have;
input Year    Study    Term  : $20.           LLT $;
if missing(LLT) then LLT='NA';
cards;
2000   123         nausea          A
2000   456         nausea          B
2001    789         nausea         C
2001    442         nausea         A
2000    234      headache       ache
2001 423     headache       ache
2000    123         toothpain      pain
2000    234      toothpain       ache
2001 423         toothpain       ache
2001   23       headache      pain
2001   234       headache       pain
2002   123       headache       pain
2002   456        headache       .
;
run;
data want;
length list $ 2000  LLT_TERM comments $ 80;
n=0;
 do until(last.term);
  set have;
  by term notsorted;
  n+1;
  if not findw(list,strip(llt),'; ') then list=catx(';',list,llt);
 end;
 flag=countw(list,';');
 do until(last.term);
  set have;
  by term notsorted;  
  if flag=1 then do;comments=' '; LLT_TERM='matched'; end;
   else do;comments='Comparing with '||list; LLT_TERM=' Codes do not match';end;
  output;
 end;
drop n flag list;
 run;


View solution in original post


All Replies
Respected Advisor
Posts: 4,937

Re: deriving variable by looking between observations

All your example cases are the same. Please show a more complete example with some matches. Tell us what happens when some records match and others don't.

PG
Contributor
Posts: 42

Re: deriving variable by looking between observations

Year    Study    Term             LLT
2000   123         nausea          A
2000   456         nausea          B
2001    789         nausea         C
2001    442         nausea         A
2000    234      headache       ache
2001 423     headache       ache
2000    123         toothpain      pain
2000    234      toothpain       ache
2001 423         toothpain       ache

output should look like this:

Final Data: based on the term we need to apply the logic.
 
Year    Study    Term             LLT   LLT_TERM                             comments
2000   123         nausea          A     Codes do not match                Comparing with A; B; C
2000   456         nausea          B     Codes do not match                 Comparing with A; B; C
2001    789         nausea         C     Codes do not match                  Comparing with A; B; C
2001    442         nausea         A     Codes do not match                  Comparing with A; B; C

2000    234      headache       ache   match
2001 423     headache       ache   match
2000    123         toothpain      pain   Codes do not match   Comparing with pain; ache
2000    234      toothpain       ache  Codes do not match   Comparing with pain; ache
2001 423         toothpain       ache  Codes do not match   Comparing with pain; ache

Super User
Posts: 10,046

Re: deriving variable by looking between observations

Assuming I understand what you mean.

 

data have;
input Year    Study    Term  : $20.           LLT $;
cards;
2000   123         nausea          A
2000   456         nausea          B
2001    789         nausea         C
2001    442         nausea         A
2000    234      headache       ache
2001 423     headache       ache
2000    123         toothpain      pain
2000    234      toothpain       ache
2001 423         toothpain       ache
;
run;
data want;
length list $ 20000  LLT_TERM comments $ 80;
n=0;
 do until(last.term);
  set have;
  by term notsorted;
  n+1;
  if not findw(list,strip(llt),'; ') then list=catx(';',list,llt);
 end;
 flag=countw(list,';');
 do until(last.term);
  set have;
  by term notsorted;  
  if flag=1 then do;comments=' '; LLT_TERM='matched'; end;
   else do;comments='Comparing with '||list; LLT_TERM=' Codes do not match';end;
  output;
 end;
drop n flag list;
 run;
Respected Advisor
Posts: 4,937

Re: deriving variable by looking between observations

Great code @Ksharp! Could be improved by putting the assignments to comments and LLT_TERM before the second do until loop. You could also get rid of flag and n.

PG
Super User
Posts: 10,046

Re: deriving variable by looking between observations

@PGStats , Yes. As you wish . Glad to see you come back .

Contributor
Posts: 42

Re: deriving variable by looking between observations

Thank You Very Much. Working as expected.

Contributor
Posts: 42

Re: deriving variable by looking between observations

Hi ,

 

have issue with the code for  below scenarios, Where llt is missing for 2000 or 2001. It is showing as count 1 so assigning as match,but this scenario should be treated as codes do not match.

 

2000   123         nausea          A
2001    442        nausea          .

 

have another scenario: where in 2001 same term but llt value is different for 2001  records, in this scenario also getting match, but it should come as do not match.

 

2000   123         nausea          A

2000   234         nausea          A
2001    442        nausea         A

2001    442        nausea          .

 

 

Contributor
Posts: 42

Re: deriving variable by looking between observations

Hi This solution is working for all other scenarios, but facing issues in the below scenarios. Could you please suggest how to fix this issue.

 

study        year       term             llt

123        2001         headache      pain

234        2001         headache       pain

123        2002         headache       pain

456        2002         headache       .

 

234        2001         neckpain       injury

456        2002         neckpain       .

 

In the above scenarios flag as 1, because there is no other code in this scenario I want to display codes does not match but it is showing as match. Please suggest me the fix for this scenario.

 

Thanks,

sasg

 

Solution
3 weeks ago
Super User
Posts: 10,046

Re: deriving variable by looking between observations

OK. No problem. Just assign a  value to missing value.
Note: the value stand for missing value can't be the same as LLT, here I use 'NA'.


data have;
input Year    Study    Term  : $20.           LLT $;
if missing(LLT) then LLT='NA';
cards;
2000   123         nausea          A
2000   456         nausea          B
2001    789         nausea         C
2001    442         nausea         A
2000    234      headache       ache
2001 423     headache       ache
2000    123         toothpain      pain
2000    234      toothpain       ache
2001 423         toothpain       ache
2001   23       headache      pain
2001   234       headache       pain
2002   123       headache       pain
2002   456        headache       .
;
run;
data want;
length list $ 2000  LLT_TERM comments $ 80;
n=0;
 do until(last.term);
  set have;
  by term notsorted;
  n+1;
  if not findw(list,strip(llt),'; ') then list=catx(';',list,llt);
 end;
 flag=countw(list,';');
 do until(last.term);
  set have;
  by term notsorted;  
  if flag=1 then do;comments=' '; LLT_TERM='matched'; end;
   else do;comments='Comparing with '||list; LLT_TERM=' Codes do not match';end;
  output;
 end;
drop n flag list;
 run;


Contributor
Posts: 42

Re: deriving variable by looking between observations

Thank a Lot.

Contributor
Posts: 42

Re: deriving variable by looking between observations

one more last question if I want to display LLT_CODE in the comments as alphabetical order.

 

ex:

2000   123         nausea         vomiting
2001   456         nausea         nausea

2000   123         headache       injury
2001   456        headache         pain

 

comments should be 'Comparing nausea ; vomiting'

 'Comparing injury;  pain'

Super User
Posts: 10,046

Re: deriving variable by looking between observations

OK. No problem.

 


data have;
input Year    Study    Term  : $20.           LLT $;
if missing(LLT) then LLT='NA';
cards;
2000   123         nausea          C
2000   456         nausea          B
2001    789         nausea         A
2001    442         nausea         A
2000    234      headache       ache
2001 423     headache       ache
2000    123         toothpain      pain
2000    234      toothpain       ache
2001 423         toothpain       ache
2001   23       headache      pain
2001   234       headache       pain
2002   123       headache       pain
2002   456        headache       .
;
run;
data want;
length list $ 2000  LLT_TERM comments $ 80;
n=0;
 do until(last.term);
  set have;
  by term notsorted;
  n+1;
  if not findw(list,strip(llt),'; ') then list=catx(';',list,llt);
 end;
 flag=countw(list,';');
 do until(last.term);
  set have;
  by term notsorted;  
  if flag=1 then do;comments=' '; LLT_TERM='matched'; end;
   else do;comments=list; LLT_TERM=' Codes do not match';end;
  output;
 end;
drop n flag list;
 run;



 data final_want;
  set want;
  length new_comments $ 2000;
  array x{999} $ 40;
  do i=1 to countw(comments,';');
    x{i}=scan(comments,i,';');
	if x{i}='NA' then x{i}=' ';
  end;
 call sortc(of x{*});
 new_comments='Comparing with: '||catx(';',of x{*});
 drop comments x1-x999 i;
run;
Frequent Contributor
Posts: 149

Re: deriving variable by looking between observations

Problem can be solved by using "look ahead and back"-technique. You will find papers and posts explaining it in detail.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 267 views
  • 1 like
  • 4 in conversation