dera
Obsidian | Level 7

Hi,

 

I have two data sets. The first data set (named 1A) includes a code associated with key words.

The second data set (1B) includes patient IDs with a diagnosis description. See below:

 

Dataset 1A:

CODE | YES | NO
q123 | "neck pain" "head pain" | "No" "Not" "No sign of" "Not known for"
q456 | "back pain", "lombar pain" "lower back pain" | "No" "Not" "No sign of" "Not known for"
q789 | "Knee pain" "Hip pain" "Leg pain" | "No" "Not" "No sign of" "Not known for"

 

Dataset 1B:

ID | DESCRIPTION
1 | Neck pain
2 | No sign of lombar pain. Normal exam.
3 | Lower back pain
4 | Not known for knee pain

 

 

I am looking for a way to tell SAS: "If the DESCRIPTION variable in data set 1B contains one of the phrases listed in the YES variable of data set 1A, and none of the phrases listed in the NO variable of 1A appear in that description, assign the matching code."

 

To illustrate my thoughts, here is a table of what I am looking for:

 

ID | DESCRIPTION | CODE
1 | Neck pain | q123
2 | No sign of lombar pain. Normal exam. |
3 | Lower back pain | q456
4 | Not known for knee pain |

 

 

 

Hope it's clear enough. Thanks in advance for your help.

 

 

Cheers

1 ACCEPTED SOLUTION

yabwon
Onyx | Level 15

Hi,

 

I would try the following code. It may not be the most sophisticated, but it seems to solve your problem.

 

all the best

Bart

 

data codes;
infile cards4 dlm = "#" dsd;
input CODE : $ 10. YES : $ 100. NO : $ 100.;
cards4;
q123#"neck pain" "head pain"#"No" "Not" "No sign of" "Not known for"
q456#"back pain" "lombar pain" "lower back pain"#"No" "Not" "No sign of" "Not known for"
q789#"Knee pain" "Hip pain" "Leg pain"#"No" "Not" "No sign of" "Not known for"
;;;;
run;

data subject;
infile cards4 dlm = "#";
input ID DESCRIPTION : $ 50.;
cards4;
1#Neck pain
2#No sign of lombar pain. Normal exam.
3#Lower back pain
4#Not known for knee pain
;;;;
run;

data want(keep = ID DESCRIPTION CD rename=(CD=CODE));

/* for each observation from SUBJECT... */
set subject;
length CD $ 100 Y N $ 50;

/* ... loop through all observations from CODES */
do point = 1 to nobs;
 set codes nobs=nobs point=point;
 drop yes no;
 put _N_= point=; /* debugging output written to the log */

 /* parse the YES list of "pains" for the given code
    and traverse that list
 */
 i = 1;
 Y = dequote(scan(YES, i, " ", "q"));
 do while (Y ne "");
  put i= Y=;
  /* 'i' ignores case; 't' trims the trailing blanks that pad Y and N
     to 50 characters, which would otherwise have to match as well */
  if FIND(DESCRIPTION, Y, 'it') then do; /* if a "pain" is found in DESCRIPTION ... */

    NEGATIVE = 0; /* ... look for possible negations from the NO list */
    j = 1;
    N = dequote(scan(NO, j, " ", "q"));
    do while (N ne "");
      if FIND(DESCRIPTION, N, 'it') then NEGATIVE = 1;
      put j= N= NEGATIVE=;
      j = j + 1;
      N = dequote(scan(NO, j, " ", "q"));
    end;

    /* if there is no negation, add the code to the subject once and
       stop checking the remaining YES terms of this code */
    if not NEGATIVE then do;
      CD = catx(" ", CD, CODE);
      leave;
    end;
  end;
  i = i + 1;
  Y = dequote(scan(YES, i, " ", "q"));
 end;

end;

output;
run;
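
A detail worth checking in the search calls above: Y and N are fixed-length character variables, so their values are padded with trailing blanks, and FIND compares those blanks unless the 't' modifier is given. Here is a minimal sketch to see the difference in the log. The variable names PADDED and TRIMMED are only for illustration and are not part of the solution.

```sas
data _null_;
  length DESCRIPTION Y $ 50;
  DESCRIPTION = "Lower back pain";
  Y = "back pain";                          /* stored padded with blanks to 50 characters */

  padded  = find(DESCRIPTION, Y, 'i');      /* trailing blanks of Y take part in the search */
  trimmed = find(DESCRIPTION, Y, 'it');     /* 't' trims trailing blanks from both arguments */

  put padded= trimmed=;                     /* compare the two positions in the log */
run;
```

If the two values differ on your system, use 'it' (or wrap the search term in TRIM) wherever a phrase is looked up inside a longer description.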
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation




4 REPLIES
RW9
Diamond | Level 26

Yes, but it doesn't really make sense to go that way. For instance, a plain text match would not treat "Neck pain" the same as "neck pain", "pain in neck", "neck ache", etc., all of which would likely map to the same code.

What you need is some sort of coding exercise, because rigor was not built into the data capture (i.e. free text was used). To this end you can do some text matching as you present here, but someone would need to review the results thoroughly to ensure things are coded correctly. For instance, if the text is "cervical stenosis", should this be classified as neck pain? It's not just a matter of pattern matching.
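
To make the review point concrete, here is a small sketch of one way substring matching can mislead. The negation term "No" from the NO list also occurs inside ordinary words such as "Normal". The description below is a made-up example, and the PRXMATCH word-boundary pattern is just one possible alternative, not something proposed in this thread.

```sas
data _null_;
  length DESCRIPTION $ 50;
  DESCRIPTION = "Neck pain. Normal exam.";   /* hypothetical description */

  /* plain substring search: "No" is also found inside "Normal" */
  substring_hit = find(DESCRIPTION, "No", 'it');

  /* whole-word search: \b marks a word boundary, /i ignores case */
  word_hit = prxmatch('/\bno\b/i', DESCRIPTION);

  put substring_hit= word_hit=;
run;
```

In the log, substring_hit should be non-zero (a false negation) while word_hit should be 0, so a description like this would lose its code under plain substring matching.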

 

As for the how, well, that's simple: use the code list and CALL EXECUTE, something like:

data _null_;
  set dataset1a end=last;
  if _n_=1 then call execute('data want; set dataset1b;');
  /* one generated IF statement per row of dataset1a */
  call execute(cats('if index(upcase(description),', upcase(yes), ') and not index(upcase(description),', upcase(no), ') then code="', code, '";'));
  if last then call execute('run;');
run;

That will generate an if for each row in dataset1a.  It won't work of course, as you have multiple parts to each item, but demonstrates the method.
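
For what it's worth, here is a rough sketch of how the same CALL EXECUTE idea could cope with several quoted terms per item. It assumes the CODES and SUBJECT data sets as created in the accepted solution, and that every YES and NO list holds at least one quoted term. WANT_CE, TERM, PIECE, COND_YES and COND_NO are names made up for this illustration.

```sas
data _null_;
  set codes end=last;
  length term $ 50 piece $ 200 cond_yes cond_no $ 2000;

  if _n_ = 1 then call execute('data want_ce; set subject; length CODE $ 100;');

  /* one FIND call per quoted YES term, joined with OR */
  do i = 1 to countw(YES, ' ', 'q');
    term  = dequote(scan(YES, i, ' ', 'q'));
    piece = cats('find(DESCRIPTION,', quote(strip(term)), ',"it")');
    if i > 1 then cond_yes = catx(' ', cond_yes, 'or');
    cond_yes = catx(' ', cond_yes, piece);
  end;

  /* the same for the NO terms */
  do i = 1 to countw(NO, ' ', 'q');
    term  = dequote(scan(NO, i, ' ', 'q'));
    piece = cats('find(DESCRIPTION,', quote(strip(term)), ',"it")');
    if i > 1 then cond_no = catx(' ', cond_no, 'or');
    cond_no = catx(' ', cond_no, piece);
  end;

  /* emit one IF statement per row of CODES */
  call execute(catx(' ',
    'if (', cond_yes, ') and not (', cond_no, ')',
    'then CODE = catx(" ", CODE,', quote(strip(CODE)), ');'));

  if last then call execute('run;');
run;
```

Each row of CODES becomes one generated IF statement, so a code is added at most once per subject even when several of its YES terms match.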


dera
Obsidian | Level 7

That works perfectly! Thanks Yabwon.

 

Cheers

 

yabwon
Onyx | Level 15

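Here is one more version, which is a bit more I/O efficient: it loads CODES into temporary arrays once instead of re-reading the data set for every observation from SUBJECT.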

Bart

 

data _null_;
if 0 then set codes nobs=nobs;
call symputx('_NOBS_', nobs, "G");
stop;
run;
data want2(keep = ID DESCRIPTION CD rename=(CD=CODE));

/* load data into memory in temporary arrays */
array _CODE[&_NOBS_.] $ 10 _temporary_;
array _YES[&_NOBS_.] $ 100 _temporary_;
array _NO[&_NOBS_.] $ 100 _temporary_;

do until(eof);
 set codes end=eof curobs=curobs;
 _CODE[curobs] = CODE;
 _YES[curobs] = YES;
 _NO[curobs] = NO;
end;

/* loop to get each observation from SUBJECT and... */
do until(_EOF_);
 set subject end=_EOF_ curobs=_N_;
 length CD $ 100 Y N $ 50;
 CD = "";

 /* ... loop through all observations from CODES,
    but now use the data stored in arrays instead of the data set on disk */
 do point = 1 to &_NOBS_.;
  CODE = _CODE[point];
  YES = _YES[point];
  NO = _NO[point];
  drop yes no;
  put _N_= point=; /* debugging output written to the log */

  /* parse the YES list of "pains" for the given code
     and traverse that list
  */
  i = 1;
  Y = dequote(scan(YES, i, " ", "q"));
  do while (Y ne "");
   put i= Y=;
   /* 'i' ignores case; 't' trims the trailing blanks that pad Y and N */
   if FIND(DESCRIPTION, Y, 'it') then do; /* if a "pain" is found in DESCRIPTION ... */

     NEGATIVE = 0; /* ... look for possible negations from the NO list */
     j = 1;
     N = dequote(scan(NO, j, " ", "q"));
     do while (N ne "");
       if FIND(DESCRIPTION, N, 'it') then NEGATIVE = 1;
       put j= N= NEGATIVE=;
       j = j + 1;
       N = dequote(scan(NO, j, " ", "q"));
     end;

     /* if there is no negation, add the code once and
        stop checking the remaining YES terms of this code */
     if not NEGATIVE then do;
       CD = catx(" ", CD, CODE);
       leave;
     end;
   end;
   i = i + 1;
   Y = dequote(scan(YES, i, " ", "q"));
  end;

 end;
 output;
end;

stop;
run;
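
A quick way to check that this version gives the same result as the first one, assuming both data steps were run in the same session so that WANT and WANT2 exist in WORK:

```sas
/* compare the two result data sets observation by observation */
proc compare base=want compare=want2;
run;
```

The PROC COMPARE report lists any observations or values that differ between the two data sets.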

