BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello,

I have sample data, A, which contains Result and Result_Name columns. I have another sample data, B, which contains Result_Name and Result_Code. Data B acts as a result coding guide. I want to use Data B as a reference to check whether Data A has any entries that don't match the Data B coding guide. The result I'm looking for is shown as "Want." One particular condition is that if the Result_Name contains "Non-coded ID format" or "Non-coded text," the match result should be set to 'TBD.' Please let me know how to approach this.

 

data Sample_A;
      infile datalines dsd;
  input LabID : $15. Test_Code : $15.  Result : $25. Result_Name : $25.;
datalines;
21156, 100157-7_100157-7, BA.2.86, PCR_result, 
21156, 100157-7_100157-7, XBB.1.16.3, PCR_result, 
21156, 100157-7_100157-7, FL.4, PCR_result, 
21156, 100157-7_100157-7, EG.5.1.3, PCR_result,
21156, 100157-6_100157-8, 10828004, Conclusion_result,
21156, 100157-6_100157-8, 260415000, Conclusion_result,
21156, 100157-6_100157-8, 260373001, Conclusion_result,
21156, 100157-6_100157-8, 10000004, Conclusion_result, 
21156, 100357-6_100157-8, 5656, Non-coded ID format, 
21156, 100357-6_100157-8, 7878, Non-coded ID format, 
21156, 104157-6_100157-8, LKJH, Non-coded text, 
21156, 104157-6_100157-8, PPGH, Non-coded text, 
24597, 100157-7_100157-7, XBB.1.5.70, PCR_result, 
24597, 100157-7_100157-7, FL.15, PCR_result, 
24597, 100157-7_100157-7, BA.2.86, PCR_result, 
24597, 100157-6_100157-8, 260373001, Conclusion_result, 
24597, 100157-6_100157-8, 10828005, Conclusion_result, 
24597, 100157-6_100157-8, 10987004, Conclusion_result, 
24597, 100357-6_100157-8, 3245, Non-coded ID format, 
24597, 104157-6_100157-8, OPUY, Non-coded text,  
;

data Sample_B;
      infile datalines dsd;
  input Result_Name : $25. Result_Code : $25. ;
datalines;
PCR_result, BA.2.86, 
PCR_result, BA.5.86,
PCR_result, FL.3, 
PCR_result, FL.8,
PCR_result, FL.10, 
PCR_result, XBB.1.16.3,
PCR_result, XBB.1.5.70,
PCR_result, XBB.3.66.3, 
PCR_result, XBB.8.15.88,
PCR_result, EG.5.1.3, 
PCR_result, JK.0.14.39, 
Conclusion_result, 10828003, 
Conclusion_result, 10828004,
Conclusion_result, 10828005,
Conclusion_result, 10828009,
Conclusion_result, 10987004, 
Conclusion_result, 260373001,
Conclusion_result, 260383001, 
Conclusion_result, 260215000,
Conclusion_result, 260415000, 
Non-coded ID format,Identifier,
Non-coded text, Text,
;

data Want;
      infile datalines dsd;
  input LabID : $15. Test_Code : $15.  Result : $25. Result_Name : $25. Match : $5.;
datalines;
21156, 100157-7_100157-7, BA.2.86, PCR_result, Y,
21156, 100157-7_100157-7, XBB.1.16.3, PCR_result, Y, 
21156, 100157-7_100157-7, FL.4, PCR_result, N, 
21156, 100157-7_100157-7, EG.5.1.3, PCR_result, Y,
21156, 100157-6_100157-8, 10828001, Conclusion_result, N,
21156, 100157-6_100157-8, 260415000, Conclusion_result, Y,
21156, 100157-6_100157-8, 260373001, Conclusion_result, Y,
21156, 100157-6_100157-8, 10000004, Conclusion_result, N,
21156, 100357-6_100157-8, 5656, Non-coded ID format, TBD,
21156, 100357-6_100157-8, 7878, Non-coded ID format, TBD,
21156, 104157-6_100157-8, LKJH, Non-coded text, TBD, 
21156, 104157-6_100157-8, PPGH, Non-coded text, TBD,
24597, 100157-7_100157-7, XBB.1.5.70, PCR_result, Y,
24597, 100157-7_100157-7, FL.15, PCR_result, N,
24597, 100157-7_100157-7, BA.2.86, PCR_result, Y,
24597, 100157-6_100157-8, 260373001, Conclusion_result, Y,
24597, 100157-6_100157-8, 11828005, Conclusion_result, N,
24597, 100157-6_100157-8, 10987064, Conclusion_result,  N,
24597, 100357-6_100157-8, 3245, Non-coded ID format, TBD,
24597, 104157-6_100157-8, OPUY, Non-coded text, TBD,
;
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @ybz12003,

 

I would probably read the reference dataset B into a hash object, which can then be queried after checking for "Non-coded ..." cases:

data Want;
set Sample_A;
if _n_=1 then do;
  dcl hash h(dataset:'Sample_B(rename=(Result_Code=Result))');
  h.definekey('Result_Name', 'Result');
  h.definedone();
  if 0 then set Sample_B(rename=(Result_Code=Result));
end;
length Match $5;
if Result_Name in ("Non-coded ID format", "Non-coded text") then Match='TBD';
else Match=ifc(h.check()=0,'Y','N');
run;

You may want to modify the IF condition for the "Non-coded ..." cases if the criterion is not an exact match with those strings.

 

Note that length 15 is not sufficient to accommodate the values listed for variable Test_Code. More importantly, three values of variable Result differ between dataset Sample_A and your sample dataset Want. This results in three observations where my suggested code finds a match, whereas your Want dataset has Match='N', based on the different values (see observations 5, 17 and 18).

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @ybz12003,

 

I would probably read the reference dataset B into a hash object, which can then be queried after checking for "Non-coded ..." cases:

data Want;
set Sample_A;
if _n_=1 then do;
  dcl hash h(dataset:'Sample_B(rename=(Result_Code=Result))');
  h.definekey('Result_Name', 'Result');
  h.definedone();
  if 0 then set Sample_B(rename=(Result_Code=Result));
end;
length Match $5;
if Result_Name in ("Non-coded ID format", "Non-coded text") then Match='TBD';
else Match=ifc(h.check()=0,'Y','N');
run;

You may want to modify the IF condition for the "Non-coded ..." cases if the criterion is not an exact match with those strings.

 

Note that length 15 is not sufficient to accommodate the values listed for variable Test_Code. More importantly, three values of variable Result differ between dataset Sample_A and your sample dataset Want. This results in three observations where my suggested code finds a match, whereas your Want dataset has Match='N', based on the different values (see observations 5, 17 and 18).

ybz12003
Rhodochrosite | Level 12
Thank you for the suggestion; that is an awesome program.
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1553 views
  • 1 like
  • 3 in conversation