BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello,

I have sample data, A, which contains Result and Result_Name columns. I have another sample data, B, which contains Result_Name and Result_Code. Data B acts as a result coding guide. I want to use Data B as a reference to check whether Data A has any entries that don't match the Data B coding guide. The result I'm looking for is shown as "Want." One particular condition is that if the Result_Name contains "Non-coded ID format" or "Non-coded text," the match result should be set to 'TBD.' Please let me know how to approach this.

 

data Sample_A;
      infile datalines dsd;
  input LabID : $15. Test_Code : $15.  Result : $25. Result_Name : $25.;
datalines;
21156, 100157-7_100157-7, BA.2.86, PCR_result, 
21156, 100157-7_100157-7, XBB.1.16.3, PCR_result, 
21156, 100157-7_100157-7, FL.4, PCR_result, 
21156, 100157-7_100157-7, EG.5.1.3, PCR_result,
21156, 100157-6_100157-8, 10828004, Conclusion_result,
21156, 100157-6_100157-8, 260415000, Conclusion_result,
21156, 100157-6_100157-8, 260373001, Conclusion_result,
21156, 100157-6_100157-8, 10000004, Conclusion_result, 
21156, 100357-6_100157-8, 5656, Non-coded ID format, 
21156, 100357-6_100157-8, 7878, Non-coded ID format, 
21156, 104157-6_100157-8, LKJH, Non-coded text, 
21156, 104157-6_100157-8, PPGH, Non-coded text, 
24597, 100157-7_100157-7, XBB.1.5.70, PCR_result, 
24597, 100157-7_100157-7, FL.15, PCR_result, 
24597, 100157-7_100157-7, BA.2.86, PCR_result, 
24597, 100157-6_100157-8, 260373001, Conclusion_result, 
24597, 100157-6_100157-8, 10828005, Conclusion_result, 
24597, 100157-6_100157-8, 10987004, Conclusion_result, 
24597, 100357-6_100157-8, 3245, Non-coded ID format, 
24597, 104157-6_100157-8, OPUY, Non-coded text,  
;

data Sample_B;
      infile datalines dsd;
  input Result_Name : $25. Result_Code : $25. ;
datalines;
PCR_result, BA.2.86, 
PCR_result, BA.5.86,
PCR_result, FL.3, 
PCR_result, FL.8,
PCR_result, FL.10, 
PCR_result, XBB.1.16.3,
PCR_result, XBB.1.5.70,
PCR_result, XBB.3.66.3, 
PCR_result, XBB.8.15.88,
PCR_result, EG.5.1.3, 
PCR_result, JK.0.14.39, 
Conclusion_result, 10828003, 
Conclusion_result, 10828004,
Conclusion_result, 10828005,
Conclusion_result, 10828009,
Conclusion_result, 10987004, 
Conclusion_result, 260373001,
Conclusion_result, 260383001, 
Conclusion_result, 260215000,
Conclusion_result, 260415000, 
Non-coded ID format,Identifier,
Non-coded text, Text,
;

data Want;
      infile datalines dsd;
  input LabID : $15. Test_Code : $15.  Result : $25. Result_Name : $25. Match : $5.;
datalines;
21156, 100157-7_100157-7, BA.2.86, PCR_result, Y,
21156, 100157-7_100157-7, XBB.1.16.3, PCR_result, Y, 
21156, 100157-7_100157-7, FL.4, PCR_result, N, 
21156, 100157-7_100157-7, EG.5.1.3, PCR_result, Y,
21156, 100157-6_100157-8, 10828001, Conclusion_result, N,
21156, 100157-6_100157-8, 260415000, Conclusion_result, Y,
21156, 100157-6_100157-8, 260373001, Conclusion_result, Y,
21156, 100157-6_100157-8, 10000004, Conclusion_result, N,
21156, 100357-6_100157-8, 5656, Non-coded ID format, TBD,
21156, 100357-6_100157-8, 7878, Non-coded ID format, TBD,
21156, 104157-6_100157-8, LKJH, Non-coded text, TBD, 
21156, 104157-6_100157-8, PPGH, Non-coded text, TBD,
24597, 100157-7_100157-7, XBB.1.5.70, PCR_result, Y,
24597, 100157-7_100157-7, FL.15, PCR_result, N,
24597, 100157-7_100157-7, BA.2.86, PCR_result, Y,
24597, 100157-6_100157-8, 260373001, Conclusion_result, Y,
24597, 100157-6_100157-8, 11828005, Conclusion_result, N,
24597, 100157-6_100157-8, 10987064, Conclusion_result,  N,
24597, 100357-6_100157-8, 3245, Non-coded ID format, TBD,
24597, 104157-6_100157-8, OPUY, Non-coded text, TBD,
;
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @ybz12003,

 

I would probably read the reference dataset B into a hash object, which can then be queried after checking for "Non-coded ..." cases:

data Want;
set Sample_A;
if _n_=1 then do;
  dcl hash h(dataset:'Sample_B(rename=(Result_Code=Result))');
  h.definekey('Result_Name', 'Result');
  h.definedone();
  if 0 then set Sample_B(rename=(Result_Code=Result));
end;
length Match $5;
if Result_Name in ("Non-coded ID format", "Non-coded text") then Match='TBD';
else Match=ifc(h.check()=0,'Y','N');
run;

You may want to modify the IF condition for the "Non-coded ..." cases if the criterion is not an exact match with those strings.

 

Note that length 15 is not sufficient to accommodate the values listed for variable Test_Code. More importantly, three values of variable Result differ between dataset Sample_A and your sample dataset Want. This results in three observations where my suggested code finds a match, whereas your Want dataset has Match='N', based on the different values (see observations 5, 17 and 18).

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @ybz12003,

 

I would probably read the reference dataset B into a hash object, which can then be queried after checking for "Non-coded ..." cases:

data Want;
set Sample_A;
if _n_=1 then do;
  dcl hash h(dataset:'Sample_B(rename=(Result_Code=Result))');
  h.definekey('Result_Name', 'Result');
  h.definedone();
  if 0 then set Sample_B(rename=(Result_Code=Result));
end;
length Match $5;
if Result_Name in ("Non-coded ID format", "Non-coded text") then Match='TBD';
else Match=ifc(h.check()=0,'Y','N');
run;

You may want to modify the IF condition for the "Non-coded ..." cases if the criterion is not an exact match with those strings.

 

Note that length 15 is not sufficient to accommodate the values listed for variable Test_Code. More importantly, three values of variable Result differ between dataset Sample_A and your sample dataset Want. This results in three observations where my suggested code finds a match, whereas your Want dataset has Match='N', based on the different values (see observations 5, 17 and 18).

ybz12003
Rhodochrosite | Level 12
Thank you for the suggestion; that is an awesome program.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 370 views
  • 1 like
  • 3 in conversation