I have a dataset called impact with three columns that each contain a visit number:
visit_number_a
visit_number_b
visit_number_c
How do I see if any values (the count) of visit_number_a match visit_number_b?
And if any values of visit_number_a match to visit_number_b and to visit_number_c?
And if so, how do I generate a list of the values that match across the columns?
Would I use proc compare? But only include values where the difference equals zero?
The visit numbers are 11 digits long.
proc compare base =impact;
var visit_number_a;
with visit_number_b;
where visit_number_a - visit_number_b = 0;
run;
??????
Thank you.
Proc compare does a record by record comparison.
When you say : values (the count) of visit_number_a match visit_number_b?
it is hard without any example to tell if you need to summarize data to get counts, if the value of the variable is a count or something else. And with "visit numbers" 11 digits it sounds like your values are identifications. So not really clear here what is wanted.
I suggest providing some example data in the form of a data step and then show what the expected result for that would be.
If you are not sure how to write a data step it might look something like:
data have; input visit_number_a :$11. visit_number_b :$11. visit_number_c :$11; datalines; 12345123451 23456234562 34567345673 12345123451 33333333333 44444444444 ;
Provide enough values to demonstrate what want done and show the expected result for your given example data.
I used character values because I don't think you are doing arithmetic with the values.
data impact;
input patient_id :$5. visit_number_a :$11. visit_number_b :$11. visit_number_c :$11.;
datalines;
Z6049 24942400656 85960815714 85960815714
Z6052 49415147535 81413953798 81413953798
Z6053 66886694586 66886694586 66886694586
Z6054 76875635808 76875635808 78586120861
Z6056 86538343903 20060379083 20060379083
Z6095 18466992548 12088652966 12088652966
Z6100 32224422523 32224422523 32224422523
Z6105 94980188348 63002863712 63002863712
Z6107 14446393827 14446393827 72562481096
Z6117 70314446348 52278174988 52278174988
Z6123 59330419958 65729476921 65729476921
;
proc print data=impact;
run;
So if I were in microsoft Excel I'd have a table similar to this:
| PATIENT_ID | visit_number_a | visit_number_b | visit_number_c | IF_A_equals_B | IF_A_equals_B_equals_C | 
| Z6049 | 24942400656 | 85960815714 | 85960815714 | FALSE | FALSE | 
| Z6052 | 49415147535 | 81413953798 | 81413953798 | FALSE | FALSE | 
| Z6053 | 66886694586 | 66886694586 | 66886694586 | TRUE | TRUE | 
| Z6054 | 76875635808 | 76875635808 | 78586120861 | TRUE | FALSE | 
| Z6056 | 86538343903 | 20060379083 | 20060379083 | FALSE | FALSE | 
| Z6095 | 18466992548 | 12088652966 | 12088652966 | FALSE | FALSE | 
| Z6100 | 32224422523 | 32224422523 | 32224422523 | TRUE | TRUE | 
| Z6105 | 94980188348 | 63002863712 | 63002863712 | FALSE | FALSE | 
| Z6107 | 14446393827 | 14446393827 | 72562481096 | TRUE | FALSE | 
| Z6117 | 70314446348 | 52278174988 | 52278174988 | FALSE | FALSE | 
| Z6123 | 59330419958 | 65729476921 | 65729476921 | FALSE | FALSE | 
In SAS I want to find out the number of observations where a = b and where a = b = c
Number of observations where a = b : 4
Number of observations where a = b = c : 2
And generate a table of the observations where a = b:
| PATIENT_ID | visit_number_a | visit_number_b | visit_number_c | 
| Z6053 | 66886694586 | 66886694586 | 66886694586 | 
| Z6054 | 76875635808 | 76875635808 | 78586120861 | 
| Z6100 | 32224422523 | 32224422523 | 32224422523 | 
| Z6107 | 14446393827 | 14446393827 | 72562481096 | 
And generate a table of the observations where a = b = c:
| PATIENT_ID | visit_number_a | visit_number_b | visit_number_c | 
| Z6053 | 66886694586 | 66886694586 | 66886694586 | 
| Z6100 | 32224422523 | 32224422523 | 32224422523 | 
Thank you for a nice clean example.
This creates two data sets as your show:
data A_equal_B (drop=A_B A_B_C )
     A_equal_B_C  (drop=A_B A_B_C )
;
   set impact;
   /* add the true/false*/
   A_B   = (visit_number_a =visit_number_b);
   A_B_C = (visit_number_a =visit_number_b = visit_number_c);
   if A_B then output A_equal_B;
   if A_B_C then output A_equal_B_C;
run;
Key bits: SAS returns numeric 1/0 for True/False for comparisons. So placing a comparison such as (visit_number_a =visit_number_b) in the () returns a value of 1 when true or 0 when false. SAS will allow multiple comparisons (if you are careful) at one time, equal is easy.
The explicit Output statement writes to the data set when the value is true.
Not sure exactly which you meant by table, so created data sets.
Alternate you could use the following to create one data set with the additional variables and then select which ones for printing (or other uses, the Where statement can be used will most procedures)
data want; set impact; /* add the true/false*/ A_B = (visit_number_a =visit_number_b); A_B_C = (visit_number_a =visit_number_b = visit_number_c); run; proc print data=want noobs; where A_B; Title "A equals B"; var patient_id visit_number: ; run; title; proc print data=want noobs; where A_B_C; Title "A equals B equals C"; var patient_id visit_number: ; run; title;
The Where statement can use any expression that returns a "true/false" result. SAS will actually use most numbers except 0 as "true" so be careful that you know what you mean.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
