I was asked to create a grid of the combinations of any two diagnoses by unique claim (the grid would be set up like those old multiplication tables from elementary school). However the diagnoses exist in separate variables for primary diagnosis and secondary diagnosis; each observation is a unique claim. I want to identify the combinations so that combination A/B would be counted as the same as B/A. A simple proc tabulate would did not work because A/B and B/A were counted separately. I came up with a round-about way of doing it, but I feel there must be a more straight-forward approach. Can you help me get there? My longer code is:
proc sql;
create table claimsub as
select claims2.*,
t.Diagnosis as Diagnosis1,
t2.Diagnosis as Diagnosis2
from claims2
inner join tops2 t on claims2.diag1=t.diag1
inner join tops2 t2 on claims2.diag2=t2.diag1
union
select claims2.*,
t2.Diagnosis as Diagnosis1,
t.Diagnosis as Diagnosis2
from claims2
inner join tops2 t on claims2.diag1=t.diag1
inner join tops2 t2 on claims2.diag2=t2.diag1;
quit;
proc sort data=claimsub;
by Diagnosis1 Diagnosis2;
run;
proc sort nodupkey data=claimsub out=claimsub2;
by claim;
run;
proc tabulate data=claimsub;
class Diagnosis2 Diagnosis1;
table Diagnosis2, Diagnosis1 * N = '';
run;
Thanks!
Not sure I understand the problem, but if you want to make a triangular result matrix from PROC FREQ then sort the values in the two variables.
data want ;
set have ;
if diag1 > diag2 then do; swap=diag1; diag1=diag2; diag2=swap; end;
drop swap;
run;
proc freq ;
tables diag1*diag2 ;
run;
I may be missing something because this does look like a basic proc freq.
Can you post an example of what your data looks like?
If I understand your data structure correctly (one row per claim, with two diagnostic variables), you can simply create two new variables and assign the "smallest" code to variable 1 and the "largest" code to variable 2.
Further, you could do all the processing in one PROC SQL step (no need to the proc sort or the proc tabulate) using GROUP BY and ORDER BY clauses.
Not sure I understand the problem, but if you want to make a triangular result matrix from PROC FREQ then sort the values in the two variables.
data want ;
set have ;
if diag1 > diag2 then do; swap=diag1; diag1=diag2; diag2=swap; end;
drop swap;
run;
proc freq ;
tables diag1*diag2 ;
run;
This answer works because my variables are numeric and solves my dilemma here. But taking the thought a step further, what should I do if my variables were character instead numeric (diag1desc/diag2desc instead of diag1/diag2)?
claim | diag1 | diag1desc | diag2 | diag2desc | Diagnosis1 | Diagnosis2 |
214991610 | 30981 | PROLONG POSTTRAUM STRESS | 30981 | PROLONG POSTTRAUM STRESS | 06 - 30981 - PROLONG POSTTRAUM STRESS | 06 - 30981 - PROLONG POSTTRAUM STRESS |
216531478 | 311 | DEPRESSIVE DISORDER NEC | 30000 | ANXIETY STATE NOS | 05 - 311 - DEPRESSIVE DISORDER NEC | 02 - 30000 - ANXIETY STATE NOS |
225098371 | 311 | DEPRESSIVE DISORDER NEC | 311 | DEPRESSIVE DISORDER NEC | 05 - 311 - DEPRESSIVE DISORDER NEC | 05 - 311 - DEPRESSIVE DISORDER NEC |
237469534 | 30000 | ANXIETY STATE NOS | 30981 | PROLONG POSTTRAUM STRESS | 02 - 30000 - ANXIETY STATE NOS | 06 - 30981 - PROLONG POSTTRAUM STRESS |
229156524 | 311 | DEPRESSIVE DISORDER NEC | 30981 | PROLONG POSTTRAUM STRESS | 05 - 311 - DEPRESSIVE DISORDER NEC | 06 - 30981 - PROLONG POSTTRAUM STRESS |
How would it make any difference if the DIAG variables were character?
Right again. Forgive me for misunderstanding the evaluation. Very helpful.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.