How to create 10 by 10 matrix to map out the overlapped flag variables?
I'd like to assess the overlapping pattern 10 different flag variables. Here is some background why I'm here. I have 17,811 rows of unique 'med_codes' column (see excel data attached if you have time) which is a mixture of literally all categories of medical procedure, diagnosis, billing and administrative codes. My goal is to define each code row. I'm kinda getting there since I have only 466 codes undefined. How did I do that? I have flagged if complete and source list of i.e., ICD10, ICD9, CPT and HCPCS et.c linked to my 'med_codes' column. If unknown code is linked to ICD-9cm then flag_icd9cm=1; it takes 0. As a result, I have 10 different flags below for each code categories. Unfortunately, they overlapped. Data=check has 466 rows even though sum of all linked variables reached 27449 exceeding total row number 17,811.
So now, my question is: How to map out the pattern how flag variables are crossed altogether? Had I known how many different flag variables are overlapped to what extent then I would look at them and weed them out using certain rules that makes sense to the nature of my project.
Thanks zillionz in advance. Any brainstorming or direct coding hints are appreciated.
flag_icd10pc | flag_icd10cm | flag_hcpc | flag_icd9_pc | flag_cpt | flag_icd9cm | flag_cpt_del | flag_hipps |
flag_abc |
flag_bill | flag_dent | flag_D |
data check; set cruise;
linked=0;
if flag_cpt=1 then linked=1;
if flag_hcpc=1 then linked=1;
if flag_icd9_pc=1 then linked=1;
if flag_icd10pc=1 then linked=1;
if flag_icd9cm=1 then linked=1;
if flag_icd10cm=1 then linked=1;
if flag_cpt_del=1 then linked=1;
if flag_hipps=1 then linked=1;
if flag_other=1 then linked=1;
if flag_abc=1 then linked=1;
if flag_bill=1 then linked=1;
if flag_dent=1 then linked=1;
if flag_D=1 then linked=1;
if linked=0;
run;
/*have time to have a close
look at my data?, plz use .xlsx data
attached*/
proc import
datafile="cruise.xlsx"
out=cruise
dbms=xlsx
replace;
getnames=yes;
run;
Not too sure I understand the output you are after (where is your example?) but his may help.
data HAVE;
do I= 1 to 100;
FLAG_ICD10PC =round(ranuni(0)*.56);
FLAG_ICD10CM =round(ranuni(0)*.56);
FLAG_HCPC =round(ranuni(0)*.56);
FLAG_ICD9_PC =round(ranuni(0)*.56);
FLAG_CPT =round(ranuni(0)*.56);
FLAG_ICD9CM =round(ranuni(0)*.56);
FLAG_CPT_DEL =round(ranuni(0)*.56);
FLAG_HIPPS =round(ranuni(0)*.56);
FLAG_ABC =round(ranuni(0)*.56);
FLAG_BILL =round(ranuni(0)*.56);
output;
end;
run;
data REPORT;
set HAVE end=LASTOBS;
array F [10] FLAG:;
array X [10,10] 8;
do I= 1 to 10;
do J=2 to 10;
if I = J then continue;
if F[I] & F[J] then do;
X[I,J]+1;
X[J,I]+1;
end;
end;
end;
if LASTOBS then do;
do I= 1 to 10;
do J=1 to 10;
if I = J then continue;
VAL=coalesce(X[I,J],0);
output;
end;
end;
end;
keep I J VAL;
run;
proc format ;
value varnames 1= 'FLAG_ICD10PC'
2= 'FLAG_ICD10CM'
3= 'FLAG_HCPC '
4= 'FLAG_ICD9_PC'
5= 'FLAG_CPT '
6= 'FLAG_ICD9CM '
7= 'FLAG_CPT_DEL'
8= 'FLAG_HIPPS '
9= 'FLAG_ABC '
10= 'FLAG_BILL ';
run;
proc tabulate data=REPORT;
class I J;
var VAL;
tables I=' ',J=' '*VAL=' ' *sum=' '*f=5.0 ;
format I J varnames.;
run;
FLAG_ICD10PC | FLAG_ICD10CM | FLAG_HCPC | FLAG_ICD9_PC | FLAG_CPT | FLAG_ICD9CM | FLAG_CPT_DEL | FLAG_HIPPS | FLAG_ABC | FLAG_BILL | |
---|---|---|---|---|---|---|---|---|---|---|
FLAG_ICD10PC | . | 1 | 1 | 1 | 2 | 3 | 2 | 1 | 2 | 1 |
FLAG_ICD10CM | 1 | . | 6 | 2 | 2 | 0 | 2 | 8 | 8 | 4 |
FLAG_HCPC | 1 | 6 | . | 0 | 4 | 0 | 2 | 4 | 6 | 6 |
FLAG_ICD9_PC | 1 | 2 | 0 | . | 2 | 4 | 0 | 0 | 0 | 0 |
FLAG_CPT | 2 | 2 | 4 | 2 | . | 4 | 2 | 4 | 6 | 6 |
FLAG_ICD9CM | 3 | 0 | 0 | 4 | 4 | . | 2 | 2 | 0 | 2 |
FLAG_CPT_DEL | 2 | 2 | 2 | 0 | 2 | 2 | . | 2 | 8 | 0 |
FLAG_HIPPS | 1 | 8 | 4 | 0 | 4 | 2 | 2 | . | 6 | 2 |
FLAG_ABC | 2 | 8 | 6 | 0 | 6 | 0 | 8 | 6 | . | 8 |
FLAG_BILL | 1 | 4 | 6 | 0 | 6 | 2 | 0 | 2 | 8 | . |
Not too sure I understand the output you are after (where is your example?) but his may help.
data HAVE;
do I= 1 to 100;
FLAG_ICD10PC =round(ranuni(0)*.56);
FLAG_ICD10CM =round(ranuni(0)*.56);
FLAG_HCPC =round(ranuni(0)*.56);
FLAG_ICD9_PC =round(ranuni(0)*.56);
FLAG_CPT =round(ranuni(0)*.56);
FLAG_ICD9CM =round(ranuni(0)*.56);
FLAG_CPT_DEL =round(ranuni(0)*.56);
FLAG_HIPPS =round(ranuni(0)*.56);
FLAG_ABC =round(ranuni(0)*.56);
FLAG_BILL =round(ranuni(0)*.56);
output;
end;
run;
data REPORT;
set HAVE end=LASTOBS;
array F [10] FLAG:;
array X [10,10] 8;
do I= 1 to 10;
do J=2 to 10;
if I = J then continue;
if F[I] & F[J] then do;
X[I,J]+1;
X[J,I]+1;
end;
end;
end;
if LASTOBS then do;
do I= 1 to 10;
do J=1 to 10;
if I = J then continue;
VAL=coalesce(X[I,J],0);
output;
end;
end;
end;
keep I J VAL;
run;
proc format ;
value varnames 1= 'FLAG_ICD10PC'
2= 'FLAG_ICD10CM'
3= 'FLAG_HCPC '
4= 'FLAG_ICD9_PC'
5= 'FLAG_CPT '
6= 'FLAG_ICD9CM '
7= 'FLAG_CPT_DEL'
8= 'FLAG_HIPPS '
9= 'FLAG_ABC '
10= 'FLAG_BILL ';
run;
proc tabulate data=REPORT;
class I J;
var VAL;
tables I=' ',J=' '*VAL=' ' *sum=' '*f=5.0 ;
format I J varnames.;
run;
FLAG_ICD10PC | FLAG_ICD10CM | FLAG_HCPC | FLAG_ICD9_PC | FLAG_CPT | FLAG_ICD9CM | FLAG_CPT_DEL | FLAG_HIPPS | FLAG_ABC | FLAG_BILL | |
---|---|---|---|---|---|---|---|---|---|---|
FLAG_ICD10PC | . | 1 | 1 | 1 | 2 | 3 | 2 | 1 | 2 | 1 |
FLAG_ICD10CM | 1 | . | 6 | 2 | 2 | 0 | 2 | 8 | 8 | 4 |
FLAG_HCPC | 1 | 6 | . | 0 | 4 | 0 | 2 | 4 | 6 | 6 |
FLAG_ICD9_PC | 1 | 2 | 0 | . | 2 | 4 | 0 | 0 | 0 | 0 |
FLAG_CPT | 2 | 2 | 4 | 2 | . | 4 | 2 | 4 | 6 | 6 |
FLAG_ICD9CM | 3 | 0 | 0 | 4 | 4 | . | 2 | 2 | 0 | 2 |
FLAG_CPT_DEL | 2 | 2 | 2 | 0 | 2 | 2 | . | 2 | 8 | 0 |
FLAG_HIPPS | 1 | 8 | 4 | 0 | 4 | 2 | 2 | . | 6 | 2 |
FLAG_ABC | 2 | 8 | 6 | 0 | 6 | 0 | 8 | 6 | . | 8 |
FLAG_BILL | 1 | 4 | 6 | 0 | 6 | 2 | 0 | 2 | 8 | . |
Hi Chris,
Great approach. I applied your code to my actual data. resulted numbers in the matrix are doubled the amount that would be expected expected simple proc freq 2 by 2 and misplaced across the cells. Any idea how it happened?
Thanks zillions!!!
data cruise1;
set cruise end=LASTOBS;
array F [12] FLAG:;
array X [12,12] 8;
do I= 1 to 12;
do J=2 to 12;
if I = J then continue;
if F[I] & F[J] then do;
X[I,J]+1;
X[J,I]+1;
end;
end;
end;
if LASTOBS then do;
do I= 1 to 12;
do J=1 to 12;
if I = J then continue;
VAL=coalesce(X[I,J],0);
output;
end;
end;
end;
keep I J VAL;
run;
proc format ;
value varnames 1= 'FLAG_ICD10PC'
2= 'FLAG_ICD10CM'
3= 'FLAG_HCPC'
4= 'FLAG_ICD9_PC'
5= 'FLAG_CPT'
6= 'FLAG_ICD9CM'
7= 'FLAG_CPT_DEL'
8= 'FLAG_HIPPS'
9= 'FLAG_ABC'
10= 'FLAG_BILL'
11= 'FLAG_D'
12= 'FLAG_dent' ;
run;
proc tabulate data=cruise1;
class I J;
var VAL;
tables I=' ',J=' '*VAL=' ' *sum=' '*f=5.0 ;
format I J varnames.;
run;
proc freq data=cruise;
tables flag_hcpc*flag_cpt/norow nocol nopercent nocum;
run; /*6433*/
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.