I have claims data that I would like to summarize the procedures within a visit (claim num) and the diagnoses for the procedure combinations within a visit.
Example of data I have:
Patient ID | Claim num | Procedure | Primary diagnosis |
123456 | 1111 | COVID RNA test | cough |
123456 | 1111 | COVID antibody test | cough |
123456 | 5555 | COVID antibody test | cough |
777777 | 4567 | COVID RNA test | Sore throat |
777777 | 4567 | Chest X-ray | Cough |
888888 | 1212 | COVID antibody test | cough |
Data I want, as a dataset:
Procedure combination by claim # | Primary diagnosis cough | Primary diagnosis sore throat, cough |
COVID RNA test, COVID antibody test | 1 |
|
COVID antibody test | 2 |
|
COVID RNA test, chest X-ray |
| 1 |
data have;
infile datalines delimiter=',';
input patient_id $ claim_num $ procedure :$25. primarydiagnosis :$15.;
datalines;
123456, 1111, COVID RNA test, cough
123456, 1111, COVID antibody test, cough
123456, 5555, COVID antibody test, cough
777777, 4567, COVID RNA test, Sore throat
777777, 4567, Chest X-ray, cough
888888, 1212, COVID antibody test, cough
;
Please make sure that your example data set runs. This what my log looks like running your code:
553 data have; 554 infile datalines delimiter=','; 555 input patient_id $ claim_num $ procedure $ primary diagnosis $; 556 datalines; NOTE: Invalid data for primary in line 557 30-80. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-- 558 123456, 1111, COVID antibody test, cough NOTE: Invalid data errors for file CARDS occurred outside the printed range. NOTE: Increase available buffer lines with the INFILE n= option. patient_id=123456 claim_num=1111 procedure=COVID RN primary=. diagnosis=123456 _ERROR_=1 _N_=1 NOTE: Invalid data for primary in line 559 35-80. 560 777777, 4567, COVID RNA test, Sore throat NOTE: Invalid data errors for file CARDS occurred outside the printed range. NOTE: Increase available buffer lines with the INFILE n= option. patient_id=123456 claim_num=5555 procedure=COVID an primary=. diagnosis=777777 _ERROR_=1 _N_=2 NOTE: Invalid data for primary in line 561 27-80. NOTE: LOST CARD. 562 ; NOTE: Invalid data errors for file CARDS occurred outside the printed range. NOTE: Increase available buffer lines with the INFILE n= option. patient_id=777777 claim_num=4567 procedure=Chest X- primary=. diagnosis= _ERROR_=1 _N_=3 NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.HAVE has 2 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 562 ;
Summarize and transpose:
data have; infile datalines delimiter=','; input patient_id $ claim_num $ procedure :$25. primarydiagnosis :$15.; datalines; 123456, 1111, COVID RNA test, cough 123456, 1111, COVID antibody test, cough 123456, 5555, COVID antibody test, cough 777777, 4567, COVID RNA test, Sore throat 777777, 4567, Chest X-ray, cough ; proc summary data=have nway; class procedure primarydiagnosis; output out=counts; run; proc transpose data=counts out=want (drop=_name_) prefix=dx; by procedure; var _freq_; id primarydiagnosis; run;
Replace DX with something else but remember SAS variable names are limited to 32 characters and including fixed text in a name like "primary diagnosis" uses more than half that limit. Default rules also so not allow spaces in the variable names.
If the only thing you are going to do with that data set is to print it then Proc Report or Tabulate are likely better choices to make a report.
I'm trying to combine data across observations to see the total count of all the tests combined and diagnoses combined that occurred at a visit. A visit can have multiple observations. All the observations for a visit will have the same claim number. When I used the code you provided there was just one procedure per row and one diagnosis per column.
Data I want:
Procedure combination by claim # | Primary diagnosis cough | Primary diagnosis sore throat, cough |
COVID RNA test, COVID antibody test | 1 |
|
COVID antibody test | 1 |
|
COVID RNA test, chest X-ray |
| 1 |
@jk2018 wrote:
I'm trying to combine data across observations to see the total count of all the tests combined and diagnoses combined that occurred at a visit. A visit can have multiple observations. All the observations for a visit will have the same claim number. When I used the code you provided there was just one procedure per row and one diagnosis per column.
Data I want:
Procedure combination by claim #
Primary diagnosis cough
Primary diagnosis sore throat, cough
COVID RNA test, COVID antibody test
1
COVID antibody test
1
COVID RNA test, chest X-ray
1
So, where is the VISIT information in that? Don't see it any where.
Summarize with the VISIT but since I don't see visit in here anywhere its pretty questionable as to what you want.
If that "combination by claim #" is to be the visit then this is NOT a good candidate for a data set as you are moving values, Claim, into the same column as procedure and the result would be horrid to use for any real purpose.
"to see" really sounds like a report.
proc tabulate data=have; class claim_num procedure primarydiagnosis; table claim_num='Claim Number=', procedure, primarydiagnosis ; run;
data have;
infile datalines delimiter=',';
input patient_id $ claim_num $ procedure :$25. primarydiagnosis :$15.;
datalines;
123456, 1111, COVID RNA test, cough
123456, 1111, COVID antibody test, cough
123456, 5555, COVID antibody test, cough
777777, 4567, COVID RNA test, Sore throat
777777, 4567, Chest X-ray, cough
888888, 1212, COVID antibody test, cough
;
data temp;
do until(last.claim_num);
set have;
by patient_id claim_num;
length a b $ 200 ;
if not findw(a,strip(procedure),', ') then a=catx(',',a,procedure);
if not findw(b,strip(primarydiagnosis),', ') then b=catx(',',b,primarydiagnosis);
end;
keep a b;
run;
proc freq data=temp;
table a*b /nocol norow nopercent ;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.