BookmarkSubscribeRSS Feed
jk2018
Calcite | Level 5

I have claims data that I would like to summarize the procedures within a visit (claim num) and the diagnoses for the procedure combinations within a visit.  

 

Example of data I have:

Patient ID

Claim num

Procedure

Primary diagnosis

123456

1111

COVID RNA test

cough

123456

1111

COVID antibody test

cough

123456

5555

COVID antibody test

cough

777777

4567

COVID RNA test

Sore throat

777777

4567

Chest X-ray

Cough

888888

1212

COVID antibody test

cough

 

Data I want, as a dataset:

Procedure combination by claim #

Primary diagnosis cough

Primary diagnosis sore throat, cough

COVID RNA test, COVID antibody test

1

 

COVID antibody test

2

 

COVID RNA test, chest X-ray

 

1

 

data have;
infile datalines delimiter=',';

input patient_id $ claim_num $ procedure :$25. primarydiagnosis :$15.;
datalines;
123456, 1111, COVID RNA test, cough
123456, 1111, COVID antibody test, cough
123456, 5555, COVID antibody test, cough
777777, 4567, COVID RNA test, Sore throat
777777, 4567, Chest X-ray, cough 

888888, 1212, COVID antibody test, cough

;

 

4 REPLIES 4
ballardw
Super User

Please make sure that your example data set runs. This what my log looks like running your code:

553  data have;
554  infile datalines delimiter=',';
555  input patient_id $ claim_num $ procedure $ primary diagnosis $;
556  datalines;

NOTE: Invalid data for primary in line 557 30-80.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
558        123456, 1111, COVID antibody test, cough
NOTE: Invalid data errors for file CARDS occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
patient_id=123456 claim_num=1111 procedure=COVID RN primary=. diagnosis=123456 _ERROR_=1 _N_=1
NOTE: Invalid data for primary in line 559 35-80.
560        777777, 4567, COVID RNA test, Sore throat
NOTE: Invalid data errors for file CARDS occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
patient_id=123456 claim_num=5555 procedure=COVID an primary=. diagnosis=777777 _ERROR_=1 _N_=2
NOTE: Invalid data for primary in line 561 27-80.
NOTE: LOST CARD.
562        ;
NOTE: Invalid data errors for file CARDS occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
patient_id=777777 claim_num=4567 procedure=Chest X- primary=. diagnosis=  _ERROR_=1 _N_=3
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.HAVE has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


562  ;

Summarize and transpose:

data have;
infile datalines delimiter=',';
input patient_id $ claim_num $ procedure :$25. primarydiagnosis :$15.;
datalines;
123456, 1111, COVID RNA test, cough
123456, 1111, COVID antibody test, cough
123456, 5555, COVID antibody test, cough
777777, 4567, COVID RNA test, Sore throat
777777, 4567, Chest X-ray, cough 
;

proc summary data=have nway;
   class procedure   primarydiagnosis;
   output out=counts;
run;

proc transpose data=counts out=want (drop=_name_)
   prefix=dx;
   by procedure;
   var _freq_;
   id primarydiagnosis;
run; 

Replace DX with something else but remember SAS variable names are limited to 32 characters and including fixed text in a name like "primary diagnosis" uses more than half that limit. Default rules also so not allow spaces in the variable names.

 

If the only thing you are going to do with that data set is to print it then Proc Report or Tabulate are likely better choices to make a report.

jk2018
Calcite | Level 5

I'm trying to combine data across observations to see the total count of all the tests combined and diagnoses combined that occurred at a visit.  A visit can have multiple observations.  All the observations for a visit will have the same claim number.  When I used the code you provided there was just one procedure per row and one diagnosis per column.  

 

Data I want:

Procedure combination by claim #

Primary diagnosis cough

Primary diagnosis sore throat, cough

COVID RNA test, COVID antibody test

1

 

COVID antibody test

1

 

COVID RNA test, chest X-ray

 

1

 

ballardw
Super User

@jk2018 wrote:

I'm trying to combine data across observations to see the total count of all the tests combined and diagnoses combined that occurred at a visit.  A visit can have multiple observations.  All the observations for a visit will have the same claim number.  When I used the code you provided there was just one procedure per row and one diagnosis per column.  

 

Data I want:

Procedure combination by claim #

Primary diagnosis cough

Primary diagnosis sore throat, cough

COVID RNA test, COVID antibody test

1

 

COVID antibody test

1

 

COVID RNA test, chest X-ray

 

1

 


So, where is the VISIT information in that? Don't see it any where.

Summarize with the VISIT but since I don't see visit in here anywhere its pretty questionable as to what you want.

If that "combination by claim #" is to be the visit then this is NOT a good candidate for a data set as you are moving values, Claim, into the same column as procedure and the result would be horrid to use for any real purpose.

 "to see" really sounds like a report.

proc tabulate data=have;
   class claim_num procedure primarydiagnosis;
   table claim_num='Claim Number=',
         procedure,
         primarydiagnosis
         ;
run;

 

 

 

Ksharp
Super User
data have;
infile datalines delimiter=',';
input patient_id $ claim_num $ procedure :$25. primarydiagnosis :$15.;
datalines;
123456, 1111, COVID RNA test, cough
123456, 1111, COVID antibody test, cough
123456, 5555, COVID antibody test, cough
777777, 4567, COVID RNA test, Sore throat
777777, 4567, Chest X-ray, cough 
888888, 1212, COVID antibody test, cough
;

data temp;
do until(last.claim_num);
 set have;
 by patient_id  claim_num;
 length a b $ 200 ;
 if not findw(a,strip(procedure),', ') then a=catx(',',a,procedure);
 if not findw(b,strip(primarydiagnosis),', ') then b=catx(',',b,primarydiagnosis);
end;
keep a b;
run;

proc freq data=temp;
table a*b /nocol norow nopercent ;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1177 views
  • 1 like
  • 3 in conversation