SAS FORMAT Creation from SAS Dataset

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

SAS FORMAT Creation from SAS Dataset

Hi,

I am having a problem with performance where in I have to subset a bunch of Medical Claims against a standard list of Medical Codes and assign the description of each Medical Code.

I have a lookup dataset that has this information, i.e., code and medical description. The problem is that some Codes have the same descriptions. The Client will have a UI using which they will select the description. This will give them the total number of patients for that Medical Condition. Since same code has multiple descriptions, the patient will be counted in both cases and this is fine by the Client.

Example:

Code     Description

01          Flu

02          Cold

03          Cough

01          Pain

Coming to the way I have been processing, the medical claim has 9 columns for Codes (Diag1-Diag9) and I am using an ARRAY to transpose these 9 columns into 1 Code column. This in effect will increase the number of observations in the SAS dataset (currently, we process 1 year claims data and after transposing the claims we have about 2 billion observations). That is each claim is broken down into 9 records (assuming we have all diag codes).

Data Temp;

     Set Med_Claim (Keep=Patid Diag-Diag9);

     Array _Daig{*} Diag1-Diag9;

     Do I = 1 To 9 While (_Diag{I} Ne '');

          Code = _Diag{I};

          Output;

     End;

Run;

I then use a PROC SQL join to keep all the descriptions from the lookup as below:

Proc SQL:

     Create Temp2 As

     Select DISTINCT

               B.Description,

               A.Code,

               A.PatId

     From Temp A, Lookup B

     Where A.Code = B.Code;

Quit;

The above step is taking a lot of time (about 10 hours). To address this, I thought of using a HASH Join. This approach failed as the HASH Dataset was not unique on CODE.

And hence I started working towards creating a FORMAT using the lookup dataset as follows. The Control dataset gets created fine, but the FORMAT step fails. Please can anyone help me with a work around for this!! Your help is much appreciated.

Data CTRL;

     Length Label $60;

     Set Lookup(Rename=(Code=Start Description=Label)) End=Eof;

     Retain FmtName 'MedCond' Type 'C';

     End = Start;

     Output;

     If Eof Then Do;

          HLO = 'O';

          Label = '';

          Output;

     End;

Run;

Proc Format Lib = Work CntLin = Ctrl;

Run;

 

ERROR: This range is repeated, or values overlap: 01-00100.


Accepted Solutions
Solution
‎11-15-2013 07:29 AM
Super Contributor
Posts: 339

Re: SAS FORMAT Creation from SAS Dataset

multidata and hash loop example as requested.

data lookup;
     input id desc $;
     datalines;
1 desc1.1
1 desc1.2
2 desc2.1
2 desc2.2
2 desc2.3
;
run;

data have;
     input patid code;
     datalines;
101 1
102 1
103 2
104 1
105 2
105 1
106 2
;
run;


data want;
     length id 8. desc $7.;
     if _n_=1 then do;
     declare hash lookup(dataset: "work.lookup", multidata: "YES");
     lookup.definekey("id");
     lookup.definedata("desc");
     lookup.definedone();
    end;
     set have;
     rc=lookup.find(key: code);
     if rc=0 then do;
          output;
          do while(lookup.find_next()=0);
               output;
          end;
     end;
  drop rc id;
run;

proc freq data=want;
table desc;
run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,128

Re: SAS FORMAT Creation from SAS Dataset

Hi bhavani,

could we not have unique codes for each description? Also if we are getting codes from the temp dataset, how can we confirm that the 01 code is for flu and not pain or vice versa.

these answers will help us to help you.

Thanks,

Jagadish

Thanks,
Jag
Contributor
Posts: 24

Re: SAS FORMAT Creation from SAS Dataset

That's the problem, we cannot have different codes for each descriptions in the real world (atleast not in Health Care). Also the TEMP is a dataset that I create from the Medical Claims and each claim will have multiple codes(depending on what the patients health). So if a patient has both PAIN and FLU, the claims will have two 01's attached to it. Ad that is why the client want to see this patient in both the reports.

Regular Contributor
Posts: 195

Re: SAS FORMAT Creation from SAS Dataset

Hi,Try the following code...I assume that your START and END values are equal...

data temp;

input code;

cards4;

01

02

03

01

;;;;

data lookup;

   input start label $;

   end = start;

   cards4;

01 Flu

02 Cold

03 Cough

01 Pain

;;;;

%macro fmt;

proc sql noprint;

   select count(*) into :count

   from lookup;

   select start into :list_code separated by "?"

   from lookup;

   select compress(quote(label)) into :list_desc separated by "?"

   from lookup;

quit;

proc format lib = work;

   value codes_fmt (multilabel)

   %do i = 1 %to %eval(&count.-1);

       %scan(&list_code.,&i.,"?") = %scan(&list_desc.,&i.,"?")

   %end;

       %scan(&list_code.,&count.,"?") = %scan(&list_desc.,&count.,"?");

quit;

data temp;

  set temp;

  description = code;

run;

proc means data = temp noprint;

   format description codes_fmt.;

   class code description / mlf;

   output out = want(keep = code description _TYPE_ where = (_TYPE_ = 3));

run;

     

%mend fmt;

options mlogic mprint;

%fmt;

Hope it meets your requirement...

-Urvish

Contributor
Posts: 24

Re: SAS FORMAT Creation from SAS Dataset

Thanks for the solution Urvish. Unfortunately, this doesn't work for my situation.

The reason being that, I have mocked up the data in this post to give a snap shot of the actual scenario.

In reality, the Code and Descriptions will be $10 and $60 repectively. Additionally, I will have about 91000 distinct entries for Code and Description.

As a result, the macro variables list_code and list_desc will be longer that the SAS limitation of 65K and the code will fail.

Example: (real time)

Code      Description

4413       Abdominal Aortic Aneurysm

4413       Arterial / Vascular Disease

Appreacite you help on this.

Respected Advisor
Posts: 3,777

Re: SAS FORMAT Creation from SAS Dataset

To create a multilabel format you need to tell SAS that is your intention.  If I am unsure as to the exact options needed for a control data set I create a similar format and a CNTLOUT data set.  Then as in this case I can see that HLO needs to include the code M.

proc format cntlout=cntl;
   value $dis(multilabel notsorted)
     
'01'='Flu'
     
'02'='Cold'
     
'03'='Cough'
     
'01'='Pain'
      other=
'???'
      ;
   select $dis;
   run;
proc print;
  
run;

data control;
   infile cards eof=eof;
   input start $ label $;
   retain fmtname 'diss' type 'C' hlo 'SM   ';
  
return;
eof:
  
call missing(start,label);
   hlo = cats(hlo,'O');
   return;
  
cards;
01          Flu
02          Cold
03          Cough
01          Pain
;;;;
   run;
proc print;
  
run;
proc format cntlin=control;
   select $diss;
   run;

Contributor
Posts: 24

Re: SAS FORMAT Creation from SAS Dataset

Thanks for this innovative approach data _null_;

The proc format step can be formulated if the codes are only a handful (say 500), in my case I have about 54000 codes and this approach will not work for me.

Is there a way where in the PROC FORMAT can read my lookup???

Respected Advisor
Posts: 3,777

Re: SAS FORMAT Creation from SAS Dataset

My example has two parts

  1. how to figure out what value of HLO you need to create a multilabel format.  PROC FORMAT with VALUE statement and CNTLOUT
    1. but I dont't think you need a multilabel format.
  2. how to create a control data set to create a multilabel format.

Show example of the "code data" and I can show you how to write a data step to create a CONTROL data set.

Contributor
Posts: 24

Re: SAS FORMAT Creation from SAS Dataset

I did not follow what you said. Are you asking me to furnish some data from the Lookup and the Claims to correlate and come up with a solution.

If thats the case then here is the sample data.

LOOKUP:

140Head and Neck Cancer
140Mouth or Throat Cancer
140Neoplasms
140Oral Cancer
140Solid Tumors
1530Colon Cancer
1530Colorectal Cancer
1530Neoplasms
1530Oropharyngeal Cancer
1530Solid Tumors
1740Anesthesia
1740Otitis Media
1740Tuberculosis
14002Head and Neck Cancer
14002Mouth or Throat Cancer
14002Neoplasms
14002Oral Cancer
14002Solid Tumors
15308Colon Cancer
15308Colorectal Cancer
15308Neoplasms
15308Oropharyngeal Cancer
15308Solid Tumors

CLAIMS DATA:

CODEYEARPATID
0174020121
0174020121
0174020122
0174020122
0174020123
0174020123
0174020124
0174020125
0174020125
0174020125
0174020125
0174020126
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
153020127
14020128
14020128
140320128
1400220128
1500820128
Respected Advisor
Posts: 3,777

Re: SAS FORMAT Creation from SAS Dataset

Should the lookup be unique?

Contributor
Posts: 24

Re: SAS FORMAT Creation from SAS Dataset

The lookup will not be unique. By now you would have noticed that same DESCRIPTION is applicable to more than 1 CODE. And thats the problem...

Respected Advisor
Posts: 3,777

Re: SAS FORMAT Creation from SAS Dataset

You need to describe what you want to do about that.

Frequent Contributor
Posts: 114

Re: SAS FORMAT Creation from SAS Dataset

Hello bhavani,

I just stumbled upon this thread and I have to agree with data_null_. You should clarify how the data should be handled.

In the above example patient 8 does have 2 entries for 2012 with code 140.

Now what does that mean? Did he have 'Head and Neck Cancer' and 'Neoplasms'?

Or did he have Head and Neck Cancer, Mouth or Throat Cancer, Neoplasms, Oral Cancer and Solid Tumors?

Or in other words: How should the two observations be related to the lookup data? Should we assume, that we can match the observations to the first lookups we find? I think this would corrupt the data and its meaning, as well as mathing all lookups to every observation we find.

I would consider telling the client that the 'sh*t in, sh*t out' principle still applies to data and reporting ;-) (sorry, for rude language, no offense intended)

Cheers Michael

Contributor
Posts: 24

Re: SAS FORMAT Creation from SAS Dataset

The goal is to generate reports for each of the medical conditions which is a count total patients. The LOOKUP table was created from a flat file that the client sends us every year. Now this file will have the same layout as I had furnished in the earlier post. In a practical situation, a patient might have multiple diseases (diagnostic conditions). Having said that, I want PATIENT 8 be tagged with all the medical conditions for code 140 (all 5 them should be tied to patient 8).

I have been using as PROC SQL step (refer my first post), that distinctly gets all the descriptions of each code. I am furnishing the step here below:

Proc SQL:

    Create Temp2 As

    Select DISTINCT

              B.Description,

              A.Code,

              A.PatId

    From Claims A, Lookup B

    Where A.Code = B.Code;

Quit;


The above step will give me the following result for patient 8 (please see the attachment).

The data in the lookup table is actually medical conditions hierarchy. The 3 digit codes (like 140) are the highest level. Here 140 represents CANCER. The next level is a 4 digit code like 1400, 1401, 1402... 1499. This represents a sub group of CANCER (like Head, Bone Marrow etc). The last/least level of hierarchy is the 5 digit code, I.. 14000, 14001,14002.... 14010, 14011 .... 149999. The lowest level is the actual CANCER that was diagnosed when this patient visited a Physician. So in effect, this patient 8 has CANCERs of all kinds (may be it started with one part of the body and eventually spread to other parts as well). The UNFORTUNATE thing is that while the claims are prepared, some codes are missed out(due to poor handwriting, typos etc) and hence are tied to the highest level of hierarchy which is 140 in this case.

The above SQL join takes more than 10 Hrs to run and I wanted to tune the code for better performance. One of the best methods suggested in a lot of SUGI papers was the FORMAT merge. I tried to modify the join using HASH objects, but it failed citing unique codes issue. I was sure that SAS Merge would also fail due to REPEAT BY GROUP on the Key variable. The only other option I saw fit was the format merge.

I came up with the following approach to create the MULTILABEL format; but I did not know how to use it in a DATA STEP to assign the multiple descriptions.

Filename Tmp1 Temp;

Data _Null_;

     Set Lookup End=Last;

     File Tm1 New Lrecl=1000;

   

     String1 = Compress('"'||Code||'"');

     String2 = '"'||Trim(Description)||'"';

     If _N_ = 1 Then Do;

          Put 'Value $Desc (Multilabel Notsorted)' ;

     End;

     Put String1 ' = ' String2;

   

     If Last Then Put ';' ;

Run;

Proc Format CntlOut=Cntl;

     %Include Tmp1;

     Select $Desc;

Run;

Proc Print Data=Cntl; Run;

This worked fine, created the DESC Multilable format and the CNTL dataset. Now I have to use this format in a DATA STEP to assign the descriptions.

If there is any other way/approach other that PROC FORMAT, please suggest.

I hope the above explanation helps understand what is needed.

Super User
Posts: 17,745

Re: SAS FORMAT Creation from SAS Dataset

So does 140 mean a multiple diagnosis of all types? This type of reporting doesn't make clinical sense to me.

It looks like a bad merge or conversion from one coding type to another.

EG. 1 patient has, for a single diagnosis of throat cancer and would be counted how many times?

140Head and Neck Cancer
140Mouth or Throat Cancer
140Neoplasms
140Oral Cancer
140Solid Tumors
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 24 replies
  • 750 views
  • 6 likes
  • 8 in conversation