BookmarkSubscribeRSS Feed
ronnie_h
Calcite | Level 5

I have a table called 'subjects' that has 86k observations - each observation represents a single individual - there is a variable called ICD_code e.g. 1244 which holds the icd code for their cause of death if they are dead or its left empty if the person is still alive.

 

I have another table called diseases which is  70 observations long - each observation represents a disease defined by a set of pairs of  icd_codes

 

e.g. 

'lung cancer' 3, 1237 7584 , 6854 7845, 4579 5879 ...

the first variable is the name of the disease grouping  - lung cancer

the next is a number which is an index that gives the number of pairs of icd_codes  that define the disease grouping.

Then i have a number of icd_code pairs each of which represent a range - each value is in a separate variable e.g. L1, U1, L2, U2, l3, u3 ...

 

I have created 70 new variables v1 - v70 in the subject table one for each disease grouping and set them all to zero.

 

 What i need to do now is for each subject who is dead and has an icd_code - i need to check their ICD_code to see which of the 70 groupings contains that code and then set the value of the appropriate varable v* to 1.

 

So in my example of lung cancer because the icd_code value 1244 IS in the first pair of codes (i.e. its GE 1237 and LE 7584)   I want to change the zero in the relevant V variable to a 1 to recognise this.

 

I can't work out how to do this! - i used to be a fortran programmer and in that language i coudl have done it - but i can;t seem to work out in SAS how to use data from two different tables at the same time - i can't join them together as they are not the same types of data.

 

Any suggestions greatly appreciated.

 

3 REPLIES 3
ballardw
Super User

There are going to be some details. In your Subjects data set what are the properties of the variable ICD_code, as in type, numeric or character, and if character the defined length.

This will be needed to correctly parse out details of your other dataset. If you can't provide actual example data then copy the text used to create the data and paste it into a text box opened on this forum with using the </> icon that appears above the main message windows.

 

Question: what are you really going to do with those 70 variables, especially when next week/month/report cycle someone adds another 1/3/17 diseases of interest?

Depending on what  your actual ICD_codes look like it may be easier to get the disease directly into one variable instead of managing so many indicator variables.

 

SAS has multiple different look up tools: Formats, Proc SQL (which is most definitely not FORTRAN in any form) Hash objects or even a function (though I wouldn't start down that path unless everything is very static).

 

FWIW, the last FORTRAN program I wrote is used SAS to prototype the code because the IO was cleaner and the logic involved a bunch of date manipulation that was easier to test the logic in SAS before translating to FORTRAN.

Tom
Super User Tom
Super User

Convert your second dataset into a FORMAT that converts ICDCODE into DISEASE_GROUPING.  Then use the format to classify your larger dataset.

 

The first thing you will want to do is convert your wide structure into a tall structure so each disease/ICD pairing is on a separate observation.  While you are doing that make a dataset that the PROC IMPORT can understand.

 

Let's assume that DISEASES looks like this with numeric ICD codes.

data diseases ;
  length disease $30 count l1-l5 u1-u5 8;
  array lower l1-l5;
  array upper u1-u5;
  infile cards dsd truncover;
  input disease count @;
  do count=1 to count;
    input lower[count] upper[count] @;
  end;
cards;
'lung cancer', 3, 1237 ,1245 , 6854 ,7845, 4579 ,5879 
;;;;

Then you would want to make a numeric format. So you could call it ICDGROUP perhaps?  So something like this:

data format;
  set diseases end=eof;
  fmtname='ICDGROUP';
  array lower l1-l5;
  array upper u1-u5;
  label=disease;
  hlo=' ';
  do count=1 to count;
    start=lower[count];
    end=upper[count];
    output;
  end;
  if eof then do;
    call missing(start,end);
    label='UNKNOWN';
    hlo='O';
    output;
  end;
run;
proc format cntlin=format;
run;

Now let's make a SUBJECTS file.

data subjects ;
  input subjid icd_code;
cards;
1 1238
2 .
3 6850
4 6900
5 4580
;

And use the ICDGROUP format to make a DISEASE variable.

data want;
 set subjects;
 if icd_code then disease=put(icd_code,icdgroup.);
run;

Result

Tom_0-1723817848865.png

 

 

 

ronnie_h
Calcite | Level 5

i think i can follow that - i'll give it a try next week.  

many thanks!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 140 views
  • 0 likes
  • 3 in conversation