I have a table called 'subjects' that has 86k observations - each observation represents a single individual - there is a variable called ICD_code e.g. 1244 which holds the icd code for their cause of death if they are dead or its left empty if the person is still alive.
I have another table called diseases which is 70 observations long - each observation represents a disease defined by a set of pairs of icd_codes
e.g.
'lung cancer' 3, 1237 7584 , 6854 7845, 4579 5879 ...
the first variable is the name of the disease grouping - lung cancer
the next is a number which is an index that gives the number of pairs of icd_codes that define the disease grouping.
Then i have a number of icd_code pairs each of which represent a range - each value is in a separate variable e.g. L1, U1, L2, U2, l3, u3 ...
I have created 70 new variables v1 - v70 in the subject table one for each disease grouping and set them all to zero.
What i need to do now is for each subject who is dead and has an icd_code - i need to check their ICD_code to see which of the 70 groupings contains that code and then set the value of the appropriate varable v* to 1.
So in my example of lung cancer because the icd_code value 1244 IS in the first pair of codes (i.e. its GE 1237 and LE 7584) I want to change the zero in the relevant V variable to a 1 to recognise this.
I can't work out how to do this! - i used to be a fortran programmer and in that language i coudl have done it - but i can;t seem to work out in SAS how to use data from two different tables at the same time - i can't join them together as they are not the same types of data.
Any suggestions greatly appreciated.
There are going to be some details. In your Subjects data set what are the properties of the variable ICD_code, as in type, numeric or character, and if character the defined length.
This will be needed to correctly parse out details of your other dataset. If you can't provide actual example data then copy the text used to create the data and paste it into a text box opened on this forum with using the </> icon that appears above the main message windows.
Question: what are you really going to do with those 70 variables, especially when next week/month/report cycle someone adds another 1/3/17 diseases of interest?
Depending on what your actual ICD_codes look like it may be easier to get the disease directly into one variable instead of managing so many indicator variables.
SAS has multiple different look up tools: Formats, Proc SQL (which is most definitely not FORTRAN in any form) Hash objects or even a function (though I wouldn't start down that path unless everything is very static).
FWIW, the last FORTRAN program I wrote is used SAS to prototype the code because the IO was cleaner and the logic involved a bunch of date manipulation that was easier to test the logic in SAS before translating to FORTRAN.
Convert your second dataset into a FORMAT that converts ICDCODE into DISEASE_GROUPING. Then use the format to classify your larger dataset.
The first thing you will want to do is convert your wide structure into a tall structure so each disease/ICD pairing is on a separate observation. While you are doing that make a dataset that the PROC IMPORT can understand.
Let's assume that DISEASES looks like this with numeric ICD codes.
data diseases ;
length disease $30 count l1-l5 u1-u5 8;
array lower l1-l5;
array upper u1-u5;
infile cards dsd truncover;
input disease count @;
do count=1 to count;
input lower[count] upper[count] @;
end;
cards;
'lung cancer', 3, 1237 ,1245 , 6854 ,7845, 4579 ,5879
;;;;
Then you would want to make a numeric format. So you could call it ICDGROUP perhaps? So something like this:
data format;
set diseases end=eof;
fmtname='ICDGROUP';
array lower l1-l5;
array upper u1-u5;
label=disease;
hlo=' ';
do count=1 to count;
start=lower[count];
end=upper[count];
output;
end;
if eof then do;
call missing(start,end);
label='UNKNOWN';
hlo='O';
output;
end;
run;
proc format cntlin=format;
run;
Now let's make a SUBJECTS file.
data subjects ;
input subjid icd_code;
cards;
1 1238
2 .
3 6850
4 6900
5 4580
;
And use the ICDGROUP format to make a DISEASE variable.
data want;
set subjects;
if icd_code then disease=put(icd_code,icdgroup.);
run;
Result
i think i can follow that - i'll give it a try next week.
many thanks!
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.