BookmarkSubscribeRSS Feed
tstevens
Calcite | Level 5

Hello,

 

I am trying to take free text entries and categorize them into disease categories using SAS 9.4.

 

My code looks like this: 

 

data one;

set zero;

 

array dx(6) il2a1 il2a2 il2a3 il2a4 il2a5 il2a6;
neo_unspec=0;
do i= 1 to 6;
if upcase (dx(i)) in ("(R) OCCIPITAL BRAIN TUMOUR"
"155.2 - LIVER MASS"
"1570 PANCREATIC MASS"
"1709 - TUMOR LEFT HIP"
"1916 - CEREBELLAR MASS"
"1918 BRAIN TUMOUR"
"1919 BRAIN TUMOR")

then neo_unspec=1;
end;
dropi;


array dx1(6) il2a1 il2a2 il2a3 il2a4 il2a5 il2a6;
neo_malig=0;
do i= 1 to 6;
if upcase (dx1(i)) in ("CANCER OF PROSTATE (2002) WITH METS TO BONE AND LUNG"
"CANCER OF PROSTATE (EVIDENCE OF BONE METASTATIC DISEASE)"
"CANCER OF PROSTATE - 2005"
"CANCER OF PROSTATE - TURP JUNE 4/13"
"CANCER OF PROSTATE WIITH METS TO BONE"
"CANCER OF PROSTATE WITH BONE METASTASES, METS TO PELVIS AND LOWER ABDOMEN"
"CANCER OF PROSTATE WITH BONE METS"
"CANCER OF PROSTATE WITH METASTASES TO BONE"
"CANCER OF PROSTATE WITH METASTASES."
"CANCER OF PROSTATE WITH METS TO BACK AND LYMPH NODES.")

then neo_malig=1;
end;
dropi;

 

array dx4(6) il2a1 il2a2 il2a3 il2a4 il2a5 il2a6;
circulatory=0;
do i= 1 to 6;
if upcase (dx4(i)) in ("HEART BLOCK-PACEMAKER"
"HEART BLOCK/PACEMAKER"
"HEART BLOCK/TINNITUS"
"HEART BLOCKAGES"
"HEART BURN"
"HEART CABG"
"HEART CALCIFICATION"
"HEART CATHETERIZATION"
"HEART CONDITION")

then circulatory=1;
end;
dropi;

 

run;

 

This actually goes on with many more diagnosis categories and there are 100s of thousands of unique text entries, but above is just an example.

 

il2a1-il2a6 are the free-text diagnosis variables. 

 

We started with an excel spreadsheet of unique text entries, created columns for each new diagnosis category of interest and then manually went through each entry and placed a 1 in the appropriate category. Then, we sorted the excel data and added quotation marks to the free text entries to copy/paste into SAS.

 

The problem is that when I run it, some entries are populating more than one category. I've double checked the raw data to make sure entries are not in fact categorized multiple times. Please help!

 

1 REPLY 1
ballardw
Super User

If I understand your situation you might consider reading some of those Excel columns into a data set. If you already have the 1 indicator then you should be able to join the values of the text to get the indicator (or missing).

 

You may also be better off to transpose the data so you have a single diagnosis variable plus the other identfications. Then you don't need to work with an array and

Your issue with

The problem is that when I run it, some entries are populating more than one category. I've double checked the raw data to make sure entries are not in fact categorized multiple times. Please help!

 

You likely need to provide some actual example data as there is not enough information provided to tell why some of your code might behave that way.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Remove any variables that are not directly related to this issue. Make sure to include some that are getting the "wrong" result.

 

BTW it is not best practice to define multiple arrays for the exact same elements. You can confuse yourself and code.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 298 views
  • 0 likes
  • 2 in conversation