BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rahim9
Fluorite | Level 6

Hi all,

 

I am new to this community. I need help manipulating my data in order to conduct an analysis. 

My data is a trauma records data, it contains up to 65 variables for every patient who sustained a trauma and was captured in the registry. I am conducting an analysis to identify association between certain factors (gender, race, and insurance status) and risk of death. However, In my multivariate logistic regression model, I need to control for comorbid conditions. The dataset has 5 columns (variables) for comorbid conditions for each patient in the record. These comorbid variables are coded as character variables such as "epilepsy seizures" "dementia" "osteoporosis" etc.

 

What I want to do is to get SAS to search each of those five comorbid variables for the presence of any of selected conditions (30 in number) that I would provide. SAS should then create a count variable that will indicated the number of comorbid conditions present on each patient. (0 to 30).

 

This is basically how my data looks like:

record_id   comorbid_1   comorbid_2    comorbid_3   comorbid_4    comorbid_5

1             dementia              -                    -               pulmonary_disorder      -

2                   -               osteoporosis       dementia           -              prostate cancer

3              hepatitis                -               pacemaker         dementia             -

 

I know this is a lengthy post, I really would appreciate any help.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Not really a long post, so no worries.

 

It helps to provide data in the form of a data step so we have something to code against. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Here is one way that may  get you started. The data step is an example of turning your pasted text into a data step that can create usable data.

data have;
infile datalines dlm=',' dsd truncover;
informat record_id   comorbid_1   comorbid_2    comorbid_3   comorbid_4    comorbid_5 $25.;
input record_id   comorbid_1   comorbid_2    comorbid_3   comorbid_4    comorbid_5;
datalines;
1,dementia,,,pulmonary_disorder,
2,,osteoporosis,dementia,,prostate cancer
3, hepatitis,,pacemaker,dementia,
;

data want;
   set have;
   array words{5} $25 _temporary_ ('dementia','pulmonary_disorder','osteoporosis','hepatitis','pacemaker');
   array c comorbid_:;
   cc=0;
   do i= 1 to dim(words);
      cc= cc+ (whichc(words[i], of c(*))>0);
   end;
   drop i;
   label cc='Comorbidity count';
run;

Some things that a new SAS user will not recognize right away.

The first array, WORDS, contains values that will 1) disappear at the end of the data step, that's what the _temporary_ means, 2) sets the initial (and not changed values) in the (). The number in the {} is how many values you need this has to be large enough to match the number of actual terms you need. The $25 says that the array will hold character values and how long each is. This needs to be at least as long as the longest value you need to search for.

 

We set the value to 0 before starting the search.

The WHICHC function will search for the first parameter, the current item as we loop through the list of words, and if found in any of the elements contained in the second array c returns which position, 1 to 5 in this case if found or 0 if not found. The comparison of >1 creates a 1 when any match is found so we can add that to the counter variable CC.

 

Things to consider: if your data entry isn't clean and has values like :Dementia, dementia or dEMENTIA you may want to UPCASE all of the array values so case is not an issue as "dementia" will not match "Dementia" in the WHICHC values.

Second if you have spelling issues: "demensia" for instance, you likely want to fix those first OR you end up with many more values in the WORDS array to include all of the different spellings.

View solution in original post

7 REPLIES 7
Reeza
Super User
You need an array and a loop to search through the items.
Rahim9
Fluorite | Level 6

Thank you for your response. Any suggestion on how to code Array and Loop?

Reeza
Super User

This paper covers using arrays to search for diagnosis and shows how to convert your code:

 

https://support.sas.com/resources/papers/proceedings16/5720-2016.pdf

 


@Rahim9 wrote:

Thank you for your response. Any suggestion on how to code Array and Loop?


 

Rahim9
Fluorite | Level 6

I will look at it.

 

Thank you!

ballardw
Super User

Not really a long post, so no worries.

 

It helps to provide data in the form of a data step so we have something to code against. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Here is one way that may  get you started. The data step is an example of turning your pasted text into a data step that can create usable data.

data have;
infile datalines dlm=',' dsd truncover;
informat record_id   comorbid_1   comorbid_2    comorbid_3   comorbid_4    comorbid_5 $25.;
input record_id   comorbid_1   comorbid_2    comorbid_3   comorbid_4    comorbid_5;
datalines;
1,dementia,,,pulmonary_disorder,
2,,osteoporosis,dementia,,prostate cancer
3, hepatitis,,pacemaker,dementia,
;

data want;
   set have;
   array words{5} $25 _temporary_ ('dementia','pulmonary_disorder','osteoporosis','hepatitis','pacemaker');
   array c comorbid_:;
   cc=0;
   do i= 1 to dim(words);
      cc= cc+ (whichc(words[i], of c(*))>0);
   end;
   drop i;
   label cc='Comorbidity count';
run;

Some things that a new SAS user will not recognize right away.

The first array, WORDS, contains values that will 1) disappear at the end of the data step, that's what the _temporary_ means, 2) sets the initial (and not changed values) in the (). The number in the {} is how many values you need this has to be large enough to match the number of actual terms you need. The $25 says that the array will hold character values and how long each is. This needs to be at least as long as the longest value you need to search for.

 

We set the value to 0 before starting the search.

The WHICHC function will search for the first parameter, the current item as we loop through the list of words, and if found in any of the elements contained in the second array c returns which position, 1 to 5 in this case if found or 0 if not found. The comparison of >1 creates a 1 when any match is found so we can add that to the counter variable CC.

 

Things to consider: if your data entry isn't clean and has values like :Dementia, dementia or dEMENTIA you may want to UPCASE all of the array values so case is not an issue as "dementia" will not match "Dementia" in the WHICHC values.

Second if you have spelling issues: "demensia" for instance, you likely want to fix those first OR you end up with many more values in the WORDS array to include all of the different spellings.

Rahim9
Fluorite | Level 6

Thank you so much.

 

This is very helpful. I will study it and get back.

 

Best,

Rahim9
Fluorite | Level 6

Thank you.

 

I used the code you provided and it has worked! Even though the 'cc' variable that was created by the array turned out to be binary '0' and '1' instead of a count variable, I can still use that in my analysis since the proportion of records with any of the comorbid conditions of interest to me is just 2.42%

 

Appreciate your help!

 

Best,

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1128 views
  • 2 likes
  • 3 in conversation