- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am working on a large dataset that lists all follow-up conditions in a range of columns using ICD10 codes. ICD10 codes thankfully have a similar pattern where there is a letter and then a range of number for similar disorders (e.g. Cancer is C000 to C969). I want to go through each column and create a new variable if the person has had one of those types of disorders. I have provided an example below. Disease_X_0 is the column with the ICD10 code, and Cancer is the new variable I would like to create.
ID | Disease_1_0 | Disease_2_0 | Disease_3_0 | Disease_4_0 | Cancer |
1 | C001 | B005 | 1 | ||
2 | D55 | C97 | 1 | ||
3 | C97 | 1 | |||
4 | K00 | 0 | |||
5 | D57 | 0 |
I feel like there is an easy way to do this but cannot seem to find it with all my googling! Thank you so much in advance for your time!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
you’re welcome !
You can mention a range as follows in the prxmatch function:
/C[0-500]/
If the ICD code must have 3 digits following the C letter, you can add this check :
AND prxmatch(‘/C\d\d\d/‘ ...)
if prxmatch('/C\d\d\d/',_Disease(i))>0 and prxmatch('/C[0-500]/',_Disease(i))>0 then cancer=1;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am working on a large dataset that lists all follow-up conditions in a range of columns using ICD10 codes. ICD10 codes thankfully have a similar pattern where there is a letter and then a range of number for similar disorders (e.g. Cancer is C000 to C969). I want to go through each column and create a new variable if the person has had one of those types of disorders. I have provided an example below. Disease_X_0 is the column with the ICD10 code, and Cancer is the new variable I would like to create.
ID | Disease_1_0 | Disease_2_0 | Disease_3_0 | Disease_4_0 | Cancer |
1 | C001 | B005 | 1 | ||
2 | D55 | C97 | 1 | ||
3 | C97 | 1 | |||
4 | K00 | 0 | |||
5 | D57 | 0 |
I feel like there is an easy way to do this but cannot seem to find it with all my googling! Thank you so much in advance for your time!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @RAGC
You can try the below code. Please avoid duplicate posts 😉
The array function enables to perform the same manipulation on multiple variables.
Hope this helps!
Best,
data have;
infile datalines dlm="," dsd missover;
input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
datalines;
1,C001,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;
data want;
set have;
array _Disease (*) Disease_:;
cancer=0;
do i=1 to dim(_Disease);
if prxmatch('/C\d+/',_Disease(i))>0 then cancer=1; /* \d+ mean 1 digit or more */
end;
drop i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @RAGC
You can try the below code.
The array function enables to perform the same manipulation on multiple variables.
Hope this helps!
Best,
data have;
infile datalines dlm="," dsd missover;
input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
datalines;
1,C001,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;
data want;
set have;
array _Disease (*) Disease_:;
cancer=0;
do i=1 to dim(_Disease);
if prxmatch('/C\d+/',_Disease(i))>0 then cancer=1; /* \d+ mean 1 digit or more */
end;
drop i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @ed_sas_member,
My sincerest apologies for the duplicate posting! I realized I posted in the wrong section, and thought I had deleted it before posting it in here (which I hope is the appropriate section).
Your code worked super well! Thank you so much! If you would indulge me for one step further that I am struggling with. I see that you mentioned the d+ is for anything greater than 1 digit. If there a way to specify a range of the values (e.g. C001 to C500) even though there are up to C969 available characters?.
Best,
Rebecca
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
you’re welcome !
You can mention a range as follows in the prxmatch function:
/C[0-500]/
If the ICD code must have 3 digits following the C letter, you can add this check :
AND prxmatch(‘/C\d\d\d/‘ ...)
if prxmatch('/C\d\d\d/',_Disease(i))>0 and prxmatch('/C[0-500]/',_Disease(i))>0 then cancer=1;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have a wonderful day,
Best,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry @RAGC Late to the party. Anyways, for what it's worth
data have;
infile datalines dlm="," dsd missover;
input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
datalines;
1,C001,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;
data want;
set have;
array c(969) _temporary_ (1:969);
array t Disease_1_0--Disease_4_0;
do _n_=1 to dim(t) until(cancer=1);
cancer= first(t(_n_))='C' and input(compress(t(_n_),,'kd'),best.) in c;
end;
run;
proc print noobs;run;
ID | Disease_1_0 | Disease_2_0 | Disease_3_0 | Disease_4_0 | cancer |
---|---|---|---|---|---|
1 | C001 | B005 | 1 | ||
2 | D55 | C97 | 1 | ||
3 | C97 | 1 | |||
4 | K00 | 0 | |||
5 | D57 | 0 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Oops!, I took the list as 1:969 where I missed to start from 0 i.e. 0:969.
Correction:
data have;
infile datalines dlm="," dsd missover;
input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
datalines;
1,C000,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;
data want;
set have;
array c(000:969) _temporary_ (0:969);
array t Disease_1_0--Disease_4_0;
do _n_=1 to dim(t) until(cancer=1);
cancer= first(t(_n_))='C' and input(compress(t(_n_),,'kd'),best.) in c;
end;
run;
proc print noobs;run;