Solved: Re: How to flag a range of characters from multiple columns to create ...

RAGC · Posted 05-02-2020 02:21 PM

Hi,

I am working on a large dataset that lists all follow-up conditions in a range of columns using ICD10 codes. ICD10 codes thankfully follow a consistent pattern: a letter followed by a range of numbers for similar disorders (e.g. Cancer is C000 to C969). I want to go through each column and create a new variable that flags whether the person has had one of those types of disorders. I have provided an example below. Disease_X_0 is the column with the ICD10 code, and Cancer is the new variable I would like to create.

ID	Disease_1_0	Disease_2_0	Disease_3_0	Disease_4_0	Cancer
1	C001	B005			1
2	D55		C97		1
3		C97			1
4	K00				0
5		D57			0

I feel like there is an easy way to do this but cannot seem to find it with all my googling! Thank you so much in advance for your time!

ed_sas_member · Posted 05-02-2020 02:41 PM

Hi @RAGC

You can try the code below. Please avoid duplicate posts 😉

The ARRAY statement lets you apply the same operation to multiple variables.

Hope this helps!

Best,

data have;
	infile datalines dlm="," dsd missover;
	input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
	datalines;
1,C001,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;

data want;
	set have;
	array _Disease (*) Disease_:;
	cancer=0;
	do i=1 to dim(_Disease);
		if prxmatch('/C\d+/',_Disease(i))>0 then cancer=1; /* \d+ means one or more digits */
	end;
	drop i;
run;
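
As an aside, here is a minimal sketch of the same array loop without a regular expression. It assumes it is enough to test that a value starts with the letter C, so it does not check the digits that follow. The dataset names sample and want_prefix are made up for illustration.

```sas
/* Small made-up sample, same layout as the data above */
data sample;
	infile datalines dlm="," dsd missover;
	input ID Disease_1_0 $ Disease_2_0 $;
	datalines;
1,C001,B005
2,D55,C97
3,K00,
;
run;

data want_prefix;
	set sample;
	array _Disease (*) Disease_:;
	cancer=0;
	do i=1 to dim(_Disease);
		/* =: compares only as many characters as the shorter operand,
		   so this is true when the value starts with C */
		if _Disease(i) =: 'C' then cancer=1;
	end;
	drop i;
run;
```

The regular expression is still the safer choice if some values could start with C without being ICD10 codes.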

RAGC · Posted 05-02-2020 02:48 PM

Hi @ed_sas_member,

My sincerest apologies for the duplicate posting! I realized I posted in the wrong section, and thought I had deleted it before posting it in here (which I hope is the appropriate section).

Your code worked super well! Thank you so much! If you would indulge me, there is one further step that I am struggling with. I see that you mentioned \d+ matches one or more digits. Is there a way to specify a range of values (e.g. C001 to C500), even though the codes go up to C969?

Best,
Rebecca

ed_sas_member · Posted 05-02-2020 03:02 PM

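You’re welcome!
A character class such as [0-5] matches a single character rather than a numeric range, so the range has to be spelled out digit by digit in the prxmatch function. For C000 to C500 (note that this also accepts C000):
/^C([0-4]\d\d|500)\s*$/

This pattern already requires exactly 3 digits after the letter C, and \s*$ allows for the trailing blanks that pad a SAS character variable:

if prxmatch('/^C([0-4]\d\d|500)\s*$/',_Disease(i))>0 then cancer=1;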
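
For what it's worth, here is a minimal sketch of a range check run on a few made-up codes, assuming every code of interest has exactly 3 digits after the letter. It spells the numeric range out digit by digit, because a character class in a regular expression matches a single character. The names range_check, range_flag and cancer_000_500 are only for illustration.

```sas
/* Made-up sample codes around the edges of the range */
data range_check;
	infile datalines dlm="," dsd missover;
	input ID Disease_1_0 $ Disease_2_0 $;
	datalines;
1,C001,B005
2,C500,
3,C501,
4,C97,
5,,C250
;
run;

data range_flag;
	set range_check;
	array _Disease (*) Disease_:;
	cancer_000_500=0;
	do i=1 to dim(_Disease);
		/* ^C          : the value must start with C
		   [0-4]\d\d   : covers 000 to 499
		   500         : the upper bound, listed on its own
		   \s*$        : allows the trailing blanks of a SAS character value */
		if prxmatch('/^C([0-4]\d\d|500)\s*$/',_Disease(i))>0 then cancer_000_500=1;
	end;
	drop i;
run;
```

With this pattern, IDs 1, 2 and 5 should be flagged. C501 falls outside the range, and C97 is skipped because it has only 2 digits.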

Kurt_Bremser · Posted 05-02-2020 03:07 PM

Hi! I merged everything.

RAGC · Posted 05-02-2020 03:14 PM

Thanks @ed_sas_member for all your help!

Thanks @Kurt_Bremser for the merge.

Best,
Rebecca

ed_sas_member · Posted 05-02-2020 03:20 PM

You’re welcome Rebecca!
Have a wonderful day,
Best,

novinosrin · Posted 05-02-2020 04:15 PM

Sorry @RAGC Late to the party. Anyways, for what it's worth


data have;
	infile datalines dlm="," dsd missover;
	input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
	datalines;
1,C001,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;


data want;
 set have;
 array c(969) _temporary_  (1:969);
 array t Disease_1_0--Disease_4_0;
 do _n_=1 to dim(t) until(cancer=1);
  cancer= first(t(_n_))='C' and input(compress(t(_n_),,'kd'),best.) in c;
 end;
run;

proc print noobs;run;

ID	Disease_1_0	Disease_2_0	Disease_3_0	cancer
1	C001	B005		1
2	D55		C97	1
3		C97		1
4	K00			0
5		D57		0

novinosrin · Posted 05-02-2020 04:31 PM

Oops! I used the list 1:969 and forgot to start from 0, i.e. it should be 0:969.

Correction:


data have;
	infile datalines dlm="," dsd missover;
	input ID Disease_1_0 $ Disease_2_0 $ Disease_3_0 $ Disease_4_0 $;
	datalines;
1,C000,B005,,
2,D55, ,C97,
3,,C97,,,
4,K00,,,,
5,,D57,,,
;
run;


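data want;
 set have;
 /* lookup list of the numeric parts that count as cancer: 0 through 969 */
 array c(000:969) _temporary_  (0:969);
 array t Disease_1_0--Disease_4_0;
 /* stop scanning as soon as one column is flagged */
 do _n_=1 to dim(t) until(cancer=1);
  /* flag when the code starts with C and its digits, read as a number, are in the list */
  cancer= first(t(_n_))='C' and input(compress(t(_n_),,'kd'),best.) in c;
 end;
run;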

proc print noobs;run;
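
The same idea can probably be narrowed to a sub-range such as C001 to C500 by comparing the numeric part directly instead of looking it up in a list. This is only a sketch under that assumption. The names sample, flagged, num and cancer_001_500 are made up for illustration. Because the digits are read as a number, a 2-digit code such as C97 becomes 97 and is flagged as well.

```sas
/* Made-up sample codes */
data sample;
	infile datalines dlm="," dsd missover;
	input ID Disease_1_0 $ Disease_2_0 $;
	datalines;
1,C001,B005
2,C501,
3,,C97
4,K00,
;
run;

data flagged;
	set sample;
	array t Disease_1_0--Disease_2_0;
	cancer_001_500=0;
	do _n_=1 to dim(t) until(cancer_001_500=1);
		/* keep only the digits and read them as a number;
		   a blank value gives a missing number, which fails the range test */
		num=input(compress(t(_n_),,'kd'),best.);
		cancer_001_500= first(t(_n_))='C' and 1<=num<=500;
	end;
	drop num;
run;
```

If 2-digit and 3-digit codes must be kept apart, the regular expression approach earlier in the thread is the better fit.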
