Solved: search for first encounter to match cell value and return another valu...

axescot78 · Posted 08-06-2021 12:27 AM

I have patient data in SAS that looks like this (I used CSV to make empty cells load easier):

pat_ID,visit_date,fp_1,fp_2,fp_3,fp_4,fp_5,fp_6,fp_7,fp_8,ini_dx,
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,

I want to search the follow up variables(fp_1 thru fp_8) for the initial diagnosis (ini_dx) and then to return the 'visit_date' as well as that value. For example, the first row for patient 87103 does not match initial diagnosis '2' but the second row does (fp_1 has '2D') so return that 'visit_date' and '2D'. How could I do this?

jimbarbour · Posted 08-06-2021 01:57 AM

OK, so here's a little program, below. What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis. Then, I follow that with a Sort NoDupKey. I used the Pat_ID and Addl_Dx as keys for my sort. The results are shown below. Notice that there are two Addl_Dx values for Pat_ID 61760.

DATA	Want	(KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
	DROP	_:;
	SET	Have;
	ARRAY	fp	{*}		fp_1  	-	fp_8;
	FORMAT	Visit_Date2	MMDDYYS10.;

	DO	_i						=	1	TO	DIM(fp);
		IF	INDEX(fp{_i}, STRIP(Ini_Dx))	THEN
			DO;
				Addl_Dx			=	fp{_i};
				Visit_Date2		=	Visit_Date;
			END;
	END;

	IF	MISSING(Addl_Dx)					THEN
		DELETE;
	ELSE
		OUTPUT;
RUN;

PROC	SORT	DATA=Want
				OUT	=Want_DeDup
				NODUPKEY;
	BY	PAT_ID	Addl_Dx;
RUN;

Results:

Jim

View solution in original post

mkeintz · Posted 08-06-2021 01:07 AM

What do you want your output to look like?

Would it be one obs per PAT_ID containing the first qualifying record?. That's what the untested code below does.


data want;
  set have;
  by pat_id notsorted;
  where  trim(ini_dx)=:fp_1 or
         trim(ini_dx)=:fp_2 or
         trim(ini_dx)=:fp_3 or
         trim(ini_dx)=:fp_4 or
         trim(ini_dx)=:fp_5 or
         trim(ini_dx)=:fp_6 or
         trim(ini_dx)=:fp_7 or
         trim(ini_dx)=:fp_8  ;
  if first.pat_id;
run;

I say untested because your have not provided sample data in the form of a working data step.

This assumes that INI_DX and all the FP_ variables are read as character variables.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

axescot78 · Posted 08-06-2021 01:21 AM

@mkeintz wrote:

What do you want your output to look like?

Would it be one obs per PAT_ID containing the first qualifying record?. That's what the untested code below does.
data want;
  set have;
  by pat_id notsorted;
  where  trim(ini_dx)=:fp_1 or
         trim(ini_dx)=:fp_2 or
         trim(ini_dx)=:fp_3 or
         trim(ini_dx)=:fp_4 or
         trim(ini_dx)=:fp_5 or
         trim(ini_dx)=:fp_6 or
         trim(ini_dx)=:fp_7 or
         trim(ini_dx)=:fp_8  ;
  if first.pat_id;
run;
I say untested because your have not provided sample data in the form of a working data step.

This assumes that INI_DX and all the FP_ variables are read as character variables.

I want it to create 2 more variables, one with the visit date and the other with the value ('2D' in my example). Then, I will keep the variables of interest (patient ID, initial dx, and the 2 new variables) and remove duplicate keys with NODUPKEY. Sorry about the data set. I thought the csv could be saved and imported into SAS. With code like this:

proc import datafile="Book1.csv"
        out=data
        dbms=csv
        replace;
run;

proc print data=work.data;
run;

jimbarbour · Posted 08-06-2021 01:57 AM

OK, so here's a little program, below. What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis. Then, I follow that with a Sort NoDupKey. I used the Pat_ID and Addl_Dx as keys for my sort. The results are shown below. Notice that there are two Addl_Dx values for Pat_ID 61760.

DATA	Want	(KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
	DROP	_:;
	SET	Have;
	ARRAY	fp	{*}		fp_1  	-	fp_8;
	FORMAT	Visit_Date2	MMDDYYS10.;

	DO	_i						=	1	TO	DIM(fp);
		IF	INDEX(fp{_i}, STRIP(Ini_Dx))	THEN
			DO;
				Addl_Dx			=	fp{_i};
				Visit_Date2		=	Visit_Date;
			END;
	END;

	IF	MISSING(Addl_Dx)					THEN
		DELETE;
	ELSE
		OUTPUT;
RUN;

PROC	SORT	DATA=Want
				OUT	=Want_DeDup
				NODUPKEY;
	BY	PAT_ID	Addl_Dx;
RUN;

Results:

Jim

axescot78 · Posted 08-06-2021 03:10 PM

Thank you for the detailed solution @jimbarbour ! I will be sure to include the data code for any future questions.

jimbarbour · Posted 08-06-2021 02:07 AM

@mkeintz wrote:

you have not provided sample data in the form of a working data step.

@axescot78:

What @mkeintz and @andreas_lds are talking about is something like the below, a working data step. We can take a working data step and post it into a SAS session in seconds. We can then start helping you immediately. If we have to grab a csv file, create a file on our server, then write a program or use a Proc Import to bring it in, that's tedious -- and we're answering dozens of questions like this a day. Just as sort of a courtesy to those who would like to help you, it's nice if you'd just give us a working data step.

Jim

DATA	Have;
	INFILE	DATALINES	DSD	DLM=',';
	INPUT
		Pat_ID			$
		Visit_Date	: 	MMDDYY10.
		fp_1			$
		fp_2			$
		fp_3			$
		fp_4			$
		fp_5			$
		fp_6			$
		fp_7			$
		fp_8			$
		Ini_Dx			$
		;
	FORMAT	Visit_Date	MMDDYYS10.;
DATALINES;
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
;
RUN;

andreas_lds · Posted 08-06-2021 01:41 AM

Please post the data in usable form. What exactly do you expect as result? The visit_date from the first obs (per pat_id)?

search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Re: search for first encounter to match cell value and return another value

Registration is open

Registration is open

SAS Training: Just a Click Away