BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
axescot78
Quartz | Level 8

I have patient data in SAS that looks like this (I used CSV to make empty cells load easier):


pat_ID,visit_date,fp_1,fp_2,fp_3,fp_4,fp_5,fp_6,fp_7,fp_8,ini_dx,
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,


I want to search the follow up variables(fp_1 thru fp_8) for the initial diagnosis (ini_dx) and then to return the 'visit_date' as well as that value. For example, the first row for patient 87103 does not match initial diagnosis '2' but the second row does (fp_1 has '2D') so return that 'visit_date' and '2D'. How could I do this?

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

OK, so here's a little program, below.  What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis.  Then, I follow that with a Sort NoDupKey.  I used the Pat_ID and Addl_Dx as keys for my sort.  The results are shown below.  Notice that there are two Addl_Dx values for Pat_ID 61760.

 

DATA	Want	(KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
	DROP	_:;
	SET	Have;
	ARRAY	fp	{*}		fp_1  	-	fp_8;
	FORMAT	Visit_Date2	MMDDYYS10.;

	DO	_i						=	1	TO	DIM(fp);
		IF	INDEX(fp{_i}, STRIP(Ini_Dx))	THEN
			DO;
				Addl_Dx			=	fp{_i};
				Visit_Date2		=	Visit_Date;
			END;
	END;

	IF	MISSING(Addl_Dx)					THEN
		DELETE;
	ELSE
		OUTPUT;
RUN;

PROC	SORT	DATA=Want
				OUT	=Want_DeDup
				NODUPKEY;
	BY	PAT_ID	Addl_Dx;
RUN;

Results:

jimbarbour_0-1628229456090.png

 

Jim

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

What do you want your output to look like? 

 

Would it be one obs per PAT_ID containing the first qualifying record?.  That's what the untested code below does.

 


data want;
  set have;
  by pat_id notsorted;
  where  trim(ini_dx)=:fp_1 or
         trim(ini_dx)=:fp_2 or
         trim(ini_dx)=:fp_3 or
         trim(ini_dx)=:fp_4 or
         trim(ini_dx)=:fp_5 or
         trim(ini_dx)=:fp_6 or
         trim(ini_dx)=:fp_7 or
         trim(ini_dx)=:fp_8  ;
  if first.pat_id;
run;

I say untested because your have not provided sample data in the form of a working data step.

 

This assumes that INI_DX and all the FP_ variables are read as character variables.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
axescot78
Quartz | Level 8

 


@mkeintz wrote:

What do you want your output to look like? 

 

Would it be one obs per PAT_ID containing the first qualifying record?.  That's what the untested code below does.

 


data want;
  set have;
  by pat_id notsorted;
  where  trim(ini_dx)=:fp_1 or
         trim(ini_dx)=:fp_2 or
         trim(ini_dx)=:fp_3 or
         trim(ini_dx)=:fp_4 or
         trim(ini_dx)=:fp_5 or
         trim(ini_dx)=:fp_6 or
         trim(ini_dx)=:fp_7 or
         trim(ini_dx)=:fp_8  ;
  if first.pat_id;
run;

I say untested because your have not provided sample data in the form of a working data step.

 

This assumes that INI_DX and all the FP_ variables are read as character variables.

 

 


I want it to create 2 more variables, one with the visit date and the other with the value ('2D' in my example). Then, I will keep the variables of interest (patient ID, initial dx, and the 2 new variables) and remove duplicate keys with NODUPKEY. Sorry about the data set. I thought the csv could be saved and imported into SAS. With code like this:

 

proc import datafile="Book1.csv"
        out=data
        dbms=csv
        replace;
run;

proc print data=work.data;
run;

 

 

 

jimbarbour
Meteorite | Level 14

OK, so here's a little program, below.  What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis.  Then, I follow that with a Sort NoDupKey.  I used the Pat_ID and Addl_Dx as keys for my sort.  The results are shown below.  Notice that there are two Addl_Dx values for Pat_ID 61760.

 

DATA	Want	(KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
	DROP	_:;
	SET	Have;
	ARRAY	fp	{*}		fp_1  	-	fp_8;
	FORMAT	Visit_Date2	MMDDYYS10.;

	DO	_i						=	1	TO	DIM(fp);
		IF	INDEX(fp{_i}, STRIP(Ini_Dx))	THEN
			DO;
				Addl_Dx			=	fp{_i};
				Visit_Date2		=	Visit_Date;
			END;
	END;

	IF	MISSING(Addl_Dx)					THEN
		DELETE;
	ELSE
		OUTPUT;
RUN;

PROC	SORT	DATA=Want
				OUT	=Want_DeDup
				NODUPKEY;
	BY	PAT_ID	Addl_Dx;
RUN;

Results:

jimbarbour_0-1628229456090.png

 

Jim

axescot78
Quartz | Level 8

Thank you for the detailed solution @jimbarbour ! I will be sure to include the data code for any future questions.

jimbarbour
Meteorite | Level 14

@mkeintz wrote:

you have not provided sample data in the form of a working data step.

@axescot78:

 

What @mkeintz and @andreas_lds are talking about is something like the below, a working data step.  We can take a working data step and post it into a SAS session in seconds.  We can then start helping you immediately.  If we have to grab a csv file, create a file on our server, then write a program or use a Proc Import to bring it in, that's tedious -- and we're answering dozens of questions like this a day.  Just as sort of a courtesy to those who would like to help you, it's nice if you'd just give us a working data step.

 

Jim

 

DATA	Have;
	INFILE	DATALINES	DSD	DLM=',';
	INPUT
		Pat_ID			$
		Visit_Date	: 	MMDDYY10.
		fp_1			$
		fp_2			$
		fp_3			$
		fp_4			$
		fp_5			$
		fp_6			$
		fp_7			$
		fp_8			$
		Ini_Dx			$
		;
	FORMAT	Visit_Date	MMDDYYS10.;
DATALINES;
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
;
RUN;
andreas_lds
Jade | Level 19

Please post the data in usable form. What exactly do you expect as result? The visit_date from the first obs (per pat_id)?

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1943 views
  • 1 like
  • 4 in conversation