BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
axescot78
Quartz | Level 8

I have patient data in SAS that looks like this (I used CSV to make empty cells load easier):


pat_ID,visit_date,fp_1,fp_2,fp_3,fp_4,fp_5,fp_6,fp_7,fp_8,ini_dx,
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,


I want to search the follow up variables(fp_1 thru fp_8) for the initial diagnosis (ini_dx) and then to return the 'visit_date' as well as that value. For example, the first row for patient 87103 does not match initial diagnosis '2' but the second row does (fp_1 has '2D') so return that 'visit_date' and '2D'. How could I do this?

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

OK, so here's a little program, below.  What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis.  Then, I follow that with a Sort NoDupKey.  I used the Pat_ID and Addl_Dx as keys for my sort.  The results are shown below.  Notice that there are two Addl_Dx values for Pat_ID 61760.

 

DATA	Want	(KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
	DROP	_:;
	SET	Have;
	ARRAY	fp	{*}		fp_1  	-	fp_8;
	FORMAT	Visit_Date2	MMDDYYS10.;

	DO	_i						=	1	TO	DIM(fp);
		IF	INDEX(fp{_i}, STRIP(Ini_Dx))	THEN
			DO;
				Addl_Dx			=	fp{_i};
				Visit_Date2		=	Visit_Date;
			END;
	END;

	IF	MISSING(Addl_Dx)					THEN
		DELETE;
	ELSE
		OUTPUT;
RUN;

PROC	SORT	DATA=Want
				OUT	=Want_DeDup
				NODUPKEY;
	BY	PAT_ID	Addl_Dx;
RUN;

Results:

jimbarbour_0-1628229456090.png

 

Jim

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

What do you want your output to look like? 

 

Would it be one obs per PAT_ID containing the first qualifying record?.  That's what the untested code below does.

 


data want;
  set have;
  by pat_id notsorted;
  where  trim(ini_dx)=:fp_1 or
         trim(ini_dx)=:fp_2 or
         trim(ini_dx)=:fp_3 or
         trim(ini_dx)=:fp_4 or
         trim(ini_dx)=:fp_5 or
         trim(ini_dx)=:fp_6 or
         trim(ini_dx)=:fp_7 or
         trim(ini_dx)=:fp_8  ;
  if first.pat_id;
run;

I say untested because your have not provided sample data in the form of a working data step.

 

This assumes that INI_DX and all the FP_ variables are read as character variables.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
axescot78
Quartz | Level 8

 


@mkeintz wrote:

What do you want your output to look like? 

 

Would it be one obs per PAT_ID containing the first qualifying record?.  That's what the untested code below does.

 


data want;
  set have;
  by pat_id notsorted;
  where  trim(ini_dx)=:fp_1 or
         trim(ini_dx)=:fp_2 or
         trim(ini_dx)=:fp_3 or
         trim(ini_dx)=:fp_4 or
         trim(ini_dx)=:fp_5 or
         trim(ini_dx)=:fp_6 or
         trim(ini_dx)=:fp_7 or
         trim(ini_dx)=:fp_8  ;
  if first.pat_id;
run;

I say untested because your have not provided sample data in the form of a working data step.

 

This assumes that INI_DX and all the FP_ variables are read as character variables.

 

 


I want it to create 2 more variables, one with the visit date and the other with the value ('2D' in my example). Then, I will keep the variables of interest (patient ID, initial dx, and the 2 new variables) and remove duplicate keys with NODUPKEY. Sorry about the data set. I thought the csv could be saved and imported into SAS. With code like this:

 

proc import datafile="Book1.csv"
        out=data
        dbms=csv
        replace;
run;

proc print data=work.data;
run;

 

 

 

jimbarbour
Meteorite | Level 14

OK, so here's a little program, below.  What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis.  Then, I follow that with a Sort NoDupKey.  I used the Pat_ID and Addl_Dx as keys for my sort.  The results are shown below.  Notice that there are two Addl_Dx values for Pat_ID 61760.

 

DATA	Want	(KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
	DROP	_:;
	SET	Have;
	ARRAY	fp	{*}		fp_1  	-	fp_8;
	FORMAT	Visit_Date2	MMDDYYS10.;

	DO	_i						=	1	TO	DIM(fp);
		IF	INDEX(fp{_i}, STRIP(Ini_Dx))	THEN
			DO;
				Addl_Dx			=	fp{_i};
				Visit_Date2		=	Visit_Date;
			END;
	END;

	IF	MISSING(Addl_Dx)					THEN
		DELETE;
	ELSE
		OUTPUT;
RUN;

PROC	SORT	DATA=Want
				OUT	=Want_DeDup
				NODUPKEY;
	BY	PAT_ID	Addl_Dx;
RUN;

Results:

jimbarbour_0-1628229456090.png

 

Jim

axescot78
Quartz | Level 8

Thank you for the detailed solution @jimbarbour ! I will be sure to include the data code for any future questions.

jimbarbour
Meteorite | Level 14

@mkeintz wrote:

you have not provided sample data in the form of a working data step.

@axescot78:

 

What @mkeintz and @andreas_lds are talking about is something like the below, a working data step.  We can take a working data step and post it into a SAS session in seconds.  We can then start helping you immediately.  If we have to grab a csv file, create a file on our server, then write a program or use a Proc Import to bring it in, that's tedious -- and we're answering dozens of questions like this a day.  Just as sort of a courtesy to those who would like to help you, it's nice if you'd just give us a working data step.

 

Jim

 

DATA	Have;
	INFILE	DATALINES	DSD	DLM=',';
	INPUT
		Pat_ID			$
		Visit_Date	: 	MMDDYY10.
		fp_1			$
		fp_2			$
		fp_3			$
		fp_4			$
		fp_5			$
		fp_6			$
		fp_7			$
		fp_8			$
		Ini_Dx			$
		;
	FORMAT	Visit_Date	MMDDYYS10.;
DATALINES;
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
;
RUN;
andreas_lds
Jade | Level 19

Please post the data in usable form. What exactly do you expect as result? The visit_date from the first obs (per pat_id)?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 825 views
  • 1 like
  • 4 in conversation