I have patient data in SAS that looks like this (I used CSV to make empty cells load easier):
pat_ID,visit_date,fp_1,fp_2,fp_3,fp_4,fp_5,fp_6,fp_7,fp_8,ini_dx,
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
I want to search the follow up variables(fp_1 thru fp_8) for the initial diagnosis (ini_dx) and then to return the 'visit_date' as well as that value. For example, the first row for patient 87103 does not match initial diagnosis '2' but the second row does (fp_1 has '2D') so return that 'visit_date' and '2D'. How could I do this?
OK, so here's a little program, below. What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis. Then, I follow that with a Sort NoDupKey. I used the Pat_ID and Addl_Dx as keys for my sort. The results are shown below. Notice that there are two Addl_Dx values for Pat_ID 61760.
DATA Want (KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
DROP _:;
SET Have;
ARRAY fp {*} fp_1 - fp_8;
FORMAT Visit_Date2 MMDDYYS10.;
DO _i = 1 TO DIM(fp);
IF INDEX(fp{_i}, STRIP(Ini_Dx)) THEN
DO;
Addl_Dx = fp{_i};
Visit_Date2 = Visit_Date;
END;
END;
IF MISSING(Addl_Dx) THEN
DELETE;
ELSE
OUTPUT;
RUN;
PROC SORT DATA=Want
OUT =Want_DeDup
NODUPKEY;
BY PAT_ID Addl_Dx;
RUN;
Results:
Jim
What do you want your output to look like?
Would it be one obs per PAT_ID containing the first qualifying record?. That's what the untested code below does.
data want;
set have;
by pat_id notsorted;
where trim(ini_dx)=:fp_1 or
trim(ini_dx)=:fp_2 or
trim(ini_dx)=:fp_3 or
trim(ini_dx)=:fp_4 or
trim(ini_dx)=:fp_5 or
trim(ini_dx)=:fp_6 or
trim(ini_dx)=:fp_7 or
trim(ini_dx)=:fp_8 ;
if first.pat_id;
run;
I say untested because your have not provided sample data in the form of a working data step.
This assumes that INI_DX and all the FP_ variables are read as character variables.
@mkeintz wrote:
What do you want your output to look like?
Would it be one obs per PAT_ID containing the first qualifying record?. That's what the untested code below does.
data want; set have; by pat_id notsorted; where trim(ini_dx)=:fp_1 or trim(ini_dx)=:fp_2 or trim(ini_dx)=:fp_3 or trim(ini_dx)=:fp_4 or trim(ini_dx)=:fp_5 or trim(ini_dx)=:fp_6 or trim(ini_dx)=:fp_7 or trim(ini_dx)=:fp_8 ; if first.pat_id; run;
I say untested because your have not provided sample data in the form of a working data step.
This assumes that INI_DX and all the FP_ variables are read as character variables.
I want it to create 2 more variables, one with the visit date and the other with the value ('2D' in my example). Then, I will keep the variables of interest (patient ID, initial dx, and the 2 new variables) and remove duplicate keys with NODUPKEY. Sorry about the data set. I thought the csv could be saved and imported into SAS. With code like this:
proc import datafile="Book1.csv" out=data dbms=csv replace; run; proc print data=work.data; run;
OK, so here's a little program, below. What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis. Then, I follow that with a Sort NoDupKey. I used the Pat_ID and Addl_Dx as keys for my sort. The results are shown below. Notice that there are two Addl_Dx values for Pat_ID 61760.
DATA Want (KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
DROP _:;
SET Have;
ARRAY fp {*} fp_1 - fp_8;
FORMAT Visit_Date2 MMDDYYS10.;
DO _i = 1 TO DIM(fp);
IF INDEX(fp{_i}, STRIP(Ini_Dx)) THEN
DO;
Addl_Dx = fp{_i};
Visit_Date2 = Visit_Date;
END;
END;
IF MISSING(Addl_Dx) THEN
DELETE;
ELSE
OUTPUT;
RUN;
PROC SORT DATA=Want
OUT =Want_DeDup
NODUPKEY;
BY PAT_ID Addl_Dx;
RUN;
Results:
Jim
Thank you for the detailed solution @jimbarbour ! I will be sure to include the data code for any future questions.
@mkeintz wrote:
you have not provided sample data in the form of a working data step.
What @mkeintz and @andreas_lds are talking about is something like the below, a working data step. We can take a working data step and post it into a SAS session in seconds. We can then start helping you immediately. If we have to grab a csv file, create a file on our server, then write a program or use a Proc Import to bring it in, that's tedious -- and we're answering dozens of questions like this a day. Just as sort of a courtesy to those who would like to help you, it's nice if you'd just give us a working data step.
Jim
DATA Have;
INFILE DATALINES DSD DLM=',';
INPUT
Pat_ID $
Visit_Date : MMDDYY10.
fp_1 $
fp_2 $
fp_3 $
fp_4 $
fp_5 $
fp_6 $
fp_7 $
fp_8 $
Ini_Dx $
;
FORMAT Visit_Date MMDDYYS10.;
DATALINES;
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
;
RUN;
Please post the data in usable form. What exactly do you expect as result? The visit_date from the first obs (per pat_id)?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.