I have patient data in SAS that looks like this (I used CSV to make empty cells load easier):
pat_ID,visit_date,fp_1,fp_2,fp_3,fp_4,fp_5,fp_6,fp_7,fp_8,ini_dx,
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
I want to search the follow up variables(fp_1 thru fp_8) for the initial diagnosis (ini_dx) and then to return the 'visit_date' as well as that value. For example, the first row for patient 87103 does not match initial diagnosis '2' but the second row does (fp_1 has '2D') so return that 'visit_date' and '2D'. How could I do this?
OK, so here's a little program, below. What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis. Then, I follow that with a Sort NoDupKey. I used the Pat_ID and Addl_Dx as keys for my sort. The results are shown below. Notice that there are two Addl_Dx values for Pat_ID 61760.
DATA Want (KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
DROP _:;
SET Have;
ARRAY fp {*} fp_1 - fp_8;
FORMAT Visit_Date2 MMDDYYS10.;
DO _i = 1 TO DIM(fp);
IF INDEX(fp{_i}, STRIP(Ini_Dx)) THEN
DO;
Addl_Dx = fp{_i};
Visit_Date2 = Visit_Date;
END;
END;
IF MISSING(Addl_Dx) THEN
DELETE;
ELSE
OUTPUT;
RUN;
PROC SORT DATA=Want
OUT =Want_DeDup
NODUPKEY;
BY PAT_ID Addl_Dx;
RUN;
Results:
Jim
What do you want your output to look like?
Would it be one obs per PAT_ID containing the first qualifying record?. That's what the untested code below does.
data want;
set have;
by pat_id notsorted;
where trim(ini_dx)=:fp_1 or
trim(ini_dx)=:fp_2 or
trim(ini_dx)=:fp_3 or
trim(ini_dx)=:fp_4 or
trim(ini_dx)=:fp_5 or
trim(ini_dx)=:fp_6 or
trim(ini_dx)=:fp_7 or
trim(ini_dx)=:fp_8 ;
if first.pat_id;
run;
I say untested because your have not provided sample data in the form of a working data step.
This assumes that INI_DX and all the FP_ variables are read as character variables.
@mkeintz wrote:
What do you want your output to look like?
Would it be one obs per PAT_ID containing the first qualifying record?. That's what the untested code below does.
data want; set have; by pat_id notsorted; where trim(ini_dx)=:fp_1 or trim(ini_dx)=:fp_2 or trim(ini_dx)=:fp_3 or trim(ini_dx)=:fp_4 or trim(ini_dx)=:fp_5 or trim(ini_dx)=:fp_6 or trim(ini_dx)=:fp_7 or trim(ini_dx)=:fp_8 ; if first.pat_id; run;I say untested because your have not provided sample data in the form of a working data step.
This assumes that INI_DX and all the FP_ variables are read as character variables.
I want it to create 2 more variables, one with the visit date and the other with the value ('2D' in my example). Then, I will keep the variables of interest (patient ID, initial dx, and the 2 new variables) and remove duplicate keys with NODUPKEY. Sorry about the data set. I thought the csv could be saved and imported into SAS. With code like this:
proc import datafile="Book1.csv"
out=data
dbms=csv
replace;
run;
proc print data=work.data;
run;
OK, so here's a little program, below. What it does is identifies a diagnosis ("Addl_Dx") in the follow ups whose first character matches the initial diagnosis. Then, I follow that with a Sort NoDupKey. I used the Pat_ID and Addl_Dx as keys for my sort. The results are shown below. Notice that there are two Addl_Dx values for Pat_ID 61760.
DATA Want (KEEP=Pat_ID Ini_Dx Visit_Date2 Addl_Dx);
DROP _:;
SET Have;
ARRAY fp {*} fp_1 - fp_8;
FORMAT Visit_Date2 MMDDYYS10.;
DO _i = 1 TO DIM(fp);
IF INDEX(fp{_i}, STRIP(Ini_Dx)) THEN
DO;
Addl_Dx = fp{_i};
Visit_Date2 = Visit_Date;
END;
END;
IF MISSING(Addl_Dx) THEN
DELETE;
ELSE
OUTPUT;
RUN;
PROC SORT DATA=Want
OUT =Want_DeDup
NODUPKEY;
BY PAT_ID Addl_Dx;
RUN;
Results:
Jim
Thank you for the detailed solution @jimbarbour ! I will be sure to include the data code for any future questions.
@mkeintz wrote:
you have not provided sample data in the form of a working data step.
What @mkeintz and @andreas_lds are talking about is something like the below, a working data step. We can take a working data step and post it into a SAS session in seconds. We can then start helping you immediately. If we have to grab a csv file, create a file on our server, then write a program or use a Proc Import to bring it in, that's tedious -- and we're answering dozens of questions like this a day. Just as sort of a courtesy to those who would like to help you, it's nice if you'd just give us a working data step.
Jim
DATA Have;
INFILE DATALINES DSD DLM=',';
INPUT
Pat_ID $
Visit_Date : MMDDYY10.
fp_1 $
fp_2 $
fp_3 $
fp_4 $
fp_5 $
fp_6 $
fp_7 $
fp_8 $
Ini_Dx $
;
FORMAT Visit_Date MMDDYYS10.;
DATALINES;
87103,2/23/2010,,3E,,,,,,,2,
87103,1/16/1983,2D,,,,,,,,2,
87103,6/20/2005,9E,,,,,,,,2,
87103,3/18/2019,3E,,,,,,,,2,
87103,4/23/1992,,2D,4E,,,,,,2,
87103,7/25/1997,,,,6H,,,,,2,
61291,1/12/2019,2B,,,,,,,,4,
61291,2/19/2016,,,,,,,2D,,4,
61291,6/17/1994,7G,,,,,,,,4,
61291,8/30/1994,,2C,,,,,,,4,
61291,9/30/1999,4H,,,,,,,,4,
61291,11/11/1992,4H,,,,,,,,4,
61291,4/19/1983,,,,9F,,,,,4,
61291,8/13/1995,3D,,,,,,,,4,
61760,10/4/2016,,5C,,,,,,,5,
61760,11/10/2019,1F,,,,,,,,5,
61760,11/7/1989,,2D,,,,,,,5,
61760,8/10/2015,5H,,,,,,,,5,
30098,3/19/1996,,8H,,,,,,,1,
30098,10/3/2003,,8H,,,,,,,1,
30098,7/24/1990,2G,1F,,,,,,,1,
30098,3/13/1987,,,,4C,,,,,1,
30098,10/12/2013,9F,,,,,,,,1,
30098,6/8/1984,,,,,,,,,1,
30098,6/20/2013,4D,,,,,,,,1,
30098,1/6/1990,3A,,,,,,,,1,
27754,7/11/1986,,,6E,,,,,,8,
27754,3/18/1995,,,5C,,,,,,8,
27754,9/18/1988,,,,1B,,,,,8,
27754,8/15/2000,5E,,,,,,,,8,
27754,11/26/1986,4A,,,,,,,,8,
27754,8/17/2013,,2B,,,,,,,8,
27754,5/31/2008,,8G,,,,,,,8,
;
RUN;
Please post the data in usable form. What exactly do you expect as result? The visit_date from the first obs (per pat_id)?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.