Hello All,
I need to reorganize a data set such that observations within subjects can be reordered by date.
The data set looks like this:
Subject_ID Follow_Up_Time Interview_Date Previous_Interview_Date Adverse_Event_Report_Date
1 1 Feb10 Jan10 Nov10
1 2 Mar10 Feb10 .
1 3 Dec10 Nov10 .
2 1 Jun10 May10 May10
2 2 Jul10 Jun10 Jan11
2 3 Feb11 Jan11 .
The last variable in the data set is not correctly assigned to its corresponding time period (i.e, Adverse_Event_Report_Date should be within the period between Interview_Date and Previous_Interview_Date). For example, in the first case line, the Adverse_Event_Report_Date variable should be missing and should appear in the third row for Subject_ID = 1.
Please advice on the data steps required to restructure this data file.
Thank you in advance.
Sincerely,
Pronabesh
I, too, don't know if I correctly understand what you want, but the following is a data step method for doing something close to what Haikuo suggested. It differs, slightly, as it can assign multiple dates for a subject:
data have;
informat Interview_Date Previous_Interview_Date
Adverse_Event_Report_Date date9.;
format Interview_Date Previous_Interview_Date
Adverse_Event_Report_Date date9.;
input Subject_ID Follow_Up_Time Interview_Date
Previous_Interview_Date Adverse_Event_Report_Date;
cards;
1 1 10Feb2013 10Jan2013 10Nov2013
1 2 10Mar2013 10Feb2013 .
1 3 10Dec2013 10Nov2013 .
2 1 10Jun2013 10May2013 10May2013
2 2 10Jul2013 10Jun2013 11Jan2013
2 3 11Feb2013 11Jan2013 .
;
data want (drop=_:);
set have (rename=(Adverse_Event_Report_Date=_date));
retain _holddate;
format Adverse_Event_Report_Date date9.;
by Subject_ID;
if first.Subject_ID then call missing(_holddate);
if not missing(_date) then _holddate=_date;
if not missing(_holddate) and
(Previous_Interview_Date le _holddate le Interview_Date)
then Adverse_Event_Report_Date=_holddate;
run;
It would be ideal that you have an sample output for your sample input data, so here is only my guess, if it is not meeting your requirement, then please provide your sample output:
data have;
input Subject_ID Follow_Up_Time (Interview_Date Previous_Interview_Date Adverse_Event_Report_Date) (:monyy5.);
format Interview_Date Previous_Interview_Date Adverse_Event_Report_Date monyy5.;
cards;
1 1 Feb10 Jan10 Nov10
1 2 Mar10 Feb10 .
1 3 Dec10 Nov10 .
2 1 Jun10 May10 May10
2 2 Jul10 Jun10 Jan11
2 3 Feb11 Jan11 .
;
proc sql;
create table want as
select Subject_ID, Follow_Up_Time, Interview_Date, Previous_Interview_Date,
case when Previous_Interview_Date <= Adverse_Event_Report_Date <= Interview_Date
then Adverse_Event_Report_Date
when not (Previous_Interview_Date <= Adverse_Event_Report_Date <= Interview_Date )
and Follow_Up_Time = max(Follow_Up_Time) then max(Adverse_Event_Report_Date)
else . end as Adverse_Event_Report_Date format=monyy5.
from have
group by Subject_ID
order by Subject_ID,Follow_Up_Time
;
quit;
Haikuo
Thank You Haikuo. This is very helpful.
I, too, don't know if I correctly understand what you want, but the following is a data step method for doing something close to what Haikuo suggested. It differs, slightly, as it can assign multiple dates for a subject:
data have;
informat Interview_Date Previous_Interview_Date
Adverse_Event_Report_Date date9.;
format Interview_Date Previous_Interview_Date
Adverse_Event_Report_Date date9.;
input Subject_ID Follow_Up_Time Interview_Date
Previous_Interview_Date Adverse_Event_Report_Date;
cards;
1 1 10Feb2013 10Jan2013 10Nov2013
1 2 10Mar2013 10Feb2013 .
1 3 10Dec2013 10Nov2013 .
2 1 10Jun2013 10May2013 10May2013
2 2 10Jul2013 10Jun2013 11Jan2013
2 3 11Feb2013 11Jan2013 .
;
data want (drop=_:);
set have (rename=(Adverse_Event_Report_Date=_date));
retain _holddate;
format Adverse_Event_Report_Date date9.;
by Subject_ID;
if first.Subject_ID then call missing(_holddate);
if not missing(_date) then _holddate=_date;
if not missing(_holddate) and
(Previous_Interview_Date le _holddate le Interview_Date)
then Adverse_Event_Report_Date=_holddate;
run;
Thank You Arthur. This is exactly what I was looking for.
I thought I had the solution to this, but my problem is probably a little more complex. To demonstrate, I have listed one example from a single individual with multiple assessment points and described the variables below.
The purpose of this is to test the agreement between self_reported hospitalization (Hospitalization_date) and hospitalization from medical records (Admission_date). The assessment_date is the date of self reported assessment and lag_fu is the date of prior self reported assessment.
Now there are a couple of problems:
1. The admission_date variable is not properly merged to the corresponding period between the lag_fu and assessment_date
2. Each assessment date per person is listed thrice as there can be a max of three hospitalization records in that time period.
3. If there are multiple admission_date(s) at each assessment, all the dates need to be listed as shown in the output data set.
Your help is very much appreciated!
The input and output data set for one individual is:
Input:
Hospitalization_date | Record | Assessment_date | Admission_date | lag_fu |
. | 1 | 04NOV10 | 08DEC10 | 04OCT10 |
. | 2 | 04NOV10 | 20JAN11 | 04OCT10 |
. | 3 | 04NOV10 | 16DEC11 | 04OCT10 |
15DEC10 | 4 | 03JAN11 | 19DEC11 | 04NOV10 |
. | 5 | 03JAN11 | . | 04NOV10 |
. | 6 | 03JAN11 | . | 04NOV10 |
15JAN11 | 7 | 01MAR11 | . | 03JAN11 |
. | 8 | 01MAR11 | . | 03JAN11 |
. | 9 | 01MAR11 | . | 03JAN11 |
. | 10 | 27APR11 | . | 01MAR11 |
. | 11 | 27APR11 | . | 01MAR11 |
. | 12 | 27APR11 | . | 01MAR11 |
. | 13 | 29JUL11 | . | 27APR11 |
. | 14 | 29JUL11 | . | 27APR11 |
. | 15 | 29JUL11 | . | 27APR11 |
. | 16 | 06OCT11 | . | 29JUL11 |
. | 17 | 06OCT11 | . | 29JUL11 |
. | 18 | 06OCT11 | . | 29JUL11 |
. | 19 | 27MAR12 | . | 06OCT11 |
. | 20 | 27MAR12 | . | 06OCT11 |
. | 21 | 27MAR12 | . | 06OCT11 |
. | 22 | 16MAY12 | . | 27MAR12 |
. | 23 | 16MAY12 | . | 27MAR12 |
. | 24 | 16MAY12 | . | 27MAR12 |
Output:
Hospitalization_date | Record | Assessment_date | Admission_date | lag_fu |
. | 1 | 04NOV10 |
| 04OCT10 |
. | 2 | 04NOV10 |
| 04OCT10 |
. | 3 | 04NOV10 |
| 04OCT10 |
08DEC10 | 4 | 03JAN11 | 08DEC10 | 04NOV10 |
. | 5 | 03JAN11 |
| 04NOV10 |
. | 6 | 03JAN11 | . | 04NOV10 |
20JAN11 | 7 | 01MAR11 | 20JAN11. | 03JAN11 |
. | 8 | 01MAR11 | . | 03JAN11 |
. | 9 | 01MAR11 | . | 03JAN11 |
. | 10 | 27APR11 | . | 01MAR11 |
. | 11 | 27APR11 | . | 01MAR11 |
. | 12 | 27APR11 | . | 01MAR11 |
. | 13 | 29JUL11 | . | 27APR11 |
. | 14 | 29JUL11 | . | 27APR11 |
. | 15 | 29JUL11 | . | 27APR11 |
. | 16 | 06OCT11 | . | 29JUL11 |
. | 17 | 06OCT11 | . | 29JUL11 |
. | 18 | 06OCT11 | . | 29JUL11 |
. | 19 | 27MAR12 | 16DEC11 | 06OCT11 |
. | 20 | 27MAR12 | 19DEC11 | 06OCT11 |
. | 21 | 27MAR12 | . | 06OCT11 |
. | 22 | 16MAY12 | . | 27MAR12 |
. | 23 | 16MAY12 | . | 27MAR12 |
. | 24 | 16MAY12 | . | 27MAR12 |
Pronabesh: totally un-cool removing a correct rating because you realized that you had a different problem than the one that you originally posted. In the future, I'd really hope that you simply create a new discussion.
That said, while I may have misinterpreted what you are trying to do, the following appears to provide your posted desired output:
data want (drop=_:);
array _dates(99) _temporary_;
do until (last.patient);
set have;
by patient;
if first.patient then do;
call missing(of _dates(*));
_counter=0;
end;
if not missing(Admission_date) then do;
_counter+1;
_dates(_counter)=Admission_date;
call missing(Admission_date);
end;
end;
do until (last.patient);
set have (drop=Admission_date);
by patient;
if _counter gt 0 then do;
do _i=1 to _counter;
if (Assessment_date le _dates(_i) le lag_fu) or
(lag_fu le _dates(_i) le Assessment_date) then do;
Admission_date=_dates(_i);
call missing (_dates(_i));
_i=_counter;
end;
end;
end;
if not missing(Hospitalization_date) and not missing(Admission_date)
then Hospitalization_date=Admission_date;
else call missing(Hospitalization_date);
output;
call missing(Admission_date);
end;
run;
Hello Arthur,
Thank you for taking the time to help out on this. Sincere apologies for removing the correct rating. I am new to SAS communities and was under the false impression that a correct rating means that the discussion is closed. Will amend my mistake.
Once again your help and timely response is much appreciated!
Not a problem and I was pleased to see that you found my last post so quickly. Yes, a correct classification indicates that a question has been answered, however, if one discovers that the problem is more complex than originally posted, a new question/discussion should be started.
Thank you!
This also answers my question posted in the other thread about merging.
Best,
Pronabesh
Glad to hear that both of your posted question have been answered. I am all too familiar with the anguish involved in learning the expected protocol of a new site and completely understand. Hope you continue to post here.
Thank You Arthur.
This code worked!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.