Solved: Reorganizing data within subjects in a datafile

pronabesh · Posted 11-25-2013 04:37 PM

Hello All,

I need to reorganize a data set such that observations within subjects can be reordered by date.

The data set looks like this:

Subject_ID Follow_Up_Time Interview_Date Previous_Interview_Date Adverse_Event_Report_Date

1 1 Feb10 Jan10 Nov10

1 2 Mar10 Feb10 .

1 3 Dec10 Nov10 .

2 1 Jun10 May10 May10

2 2 Jul10 Jun10 Jan11

2 3 Feb11 Jan11 .

The last variable in the data set is not correctly assigned to its corresponding time period (i.e, Adverse_Event_Report_Date should be within the period between Interview_Date and Previous_Interview_Date). For example, in the first case line, the Adverse_Event_Report_Date variable should be missing and should appear in the third row for Subject_ID = 1.

Please advice on the data steps required to restructure this data file.

Thank you in advance.

Sincerely,

Pronabesh

art297 · Posted 11-25-2013 06:12 PM

I, too, don't know if I correctly understand what you want, but the following is a data step method for doing something close to what Haikuo suggested. It differs, slightly, as it can assign multiple dates for a subject:

data have;

informat Interview_Date Previous_Interview_Date

Adverse_Event_Report_Date date9.;

format Interview_Date Previous_Interview_Date

Adverse_Event_Report_Date date9.;

input Subject_ID Follow_Up_Time Interview_Date

Previous_Interview_Date Adverse_Event_Report_Date;

cards;

1 1 10Feb2013 10Jan2013 10Nov2013

1 2 10Mar2013 10Feb2013 .

1 3 10Dec2013 10Nov2013 .

2 1 10Jun2013 10May2013 10May2013

2 2 10Jul2013 10Jun2013 11Jan2013

2 3 11Feb2013 11Jan2013 .

;

data want (drop=_:);

set have (rename=(Adverse_Event_Report_Date=_date));

retain _holddate;

format Adverse_Event_Report_Date date9.;

by Subject_ID;

if first.Subject_ID then call missing(_holddate);

if not missing(_date) then _holddate=_date;

if not missing(_holddate) and

(Previous_Interview_Date le _holddate le Interview_Date)

then Adverse_Event_Report_Date=_holddate;

run;

View solution in original post

Haikuo · Posted 11-25-2013 05:28 PM

It would be ideal that you have an sample output for your sample input data, so here is only my guess, if it is not meeting your requirement, then please provide your sample output:

data have;

input Subject_ID Follow_Up_Time (Interview_Date Previous_Interview_Date Adverse_Event_Report_Date) (:monyy5.);

format Interview_Date Previous_Interview_Date Adverse_Event_Report_Date monyy5.;

cards;

1 1 Feb10 Jan10 Nov10

1 2 Mar10 Feb10 .

1 3 Dec10 Nov10 .

2 1 Jun10 May10 May10

2 2 Jul10 Jun10 Jan11

2 3 Feb11 Jan11 .

;

proc sql;

create table want as

select Subject_ID, Follow_Up_Time, Interview_Date, Previous_Interview_Date,

case when Previous_Interview_Date <= Adverse_Event_Report_Date <= Interview_Date

then Adverse_Event_Report_Date

when not (Previous_Interview_Date <= Adverse_Event_Report_Date <= Interview_Date )

and Follow_Up_Time = max(Follow_Up_Time) then max(Adverse_Event_Report_Date)

else . end as Adverse_Event_Report_Date format=monyy5.

from have

group by Subject_ID

order by Subject_ID,Follow_Up_Time

;

quit;

Haikuo

pronabesh · Posted 11-26-2013 09:34 AM

Thank You Haikuo. This is very helpful.

art297 · Posted 11-25-2013 06:12 PM

I, too, don't know if I correctly understand what you want, but the following is a data step method for doing something close to what Haikuo suggested. It differs, slightly, as it can assign multiple dates for a subject:

data have;

informat Interview_Date Previous_Interview_Date

Adverse_Event_Report_Date date9.;

format Interview_Date Previous_Interview_Date

Adverse_Event_Report_Date date9.;

input Subject_ID Follow_Up_Time Interview_Date

Previous_Interview_Date Adverse_Event_Report_Date;

cards;

1 1 10Feb2013 10Jan2013 10Nov2013

1 2 10Mar2013 10Feb2013 .

1 3 10Dec2013 10Nov2013 .

2 1 10Jun2013 10May2013 10May2013

2 2 10Jul2013 10Jun2013 11Jan2013

2 3 11Feb2013 11Jan2013 .

;

data want (drop=_:);

set have (rename=(Adverse_Event_Report_Date=_date));

retain _holddate;

format Adverse_Event_Report_Date date9.;

by Subject_ID;

if first.Subject_ID then call missing(_holddate);

if not missing(_date) then _holddate=_date;

if not missing(_holddate) and

(Previous_Interview_Date le _holddate le Interview_Date)

then Adverse_Event_Report_Date=_holddate;

run;

pronabesh · Posted 11-26-2013 09:34 AM

Thank You Arthur. This is exactly what I was looking for.

pronabesh · Posted 12-03-2013 04:32 PM

I thought I had the solution to this, but my problem is probably a little more complex. To demonstrate, I have listed one example from a single individual with multiple assessment points and described the variables below.

The purpose of this is to test the agreement between self_reported hospitalization (Hospitalization_date) and hospitalization from medical records (Admission_date). The assessment_date is the date of self reported assessment and lag_fu is the date of prior self reported assessment.

Now there are a couple of problems:

1. The admission_date variable is not properly merged to the corresponding period between the lag_fu and assessment_date

2. Each assessment date per person is listed thrice as there can be a max of three hospitalization records in that time period.

3. If there are multiple admission_date(s) at each assessment, all the dates need to be listed as shown in the output data set.

Your help is very much appreciated!

The input and output data set for one individual is:

Input:

Hospitalization_date	Record	Assessment_date	Admission_date	lag_fu
.	1	04NOV10	08DEC10	04OCT10
.	2	04NOV10	20JAN11	04OCT10
.	3	04NOV10	16DEC11	04OCT10
15DEC10	4	03JAN11	19DEC11	04NOV10
.	5	03JAN11	.	04NOV10
.	6	03JAN11	.	04NOV10
15JAN11	7	01MAR11	.	03JAN11
.	8	01MAR11	.	03JAN11
.	9	01MAR11	.	03JAN11
.	10	27APR11	.	01MAR11
.	11	27APR11	.	01MAR11
.	12	27APR11	.	01MAR11
.	13	29JUL11	.	27APR11
.	14	29JUL11	.	27APR11
.	15	29JUL11	.	27APR11
.	16	06OCT11	.	29JUL11
.	17	06OCT11	.	29JUL11
.	18	06OCT11	.	29JUL11
.	19	27MAR12	.	06OCT11
.	20	27MAR12	.	06OCT11
.	21	27MAR12	.	06OCT11
.	22	16MAY12	.	27MAR12
.	23	16MAY12	.	27MAR12
.	24	16MAY12	.	27MAR12

Output:

Hospitalization_date	Record	Assessment_date	Admission_date	lag_fu
.	1	04NOV10		04OCT10
.	2	04NOV10		04OCT10
.	3	04NOV10		04OCT10
08DEC10	4	03JAN11	08DEC10	04NOV10
.	5	03JAN11		04NOV10
.	6	03JAN11	.	04NOV10
20JAN11	7	01MAR11	20JAN11.	03JAN11
.	8	01MAR11	.	03JAN11
.	9	01MAR11	.	03JAN11
.	10	27APR11	.	01MAR11
.	11	27APR11	.	01MAR11
.	12	27APR11	.	01MAR11
.	13	29JUL11	.	27APR11
.	14	29JUL11	.	27APR11
.	15	29JUL11	.	27APR11
.	16	06OCT11	.	29JUL11
.	17	06OCT11	.	29JUL11
.	18	06OCT11	.	29JUL11
.	19	27MAR12	16DEC11	06OCT11
.	20	27MAR12	19DEC11	06OCT11
.	21	27MAR12	.	06OCT11
.	22	16MAY12	.	27MAR12
.	23	16MAY12	.	27MAR12
.	24	16MAY12	.	27MAR12

art297 · Posted 12-03-2013 08:23 PM

Pronabesh: totally un-cool removing a correct rating because you realized that you had a different problem than the one that you originally posted. In the future, I'd really hope that you simply create a new discussion.

That said, while I may have misinterpreted what you are trying to do, the following appears to provide your posted desired output:

data want (drop=_:);

array _dates(99) _temporary_;

do until (last.patient);

set have;

by patient;

if first.patient then do;

call missing(of _dates(*));

_counter=0;

end;

if not missing(Admission_date) then do;

_counter+1;

_dates(_counter)=Admission_date;

call missing(Admission_date);

end;

do until (last.patient);

set have (drop=Admission_date);

by patient;

if _counter gt 0 then do;

do _i=1 to _counter;

if (Assessment_date le _dates(_i) le lag_fu) or

(lag_fu le _dates(_i) le Assessment_date) then do;

Admission_date=_dates(_i);

call missing (_dates(_i));

_i=_counter;

end;

if not missing(Hospitalization_date) and not missing(Admission_date)

then Hospitalization_date=Admission_date;

else call missing(Hospitalization_date);

output;

call missing(Admission_date);

end;

run;

pronabesh · Posted 12-03-2013 08:32 PM

Hello Arthur,

Thank you for taking the time to help out on this. Sincere apologies for removing the correct rating. I am new to SAS communities and was under the false impression that a correct rating means that the discussion is closed. Will amend my mistake.

Once again your help and timely response is much appreciated!

art297 · Posted 12-03-2013 08:41 PM

Not a problem and I was pleased to see that you found my last post so quickly. Yes, a correct classification indicates that a question has been answered, however, if one discovers that the problem is more complex than originally posted, a new question/discussion should be started.

pronabesh · Posted 12-03-2013 08:46 PM

Thank you!

This also answers my question posted in the other thread about merging.

Best,

Pronabesh

art297 · Posted 12-03-2013 09:04 PM

Glad to hear that both of your posted question have been answered. I am all too familiar with the anguish involved in learning the expected protocol of a new site and completely understand. Hope you continue to post here.

pronabesh · Posted 12-04-2013 11:13 AM

Thank You Arthur.

This code worked!

Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Re: Reorganizing data within subjects in a datafile

Registration is open