Re: Transpose then merge the two data sets

elias_2020 · Posted 12-11-2017 06:34 PM

data offenders;
   input  id   offence_DATE mmddyy10. sex$;
   format offence_DATE  mmddyy10.;
   datalines;
1 05/23/05  M 
1 08/01/09  M
2 04/18/01  F
2 06/10/12  F
3 01/01/11  M
;
run;


proc transpose data=offenders out=offenders1 (drop=_name_ _label_) prefix=offence_DATE;
 by id;
 var offence_DATE;
run; 
 


data HOSPITAL;
   input  id   ADMIT_date mmddyy10. ;
   format ADMIT_date mmddyy10.;
   datalines;
1 10/21/03      
1 06/29/05      
1 02/03/07 
1 09/21/08
1 08/17/09         
3 12/31/10      
3 01/02/13      
;
run;

proc transpose data=HOSPITAL out=HOSPITAL1 (drop=_name_ _label_) prefix=ADMIT_date;
 by id;
 var ADMIT_date;
run;

I tried the following code, but it doesn't work because I have a multiple admit_date

/*

data want;

merge offenders1 HOSPITAL1;

where i=1 to end(admit_date);

by id;

if offence_DATE1 < ADMIT_date_i < offence_DATE2 then flag_admit=1;

else flag_admit=0;

run;

*/

The result will be one row for each id such as

id offence_date1 offence_date2 admit_date1 admit_date2 admit_date3...

1

2

3

Could you please suggest for me the best method for doing it.

novinosrin · Posted 12-11-2017 06:57 PM

May i ask what does this where i=1 to end(admit_date); statement accomplish?

/*

data want;

merge offenders1 HOSPITAL1;

where i=1 to end(admit_date);

by id;

if offence_DATE1 < ADMIT_date_i < offence_DATE2 then flag_admit=1;

else flag_admit=0;

run;

*/

The result will be one row for each id such as

id offence_date1 offence_date2 admit_date1 admit_date2 admit_date3...

1

2

3

DaveBirch · Posted 12-11-2017 07:48 PM

I am also quite mystified as to what you're aiming to achieve with the statement:

where i=1 to end(admit_date);

The variable i doesn't appear in either of your input datasets. My guess is you're trying to analyse the correspondence between offence_date and offence_date using array processing, but don't have an array statement. In any case, array are not good at handling highly variable data (you need to define a fixed number of array elements in the datastep).

It is a good idea when asking for technical assistance in a forum like this, to step back and describe your problem in business terms rather than in technical details of the approach you're first attempting. This allows others to offer solutions that may be better able to solve your underlying problem.

If you are trying to analyse the correspondence between offence_date and offence_date, then I'd suggest renaming, or duplicating the date variables (to event_date ?), merging by id event_date, and maybe using lag functions to assist in identifying the correspondence between offence and hospital admission.

Hope this helps.

elias_2020 · Posted 12-11-2017 07:52 PM

Could you please show me how I can do it.

DaveBirch · Posted 12-11-2017 10:28 PM

You could try something like this:

proc sort data=offenders;
by id offence_date;
run;
data offenders1;
set offenders;
by id offence_date;
prev_offence_date = lag(offence_date);
if first.id then delete;
/* assuming only interested in offenders with multiple offences */
/* otherwise we may need the following line and others */
/*if first.id then prev_offence_date = .;*/
run;

proc sort data=hospital;
by id admit_date;
run;

/* SQL handles M:N instances better than data step MERGE */
proc sql noprint;
create table want1 as
select id, prev_offence_date, offence_date,
admit_date, sex
from offenders1 t1
left join hospital t2
on (t1.id eq t2.id)
where (t1.prev_offence_date le t2.admit_date lt t1.offence_date)
order by t1.id, t1.offence_date, t2.admit_date;
quit;

You now have a dataset with a row for every hospital admission that occurs "between" offences for offenders. This is in many ways easier to analyse (and more flexible) than a more "flattened" structure. If you do desire to flatten it, there are many business issues still to be resolved to determine the best approach.

Cheers

ballardw · Posted 12-11-2017 06:58 PM

You might describe what you are attempting to accomplish.

Combining two data sets has many approaches depending on what the desired result may be.

Hopefully your result is small enough that you could post a third data set showing what the result should look like and describe the purpose.

Such as should anything special be done with records with a hospital date but not and offense date?

Offense date without hospital date? What about a hospital date between multiple offenses?

Since it appears that you may be wanting to compare multiple values of offense and admin dates then you likely want one array to hold each of those. And since it would not make sense to have a single flag to hold all of the comparisons you likely need another array to hold the flags. But to make a more concrete example we really should see what the full set of comparison results would look like.

elias_2020 · Posted 12-11-2017 07:49 PM

sorry, but I'm trying to make as loop

after transpose the data from multiple row for each id to multiple column (one row) for each id.

I have such as ADMIT_date1 ADMIT_date2 ADMIT_date3 ....

I would like to know if the offender has any admission to the hospital between the first offence and the second offence.

if YES, when ? how many times ?

Thanks

novinosrin · Posted 12-11-2017 08:29 PM

Please give us(the community-you included) a sample data set of your input and a sample data set of your wanted output. We can help you with the code you may require

PS give a brief description of the logic for us to implement

elias_2020 · Posted 12-11-2017 09:34 PM

data offenders;
   input  id   offence_DATE mmddyy10. sex$;
   format offence_DATE  mmddyy10.;
   datalines;
1 05/23/05  M 
1 08/01/09  M
2 04/18/01  F
2 06/10/12  F
3 01/01/11  M
;
run;


proc transpose data=offenders out=offenders1 (drop=_name_ _label_) prefix=offence_DATE;
 by id;
 var offence_DATE;
run; 
 


data HOSPITAL;
   input  id   ADMIT_date mmddyy10. ;
   format ADMIT_date mmddyy10.;
   datalines;
1 10/21/03      
1 06/29/05      
1 02/03/07 
1 12/28/09         
3 12/31/10      
3 01/02/13      
;
run;

proc transpose data=HOSPITAL out=HOSPITAL1 (drop=_name_ _label_) prefix=ADMIT_date;
 by id;
 var ADMIT_date;
run;

the result shop be look like as below:

id offence_DATE1 offence_DATE2 ADMIT_date1 ADMIT_date2 ADMIT_date3 contact between offences #

1 05/23/05 08/01/09 10/21/03 06/29/05 12/28/09 yes 1

2 04/18/01 06/10/11 . . . no 0

3 01/01/11 . 12/31/10 01/02/13 . yes 1

/*** # means the count number of admission to the hospital between the two offences if the offender has two or after the first offence when the offender didn't have the second offence*/

Thank you

Tom · Posted 12-12-2017 02:17 AM

It is probably going to be easier to combine the files in their original format instead of the transposed format. Perhaps something like this?

data want ;
  row+1;
  length id date_no 8;
  set offenders(in=in1 rename=(offence_date=DATE))
      hospital (in=in2 rename=(admit_date=DATE))
  ;
  by id date ;
  if in1 then offence_date=date ;
  if in2 then admit_date=date ;
  date_no + first.date;
  if first.id then date_no=1 ;
  format date admit_date offence_date yymmdd10. ;
run;

So what does your new FLAG variable represent? Which rows do you want to FLAG for this sample data?

Registration is open