Solved: Re: Merging Data sets

jacksonan123 · Posted 10-09-2017 02:26 PM

I have a first data set (data1 in the code below):

ID  TIME  PRED    new_study
1   0     0       1
1   0.25  1.1063  1
1   0.5   5.1534  1
2   0     0       1
2   0.25  1.0462  1
2   0.5   4.8104  1
1   0     0       2
1   0.25  1.1063  2
1   0.5   5.1534  2
2   0     0       2
2   0.25  1.0462  2
2   0.5   4.8104  2

and a second data set (data2 in the code below):

id  time  ratio
1   0     0
1   0.25  0.937629938
1   0.5   0.925815966
2   0     0
2   0.25  0.991493022
2   0.5   0.991830201

I would like to merge these into a data set named final (example below), in which the ratio values from the second data set are repeated for each subject in each new_study:

ID  TIME  PRED    new_study  ratio
1   0     0       1          0
1   0.25  1.1063  1          0.937629938
1   0.5   5.1534  1          0.925815966
2   0     0       1          0
2   0.25  1.0462  1          0.991493022
2   0.5   4.8104  1          0.991830201
1   0     0       2          0
1   0.25  1.1063  2          0.937629938
1   0.5   5.1534  2          0.925815966
2   0     0       2          0
2   0.25  1.0462  2          0.991493022
2   0.5   4.8104  2          0.991830201

I have tried some merges, but each time the new_study and ratio values get sorted, which is not what I want. I need the new_study and ratio values to stay in their original order. My simple merge code is below.

proc sort data=data1; by id time; run;
proc sort data=data2; by id time; run;

data final;
  merge data1 data2;
  by id time;
run;

novinosrin · Posted 10-09-2017 02:37 PM

The simplest approach is the hash find method. Are you able to use a hash object?

Astounding · Posted 10-09-2017 02:37 PM

So far, you're doing the right thing (although using PROC SQL might simplify the process). All you are missing is a final PROC SORT to put the data back into the right order:

proc sort data=final;
  by new_study id time;
run;
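
For reference, the PROC SQL route mentioned above might look roughly like the sketch below. It assumes the inputs are named data1 and data2 as in the original post, and that the wanted order can be expressed as new_study, ID, TIME; neither the join type nor the ORDER BY clause comes from the thread.

```sas
proc sql;
  create table final as
  select a.ID, a.TIME, a.PRED, a.new_study, b.ratio
  from data1 as a
       left join data2 as b          /* keep every row of data1 */
         on a.ID = b.id and a.TIME = b.time
  order by a.new_study, a.ID, a.TIME; /* assumed target order */
quit;
```

No PROC SORT is needed beforehand, because the join matches on key values rather than on row position. Rows of data1 with no match in data2 would get a missing ratio.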

novinosrin · Posted 10-09-2017 02:45 PM

/* Main data set, in the order that must be preserved */
data have;
input ID TIME PRED new_study;
datalines;
1 0 0 1
1 0.25 1.1063 1
1 0.5 5.1534 1
2 0 0 1
2 0.25 1.0462 1
2 0.5 4.8104 1
1 0 0 2
1 0.25 1.1063 2
1 0.5 5.1534 2
2 0 0 2
2 0.25 1.0462 2
2 0.5 4.8104 2
;

/* Lookup data set: one ratio per id/time */
data have1;
input id time ratio;
datalines;
1 0 0
1 0.25 0.937629938
1 0.5 0.925815966
2 0 0
2 0.25 0.991493022
2 0.5 0.991830201
;

data final;
if _N_ = 1 then do;
  /* Never executed; only defines the variables at compile time */
  if 0 then do;
    set have;
    set have1;
  end;
  /* Load have1 into a hash table keyed on id and time */
  declare hash myhash(dataset:'have1', multidata:'yes');
  myhash.defineKey('id','time');
  myhash.defineData('ratio');
  myhash.defineDone();
end;
set have;
/* Look up ratio for the current row; use 0 when there is no match */
if myhash.find() ne 0 then ratio=0;
run;
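
The lookup keeps the original row order because SET have reads the rows as stored, and find() only fills in ratio for the current ID/TIME key, so nothing has to be sorted. A shorter variant of the same idea is sketched below. The handle name h, dropping multidata (have1 has one row per key in this example), and leaving unmatched rows missing instead of 0 are all assumptions made for illustration, not part of the posted solution.

```sas
data final;
  /* Compile-time only: adds ratio to the program data vector */
  if 0 then set have1(keep=ratio);

  if _n_ = 1 then do;
    declare hash h(dataset: 'have1');   /* hypothetical handle name */
    h.defineKey('id', 'time');
    h.defineData('ratio');
    h.defineDone();
  end;

  set have;                             /* rows arrive in stored order */

  /* find() returns 0 on a match and copies ratio into the current row */
  if h.find() ne 0 then call missing(ratio);
run;
```

Using call missing rather than ratio=0 makes an unmatched key distinguishable from a genuine ratio of 0, which may or may not be what is wanted here.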

jacksonan123 · Posted 10-09-2017 04:51 PM

It gave me the result that I sought, except that it only did so for new_study 1; new_study 2 was not output.

jacksonan123 · Posted 10-09-2017 05:56 PM

I made an error when coding your response. Once corrected, it worked perfectly and listed all of the studies.

Thanks for the help.

jacksonan123 · Posted 10-09-2017 04:53 PM

It performed the sort as you stated, but it only output new_study 1; there was no new_study 2 output.
