09-16-2016 06:10 PM
I am trying to do a many to many join on the datasets AE and patdata. I am trying to figure out if there is a way to reduce the number of iterations of inner loop. The key on which join is performed is subject.
I am planning to split the patdata into few subset based on the value of subject using the code i got from the internet. i tried creating the subsets with the name patdata_001 and patdata_002.
During the iteration i am trying to extract the value of subject from outer record (record from AE) into a macro variable value.
In the same step I am trying to read the corresponding data
but here the dataset name is not resolving .
but when i am trying to substitute the value directly eg : patdata_001 it does work.
Is it because macro code gets compiled first? is there any alternate way for this.
Thanks in advance,
proc sql noprint;
select distinct SUBJECT into :subject_code
separated by '|'
options mprint mlogic symbolgen;
%macro splitchar(string=,wordpfx=WORD,dlm=%str( ));
%do cnt=1 %to %sysfunc(countw(&string,&dlm));
%let &wordpfx&cnt = %scan(&string,&cnt,%str(&dlm));
%* echo macro var result to log window;
%do i = 1 %to %sysfunc(countw(&string,&dlm));
create table patdata_&&&wordpfx&i
as select * from
where subject = %unquote(%str(%')&&&wordpfx&i%str(%'));;
call symput ('value', subject);
DROP _: match; * Drop temporary variables;
/*Below I am looping the second dataset completely. is there any way to subset it?*/
DO i=1 TO xnobs;
SET patdata_&value(rename=(subject=_subject)) NOBS=xnobs POINT=i;
if subject=_subject THEN
IF match=0 THEN DO; * Output AE record if no match in CM;
09-16-2016 07:13 PM - edited 09-16-2016 07:14 PM
What are you actually trying to create? Why do you think that you need to create multiple datasets?
You can easily do a many-to-many join in PROC SQL.
proc sql; create table alldata as select * from patdata a full join ae b on a.subject = b.subject ; quit;
09-17-2016 09:02 PM
09-18-2016 10:15 AM
Yes. You cannot change the name of the dataset used in a SET statement in the middle of a data step. That is set when the step begins.
But I still do not understand what you are trying to do with that complicated piece of code. Or why it is necessary to make many seperate datasets. Does what you do to the data depend on the id value of the subject? If not the just use one data step and apply the same rules to every body. Use by group processing is much easier than having to maintain hundreds or thousands of individual data sets.
First if PATDATA has just one record per subject then just merge to the two datasets. You can decide whether to keep everyone or only those that appear in both datasets by using the IN= data set option to create flag variables.
data want ; merge patdata(in=in1) ae(in=in2) ; by subject ; if in1 and in2; .... run;
Second if PATDATA has more than one record per subject then use proc sql to join them. You can do more later in another step if you need to. You can even make the join as a VIEW so that you only process the data in the DATA step.
proc sql ; create view patdata_ae as select * from patdata a inner join ae b on a.subject = b.subject order by a.subject ; quit; data want ; set patdata_ae ; by subject ; .... run;
09-19-2016 04:39 AM
Here is a way to implement a many to many merge with just data steps.
First calculate the first and last observataion number for each subject for one of that datasets. Then you can do a one to many merge with that new table and use the POINT= option on a SET statement inside of a DO loop to pull in the mulitple observations.
data pointers ; set patdata ; by subject ; if first.subject then first=_n_; retain first; if last.subject then last=_n_; if last.subject then output; keep subject first last ; run; data alldata0; merge ae pointers (in=in2) ; by subject ; if not in2 then do; match=0; output; end; else do p=first to last; match=1; set patdata point=p ; output; end; run;