Re: Macro with merge statement

CathyVI · Posted 04-21-2023 10:25 AM

Hello,

I have a dataset (temp: cancer file) and a yearly mbsf_abcd files (2006-2019 demographic file).

I will like to merge temp with yearly files by patient_id in the year of their diagnosis to identify demographic and baseline characteristics of individuals who had evidence of cancer.

The yearly files are large so I only want to select variables of interest and pull in demographic variables as of the year of cancer diagnoses. If data are missing in the year of cancer diagnosis, I will pull data from year prior to cancer diagnosis.

I tried to use macro but am not proficient enough with it and I don't know how to do section in bold.

%macro around(start_year=,end_year=);
    %do year=&start_year %to &end_year;
data want;
keep PATIENT_ID BENE_BIRTH_DT BENE_DEATH_DT BENE_ENROLLMT_REF_YR AGE_AT_END_REF_YR  BENE_RACE_CD COUNTY_CD RTI_RACE_CD SAMPLE_GROUP SEX_IDENT_CD STATE_CODE ZIP_CD dsn VALID_DEATH_DT_SW;
merge temp a.mbsf_abcd_&year.;
by patient_id;
if year_of_diagnosis=. then call missing (year_of_diagnosis);
else year_of_diagnosis=&year.;
    %end;
%mend;
%around(start_year=2006,end_year=2019)

Appreciate your help.

Tom · Posted 04-21-2023 10:29 AM

I suspect you are over complicating things.

data want;
  set a.mbsf_abcd_2006 - a.mbsf_abcd_2019 ;
  where not missing(year_of_diagnosis);
run;

CathyVI · Posted 04-21-2023 11:22 AM

@Tom I seems to agree with you that am overthinking it. But first I want to merge temp with msdf_abcd not a.mbsf_abcd_2006 to a.mbsf_abcd_2019. The year_of_diagnosis is in the temp file not mbsf_abcd files. How will your code works?

ballardw · Posted 04-21-2023 01:19 PM

Something like this perhaps:

data easy;
  set a.mbsf_abcd_2006 - a.mbsf_abcd_2019 ;
  where not missing(year_of_diagnosis);
run;

proc sql;
   create data want as
   select a.*, b.* 
   from temp as a
        left join
        easy as b
        on a.patient_id = b.patient_id
           and a.year_of_diagnosis =b.year_of_diagnosis
   ;
quit;

You will get a note about patient_id and year_of_diagnosis already on the data set. But without knowing the names of the variable in the combined data of all the records that you want I can't provide any of these.

The a.* refers to all the variables in set aliased as A. So if there are none of the other variables from the details data set then there weren't any matches found. This will return multiple records if the patient had multiple records with the same year of diagnosis.

CathyVI · Posted 04-22-2023 06:01 PM

@ballardw Thank you. The temp file is a unique ID file(no dup record) The mbsf_abcd_YYYY files has the demographics for each patient but no year of diagnosis. This is why I want to map each patient demographic with their year of diagnosis.

Your sql code is great but I cannot do this below because temp is the only file that has year of diagnosis.

and a.year_of_diagnosis =b.year_of_diagnosis

Kurt_Bremser · Posted 04-24-2023 02:58 AM

Concatenate all yearly datasets into one (use a data step view, and derive the year with INDSNAME=), merge with your unique dataset, and select the demographics data in a DO UNTIL loop.

For code, please supply usable example data in data steps with DATALINES.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 04-21-2023 09:35 PM

Not sure what you are "merging", but if you just want to take the YEAR from the TEMP then just list it first and remember it into another variable. THen just test if the year matches the suffix on the current dataset's name.

data want;
   set temp(in=in1) x_2006-x_2019 indsname=dsname ;
   by id;
   retain year;
   if first.id then call missing(year);
   if in1 then year=year_of_diagnosis;
   else if year = input(scan(dsname,-1,'_'),32.) ;
run;

Macro with merge statement