Hi Tom:
I answered all of your questions in the comment block in the code.
Please check whether this is clearer now.
Thanks,
Purple
options mprint mlogic;
%macro pd(ord,ds,ovar,fvar,v1,v2,v3,v4);
/*-----Is the list of datasets always A B C D E?
***Answer: There are 30 data sets with different names. This is just an example.
-----Is the list of variable names always some letters followed by DAT_ and then YYYY, MM or DD?
***Answer: Yes, e.g. birthdat_yyyy, birthdat_mm, birthdat_dd.
-----Why are there three variables per date? (Is there any hope of making simplifications before the data gets to SAS?)
***Answer: There are about 30 data sets. The code below is what I use to build the date from yyyy, mm and dd. It is not pretty, though.
---Why are you passing in that -2,-1,1,2,3 parameter when the macro can just COUNT how many variables you passed in?
It seems that 1,2,3 is just a count of how many variables you passed.
***Answer: Right, there is probably a better way.
---What do the negative numbers mean? What is the difference between -1 and 1? Between -2 and -1?
***Answer: I updated the code. ORD is now used to count the number of variables.
---Why no code to handle zero?
***Answer: What do you mean?
---What does that first data step do? The one with so much macro code being used to generate it.
***Answer: There are many variables in each data set. I only need to keep a few of them, and the number of variables differs between data sets.
---How is it helping to get to the answer? Does it just convert the M,D,Y variables into actual dates? Does it transpose the data? Does it find the max? What do the multiple PROC SORT steps do?
***Answer: First I need actual dates. The ultimate goal is to find, across the 30 data sets, the latest date for each subject.
---Why are you creating 4 different datasets in the last block of macro code that is being driven by the ORD variable?
Now the ORD variable seems to have a different meaning than it did in the first data step.
Rather than just meaning how many variables to process, it seems to mean what to name the output dataset.
If the name of the output dataset changes, why not just pass it in as a parameter like the input dataset?
Or derive it based on the value of the ORD parameter? You seem to be mapping -2 -> AF, -1 -> AF, 1 -> BF ...
***Answer: I need to append all 30 data sets (see Step 3). If I use only one PROC APPEND, the log shows WARNING: Variable LAST4 was not found on DATA file.
*/
/* Step 1). IMPUTE PARTIAL DATES FROM MANY DATA SETS WITH DIFFERENT NUMBERS OF DATE VARIABLES */
data &ds (keep=subject &fvar.: ds _lastf);
length ds $10;
ds = "&ds";   /* record which data set the dates came from */
set ae06june.&ds(keep= subject
%if &ord=1 %then %do; &v1.: %end;
%else %if &ord=2 %then %do; &v1.: &v2.: %end;
%else %if &ord=3 %then %do; &v1.: &v2.: &v3.: %end;
%else %if &ord=4 %then %do; &v1.: &v2.: &v3.: &v4.:%end;
);
/* IF I USE MDY, THE LOG SHOWS "NOTE: MISSING VALUES ...". HOW CAN I REMOVE THAT NOTE? */
if &v1._dd ^=. then &ovar.=put(&v1._yyyy, 4.)||'-'||put(&v1._mm, z2.)||'-'||put(&v1._dd, z2.);
else if &v1._mm ^=. then &ovar.=put(&v1._yyyy, 4.)||'-'||put(&v1._mm, z2.)||'-01';
else if &v1._yyyy^=. then &ovar.=put(&v1._yyyy, 4.)||'-01'||'-01';
&fvar.=input(&ovar.,yymmdd10.); format &fvar. yymmdd10.;
_lastf=&fvar.;
%if &ord>=2 %then %do;
if &v2._dd^=. then &ovar.2=put(&v2._yyyy, 4.)||'-'||put(&v2._mm, z2.)||'-'||put(&v2._dd, z2.);
else if &v2._mm^=. then &ovar.2=put(&v2._yyyy, 4.)||'-'||put(&v2._mm, z2.)||'-01';
else if &v2._yyyy^=. then &ovar.2=put(&v2._yyyy, 4.)||'-01'||'-01';
&fvar.2=input(&ovar.2,yymmdd10.); format &fvar.2 yymmdd10.;
/* MAX ignores missing values, so no 01JAN1900 placeholder is needed */
_lastf=max(of &fvar. &fvar.2);
%end;
%if &ord>=3 %then %do;
if &v3._dd^=. then &ovar.3=put(&v3._yyyy, 4.)||'-'||put(&v3._mm, z2.)||'-'||put(&v3._dd, z2.);
else if &v3._mm^=. then &ovar.3=put(&v3._yyyy, 4.)||'-'||put(&v3._mm, z2.)||'-01';
else if &v3._yyyy^=. then &ovar.3=put(&v3._yyyy, 4.)||'-01'||'-01';
&fvar.3=input(&ovar.3,yymmdd10.); format &fvar.3 yymmdd10.;
/* MAX ignores missing values, so no 01JAN1900 placeholder is needed */
_lastf=max(of &fvar. &fvar.2 &fvar.3);
%end;
%if &ord=4 %then %do;
if &v4._dd^=. then &ovar.4=put(&v4._yyyy, 4.)||'-'||put(&v4._mm, z2.)||'-'||put(&v4._dd, z2.);
else if &v4._mm^=. then &ovar.4=put(&v4._yyyy, 4.)||'-'||put(&v4._mm, z2.)||'-01';
else if &v4._yyyy^=. then &ovar.4=put(&v4._yyyy, 4.)||'-01'||'-01';
&fvar.4=input(&ovar.4,yymmdd10.); format &fvar.4 yymmdd10.;
/* MAX ignores missing values, so no 01JAN1900 placeholder is needed */
_lastf=max(of &fvar. &fvar.2 &fvar.3 &fvar.4);
%end;
format _lastf yymmdd10.;
run;
/* Step 2).GET THE LATEST DATE FOR EACH DATA SET;*/
proc sort data=&ds.; by subject descending _lastf; run;
/* NODUPKEY keeps the first (latest) record for each subject */
proc sort data=&ds. out=&ds.f nodupkey; by subject; run;
/* Step 3). APPEND EACH RESULT TO ALL WITH PROC APPEND; KEEP ONLY THE COMMON VARIABLES SO THERE IS NO WARNING */
proc append base=all data=&ds.f(keep=subject ds _lastf);
run;
/* Step 4). FIND THE LATEST DATE FROM ALL DATA SETS */
proc sort data=all; by subject descending _lastf; run;
data all;
set all;
by subject;
if first.subject;   /* sorted descending, so the first record per subject is the latest */
run;
%mend;
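/* %pd(ord, ds, ovar, fvar, v1, v2, v3, v4) */
/* Start from an empty ALL so rows from an earlier run are not appended again */
proc datasets lib=work nolist nowarn; delete all; quit;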
%pd(4,e, lstdt, last, bicdat, cicdat,picdat,ficdat)
%pd(3,d, lstdt,last, hopstdat,hopendat,testdat)
%pd(2,c, lstdt, last, castdat, caendat, , )
%pd(1,b, lstdt, last, bmdat , , , )
%pd(1,a, lstdt, last, cldat , , , )
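One possible answer to the MDY question in the Step 1 comment: the NOTE appears when MDY is handed a missing argument. The following is a minimal sketch, assuming numeric component variables that follow the birthdat_yyyy / birthdat_mm / birthdat_dd pattern described in the answers above (the WORK.DEMO_MDY and WORK.DEMO_DATES names and the data values are made up). It calls MDY only when the year is present and fills a missing month or day with 1, which is the same imputation rule as the string-building code.

```sas
/* Made-up demo rows; variable names follow the birthdat_yyyy/_mm/_dd pattern */
data work.demo_mdy;
  input subject $ birthdat_yyyy birthdat_mm birthdat_dd;
  datalines;
001 2019 5 17
002 2019 5 .
003 2019 . .
004 . . .
;
run;

data work.demo_dates;
  set work.demo_mdy;
  /* Call MDY only when the year is known, so it never receives a missing */
  /* argument and the missing-values NOTE is not written to the log       */
  if not missing(birthdat_yyyy) then do;
    /* Month unknown: impute January 1 (same rule as the -01-01 branch) */
    if missing(birthdat_mm) then birthdat = mdy(1, 1, birthdat_yyyy);
    /* Day unknown: impute the 1st of the month */
    else birthdat = mdy(birthdat_mm, coalesce(birthdat_dd, 1), birthdat_yyyy);
  end;
  format birthdat yymmdd10.;
run;
```

Subject 004 has no year, so BIRTHDAT stays missing instead of triggering the NOTE.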
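On counting the variables instead of passing ORD: a macro can count a space-separated list with COUNTW and loop over it with %DO, which also replaces the four nearly identical blocks. The sketch below is hypothetical. The macro name %LASTDATE, its DS=, PREFIXES= and LIB= parameters, and the intermediate WORK._ONE data set are assumptions and are not part of the original code. It assumes every prefix has numeric _yyyy, _mm and _dd variables and uses MDY for the imputation. It keeps only SUBJECT, DS and _LASTF before PROC APPEND, so every appended piece has the same variables and the "not found on DATA file" warning should not come up.

```sas
/* Hypothetical macro (not from the original post).                  */
/* DS=       input data set name inside library LIB=                 */
/* PREFIXES= space-separated date prefixes, e.g. bicdat cicdat       */
/* Assumes each prefix has numeric _yyyy, _mm and _dd variables.     */
%macro lastdate(ds=, prefixes=, lib=ae06june);
  %local n i p;
  /* Count the prefixes instead of passing ORD by hand */
  %let n = %sysfunc(countw(&prefixes, %str( )));

  data work._one (keep=subject ds _lastf);
    length ds $32;
    /* Keep SUBJECT plus every prefix_ variable */
    set &lib..&ds (keep=subject
      %do i = 1 %to &n; %scan(&prefixes, &i, %str( ))_: %end;
    );
    ds = "&ds";
    array _d{&n} _temporary_;
    %do i = 1 %to &n;
      %let p = %scan(&prefixes, &i, %str( ));
      /* Temporary arrays are retained, so reset each element first */
      _d{&i} = .;
      /* Call MDY only when the year is known */
      if not missing(&p._yyyy) then do;
        if missing(&p._mm) then _d{&i} = mdy(1, 1, &p._yyyy);
        else _d{&i} = mdy(&p._mm, coalesce(&p._dd, 1), &p._yyyy);
      end;
    %end;
    /* MAX ignores missing values; all missing gives a missing date */
    _lastf = max(of _d{*});
    format _lastf yymmdd10.;
  run;

  /* Every piece has the same variables, so PROC APPEND does not warn */
  proc append base=work.all data=work._one;
  run;
%mend lastdate;

/* Made-up demo input so the sketch runs without the AE06JUNE library */
data work.demo_ae;
  input subject $ bicdat_yyyy bicdat_mm bicdat_dd
                  cicdat_yyyy cicdat_mm cicdat_dd;
  datalines;
001 2019 5 17 2020 . .
002 2018 . . . . .
003 . . . . . .
;
run;

/* Start from an empty ALL, then process one data set */
proc datasets lib=work nolist nowarn; delete all; quit;
%lastdate(ds=demo_ae, prefixes=bicdat cicdat, lib=work)
```

For the real data, the calls would look like %lastdate(ds=e, prefixes=bicdat cicdat picdat ficdat) with the default LIB=ae06june, one call per data set.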
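For Step 4, PROC SQL can also take the maximum per subject directly, without the sort and the FIRST./LAST. logic. This is a minimal sketch on made-up rows that stand in for ALL (the WORK.ALL_DEMO and WORK.LATEST names and the values are assumptions).

```sas
/* Made-up rows standing in for ALL (SUBJECT, DS, _LASTF) */
data work.all_demo;
  length subject $8 ds $10;
  input subject $ ds $ _lastf :yymmdd10.;
  format _lastf yymmdd10.;
  datalines;
001 e 2019-05-17
001 c 2020-01-01
002 b 2018-01-01
002 a 2017-03-15
;
run;

/* Latest date per subject across all appended data sets */
proc sql;
  create table work.latest as
    select subject, max(_lastf) as lastdate format=yymmdd10.
    from work.all_demo
    group by subject;
quit;
```

This returns only SUBJECT and the date. If DS (which data set supplied the latest date) is also needed, the sort plus FIRST.SUBJECT approach keeps the whole record.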