I was trying to create some macro to select the first non missing value for every variable of interest per subject.
After sorting the data...
%fstnmissvalue(dsn,var);
data temp..&var ;
set &dsn (keep=bene_id A0310F_ENTRY_DSCHRG_CD TRGT_DT rename=(TRGT_DT=&VAR._DT A0310F_ENTRY_DSCHRG_CD=&VAR._ENTRY_DSCHRG_CD)) ;
where &var ne . and &var ne "";
by bene_id;
if first.bene_id;
run;
%mend;
%macro fstnmissvalue_loop(varlist);
%let n1=%sysfunc(countw(&varlist));
%let dsn1=temp.mds_asmt11_sum;
%do i=1 %to &n1;
%let var1=%scan(&varlist, &i);
%fstnmissvalue(&dsn1,&var1);
%end;
%mend;
%fstnmissvalue_loop(G0110H1_EATG_SELF_CD G0110B1_TRNSFR_SELF_CD G0110I1_TOILTG_SELF_CD G0120A_BATHG_SELF_CD );
Notes from the log:
785 %fstnmissvalue(dsn,var); - 180 WARNING: Apparent invocation of macro FSTNMISSVALUE not resolved. ERROR 180-322: Statement is not valid or it is used out of proper order. 786 data temp..&var ; ----- 22 201 WARNING: Apparent symbolic reference VAR not resolved. 787 set &dsn (keep=bene_id A0310F_ENTRY_DSCHRG_CD TRGT_DT rename=(TRGT_DT=&VAR._DT - ---------------------- 214 79 23 ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;, _DATA_, _LAST_, _NULL_. ERROR 201-322: The option is not recognized and will be ignored. ERROR 214-322: Variable name & is not valid. ERROR 79-322: Expecting a =. ERROR 23-7: Invalid value for the RENAME option. ------- 23 787! set &dsn (keep=bene_id A0310F_ENTRY_DSCHRG_CD TRGT_DT rename=(TRGT_DT=&VAR._DT ------- 214 787! A0310F_ENTRY_DSCHRG_CD=&VAR._ENTRY_DSCHRG_CD)) ; WARNING: Apparent symbolic reference VAR not resolved. WARNING: Apparent symbolic reference VAR not resolved. ERROR 214-322: Variable name VAR._DT is not valid. 788 where &var ne . and &var ne ""; WARNING: Apparent symbolic reference VAR not resolved. 788 where &var ne . and &var ne ""; - 180 ERROR 180-322: Statement is not valid or it is used out of proper order. WARNING: Apparent symbolic reference VAR not resolved. WARNING: Apparent symbolic reference VAR not resolved. ERROR: Syntax error while parsing WHERE clause. 789 by bene_id; 790 if first.bene_id; 791 run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.VAR may be incomplete. When this step was stopped there were 0 observations and 0 variables. WARNING: Data set WORK.VAR was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.09 seconds cpu time 0.03 seconds ERROR: No matching %MACRO statement for this %MEND statement. 792 %mend; 793 794 795 %macro fstnmissvalue_loop(varlist); 796 %let n1=%sysfunc(countw(&varlist)); 797 %let dsn1=temp.mds_asmt11_sum; 798 %do i=1 %to &n1; 799 %let var1=%scan(&varlist, &i); 800 %fstnmissvalue(&dsn1,&var1); 801 %end; 802 %mend; 803 804 /*%let dsn=temp.mds_asmt11_sum;*/ 805 %fstnmissvalue_loop(G0110H1_EATG_SELF_CD G0110B1_TRNSFR_SELF_CD G0110I1_TOILTG_SELF_CD G0120A_BATHG_SELF_CD ); NOTE: Line generated by the invoked macro "FSTNMISSVALUE_LOOP". 1 %fstnmissvalue(&dsn1,&var1); - 180 WARNING: Apparent invocation of macro FSTNMISSVALUE not resolved. ERROR 180-322: Statement is not valid or it is used out of proper order. WARNING: Apparent invocation of macro FSTNMISSVALUE not resolved. NOTE: Line generated by the invoked macro "FSTNMISSVALUE_LOOP". 3 %fstnmissvalue(&dsn1,&var1); - 180 ERROR 180-322: Statement is not valid or it is used out of proper order. WARNING: Apparent invocation of macro FSTNMISSVALUE not resolved. NOTE: Line generated by the invoked macro "FSTNMISSVALUE_LOOP". 5 %fstnmissvalue(&dsn1,&var1); - 180 ERROR 180-322: Statement is not valid or it is used out of proper order. WARNING: Apparent invocation of macro FSTNMISSVALUE not resolved. NOTE: Line generated by the invoked macro "FSTNMISSVALUE_LOOP". 7 %fstnmissvalue(&dsn1,&var1); - 180 ERROR 180-322: Statement is not valid or it is used out of proper order.
I have no idea where went wrong with my code. Is there any one that could help with my problem? I'll be very appreicated about your help. Thanks!!!
Now it works with the modified code as below
%macro fstnmissvalue(dsn,var);
data temp.&var ;
set &dsn (keep=bene_id A0310F_ENTRY_DSCHRG_CD TRGT_DT &var
rename=(TRGT_DT=&VAR._DT A0310F_ENTRY_DSCHRG_CD=&VAR._ENDS_CD)) ;
where &var ne "-" and &var ne "";
by bene_id;
if first.bene_id;
run;
%mend;
%macro fstnmissvalue_loop(varlist);
%let n1=%sysfunc(countw(&varlist));
%let dsn1=TEMP.MDS_ASMT11_SUM;
%do i=1 %to &n1;
%let var1=%scan(&varlist, &i);
%fstnmissvalue(&dsn1,&var1);
%end;
%mend;
/*%let dsn=temp.mds_asmt11_sum;*/
%fstnmissvalue_loop(G0110H1_EATG_SELF_CD G0110B1_TRNSFR_SELF_CD G0110I1_TOILTG_SELF_CD G0120A_BATHG_SELF_CD );
Macro definition starts with %macro . The first time you used %fstnmissvalue(dsn,var); was apparently in the definition and should be
%macro fstnmissvalue(dsn,var);
No comments about the rest of the code as I stopped with the first error. fixing that will likely clear up many of the error messages.
Oooooops, I did that again! Thank you so much for pointing it out! I fixed some other issue in my code but still have error.
%macro fstnmissvalue(dsn,var);
data temp.&var ;
set &dsn (keep=bene_id A0310F_ENTRY_DSCHRG_CD TRGT_DT rename=(TRGT_DT=&VAR._DT A0310F_ENTRY_DSCHRG_CD=&VAR._ENDS_CD)) ;
where &var ne . and &var ne "";
by bene_id;
if first.bene_id;
run;
%mend;
%macro fstnmissvalue_loop(varlist);
%let n1=%sysfunc(countw(&varlist));
%let dsn1=TEMP.MDS_ASMT11_SUM;
%do i=1 %to &n1;
%let var1=%scan(&varlist, &i);
%fstnmissvalue(&dsn1,&var1);
%end;
%mend;
%fstnmissvalue_loop(G0110H1_EATG_SELF_CD G0110B1_TRNSFR_SELF_CD G0110I1_TOILTG_SELF_CD G0120A_BATHG_SELF_CD );
ERROR: Variable G0110H1_EATG_SELF_CD is not on file TEMP.MDS_ASMT11_SUM.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set TEMP.G0110H1_EATG_SELF_CD may be incomplete. When this step was stopped there were 0 observations and 3
variables.
WARNING: Data set TEMP.G0110H1_EATG_SELF_CD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
cpu time 0.03 seconds
ERROR: Variable G0110B1_TRNSFR_SELF_CD is not on file TEMP.MDS_ASMT11_SUM.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set TEMP.G0110B1_TRNSFR_SELF_CD may be incomplete. When this step was stopped there were 0 observations and
3 variables.
WARNING: Data set TEMP.G0110B1_TRNSFR_SELF_CD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds
ERROR: Variable G0110I1_TOILTG_SELF_CD is not on file TEMP.MDS_ASMT11_SUM.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set TEMP.G0110I1_TOILTG_SELF_CD may be incomplete. When this step was stopped there were 0 observations and
3 variables.
WARNING: Data set TEMP.G0110I1_TOILTG_SELF_CD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds
ERROR: Variable G0120A_BATHG_SELF_CD is not on file TEMP.MDS_ASMT11_SUM.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set TEMP.G0120A_BATHG_SELF_CD may be incomplete. When this step was stopped there were 0 observations and 3
variables.
WARNING: Data set TEMP.G0120A_BATHG_SELF_CD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.00 seconds
I double checked with those variables in my dataset, they do exist with such variable names.
878 proc print data= TEMP.MDS_ASMT11_SUM (obs=1);
879 var bene_id G0110H1_EATG_SELF_CD;
880 run;
NOTE: There were 1 observations read from the data set TEMP.MDS_ASMT11_SUM.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.00 seconds
Now it works with the modified code as below
%macro fstnmissvalue(dsn,var);
data temp.&var ;
set &dsn (keep=bene_id A0310F_ENTRY_DSCHRG_CD TRGT_DT &var
rename=(TRGT_DT=&VAR._DT A0310F_ENTRY_DSCHRG_CD=&VAR._ENDS_CD)) ;
where &var ne "-" and &var ne "";
by bene_id;
if first.bene_id;
run;
%mend;
%macro fstnmissvalue_loop(varlist);
%let n1=%sysfunc(countw(&varlist));
%let dsn1=TEMP.MDS_ASMT11_SUM;
%do i=1 %to &n1;
%let var1=%scan(&varlist, &i);
%fstnmissvalue(&dsn1,&var1);
%end;
%mend;
/*%let dsn=temp.mds_asmt11_sum;*/
%fstnmissvalue_loop(G0110H1_EATG_SELF_CD G0110B1_TRNSFR_SELF_CD G0110I1_TOILTG_SELF_CD G0120A_BATHG_SELF_CD );
Now my problem is sovled. Thanks again!
Alternatively, you could scrap all the macro language. Here's a program to get the first nonmissing value of every variable by BENE_ID:
data temp;
set have;
obsno=_n_;
run;
proc sort data=temp;
by bene_id descending obsno;
run;
data want;
update temp (obs=0 drop=obsno) temp (drop=obsno);
by bene_id;
run;
You don't need to know the variable names, which ones are character and which are numeric ... just let UPDATE handle it.
That looks like a briliant idea to me. Is there any way to also catch the date when the value of each target variable became non missing with your code? My original thinking was to rename the datetime variable and merge them together to achieve one record for each id. I'd like to know about your way.
You would need to jump through quite a few hoops to make this happen. But it could be done since a DATA step (including one that uses UPDATE) permits data manipulation statements. Here's the idea of what you would need to do.
For each target variable ...
On each observation:
Other hoops:
Drop the target variable(s). Rename the "copy of the target variable" to be the "target variable".
So it can be done within the context of a single DATA step. But it might be easier to create the DATE variables in a separate DATA step that processes the observations in the original order, then merge with the results from this DATA step.
I think you can take advantage of a preceding SQL, and a two-line modification to your "data temp" step. Everything else in your program could be left unchanged:
Assuming you have a variable named daterec in each record, then this should work:
proc sql noprint;
select distinct cats('if not(missing(',name,')) then firstdate_',name,'=daterec;')
into :myrecode separated by ' '
from dictionary.columns where libname='WORK' and memname='HAVE';
quit;
data temp;
set have;
obsno=_n_;
&myrecode ;
format firstdate_: yymmddn8. ;
run;
** rest of @astounding's code here **;
So for variables X Y Z in dataset have, three new variables firstdata_X firstdate_Y firstdate_z are created. If you are worried about the lengh of the varnames change "firstdate_" in the above to "fd_".
Excellent idea. So the plan is to create a "copy of the date variable", populated only when the "variable of interest" is nonmissing. Then let the sorting and UPDATE approach pick the earliest populated "copy of the date variable". I like it.
I have several extended questions @ballardwon the codes shared by you and @mkeintz.
1. If I want to select the first non-missing and not equal to ''-'' value (or any other specified value) for every target vaiable and the related dates, is there any way to make it work with your plan?
2. What if i want to select the last non-missing values?
3. Or select the last non-missing and not equal to "-" value?
4. I'm wondering if the data step would take more time for processing compared to my original plan, if I'm working on some huge dataset, let's say population size is 1 million.
Thanks everyone, for your help and sharing your valuable thoughts!
I guess one way to select non-missing and not equal certain spcified value is to recode the specified value to missing. Any other thoughts?
I
Question 1: By asking for first/last non-missing that is also not a particular value, you will defeat the update statement, which provides the last non-missing value in this program. Of course, it's actually the earliest non-missing value, since the dataset is sorted by descending obsn. Your speculation of recoding offending values to missing would solve the problem.
Question 2: Answer is embedded in answer above.
Question 3: Answer is embedded in answer above.
Question 4: Your original plan was to read the dataset once per variable, so 10 vars means reading 10 times. @Astounding,s program reads it once, although there is an expensive sort to conduct. But then your home free.
As an efficiency suggestion.
Instead of
DATE TEMP;
Do
DATA VTEMP/view=VTEMP;
And then modify the proc sort statement to
PROC SORT DATA=VTEMP OUT=TEMP;
The "data vtemp/view=vtemp" construct a data set VIEW rather than a data set FILE. As such, the view is not implemented until called later (in this case by proc sort). And when it is called, it outputs to the calling proc, NOT to the DISK. As a result you save (1) writing VTEMP to disk, and (2) re-reading it from disk into proc sort. That should help a lot.
Also do a test on restricted subset of your data.
Sorry, I have to ask again. I tried your code by replacing daterec with my date variable
TRGT_DT
proc sql noprint;
select distinct cats('if not(missing(',name,')) then fd_',name,'=TRGT_DT;')
into :myrecode separated by ' '
from dictionary.columns where libname='WORK' and memname='have';
quit;
But in the log, it says
1081 proc sql noprint;
1082 select distinct cats('if not(missing(',name,')) then fd_',name,'=TRGT_DT;')
1083 into :myrecode separated by ' '
1084 from dictionary.columns where libname='WORK' and memname='have';
NOTE: No rows were selected.
1085 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
1086 data temp;
1087 set have;
1088 obsno=_n_;
1089 &myrecode ;
-
180
WARNING: Apparent symbolic reference MYRECODE not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
Could you please help me out with this? Thank you.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.