Column date to column y5 are what I have.
I need to take the values during different periods of time from different columns. The column z is what I try to generate.
date | y1 | y2 | y3 | y4 | y5 | z |
1 | 2 | . | . | . | . | 2 |
2 | 4 | . | . | . | . | 4 |
13 | . | 2 | . | . | . | 2 |
15 | . | 3 | . | . | . | 3 |
24 | . | . | 9 | . | . | 9 |
26 | . | . | 10 | . | . | 10 |
31 | . | . | . | 5 | . | 5 |
33 | . | . | . | 7 | . | 7 |
47 | . | . | . | . | 7 | 7 |
49 | . | . | . | . | 5 | 5 |
I already wrote a simple program as follows and but it does not give me what I wanted.
data test;
set test;
array cutoff(5) _temporary_(10 20 30 40 50);
do j=1 to 5;
do while (date<=cutoff(j));
set test;
array y(*) y1-y5;
current=y(j);
output;
end;
end;
run;
This is what I have:
date | y1 | y2 | y3 | y4 | y5 | z |
1 | 2 | . | . | . | . | 2 |
2 | 4 | . | . | . | . | 4 |
13 | . | 2 | . | . | . | . |
15 | . | 3 | . | . | . | 3 |
24 | . | . | 9 | . | . | . |
26 | . | . | 10 | . | . | 10 |
31 | . | . | . | 5 | . | . |
33 | . | . | . | 7 | . | 7 |
47 | . | . | . | . | 7 | . |
49 | . | . | . | . | 5 | 5 |
Since I have a number of such datasets to process, I try to change it to a macro:
%macro process;
data test
set test;
array cutoff(5) _temporary_(10 20 30 40 50);
%do j=1 %to 36;
%do %while (date1<=cutoff(j));
array y(*) y1-y36;
current=y(j);
output;
%end;
%end;
run;
%end;
/*keep the current value*/
%mend process;
%process
However, when I execute the macro, I got error messages:
ERROR: Required operator not found in expression: date1<=cutoff(j)
ERROR: The condition in the %DO %WHILE loop, date1<=cutoff(j), yielded an invalid or missing
value, . The macro will stop executing.
ERROR: The macro PROCESS will stop executing.
Can anybody help me with this? Thanks!
Here is a way to calculate the index using a FORMAT.
proc format ;
value idx low-10=1 10-20=2 20-30=3 30-40=4 40-50=5 ;
run;
data want ;
set have ;
array y(*) y1-y5;
j= input(put(date,idx.),5.);
if 1 <= j <= dim(y) then current=y(j);
run;
Since I have a number of such datasets to process
What is it that varies between the datasets? If nothing varies then just pass the dataset names to the macro.
Is it the list of date ranges that change? You could generate the format above with a macro loop.
%macro idxgen(list);
%local low high i n ;
%let n=%sysfunc(countw(&list));
%let low=low ;
proc format ;
value idx
%do i=1 %to &n ;
%let high=%scan(&list,&i) ;
&low - &high = &i
%let low=&high;
%end;
&low - high = %eval(&n+1);
;
run;
%mend idxgen ;
Does not look like a problem that would be helped by using macro programming.
If you your data really only has one non-missing value per row the COALESCE() function is all you need.
data want ;
set have ;
newz = coalesce(of y1-y5);
run;
The example I showed is a simplified version. Most of the times I have more than two non-missing values in each row. When that happens, I need to choose a value according to the date. You actually helped me write the data step program when I posted the question last time:) Thanks very much!!
Aside from what Tom said, you tried to use data step variables in the macro %while statement. During macro execution, these do not yet exist. Keep in mind that macro processing occurs BEFORE the data step is even compiled.
Thanks a lot! Then what's the correct way? Would it help if I take out the % signs in the do while loop?
I have some problems making sense of your code.
- j iterates 1 to 36, but array cutoff() has only 5 elements?
- where does date1 come from?
Aside from that, you may be able to solve your problem without macro processing.
Sorry, I made some mistakes here. The iteration is 5, and the variable should be date. Thanks for your time!
Here is a way to calculate the index using a FORMAT.
proc format ;
value idx low-10=1 10-20=2 20-30=3 30-40=4 40-50=5 ;
run;
data want ;
set have ;
array y(*) y1-y5;
j= input(put(date,idx.),5.);
if 1 <= j <= dim(y) then current=y(j);
run;
Since I have a number of such datasets to process
What is it that varies between the datasets? If nothing varies then just pass the dataset names to the macro.
Is it the list of date ranges that change? You could generate the format above with a macro loop.
%macro idxgen(list);
%local low high i n ;
%let n=%sysfunc(countw(&list));
%let low=low ;
proc format ;
value idx
%do i=1 %to &n ;
%let high=%scan(&list,&i) ;
&low - &high = &i
%let low=&high;
%end;
&low - high = %eval(&n+1);
;
run;
%mend idxgen ;
This is exactly what I need!! I've been stuck here for all day~Thanks so much! I just have one more confusion: when we write macros, sometimes we need to put %do and %end. But sometimes, we need only do-end without the %sign. What's the difference here?
do - end, written without "%", is just text that the macro processor will hand over to the main SAS interpreter. To the macro processor, this is only "garble-garble-garble". %do and %end are macro statements that the macro processor actually "understands" and acts upon.
I see~ Thank you very much!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.