BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HeyLyla90
Calcite | Level 5

Hi everybody,

I have the following data:

 

DATA FLAGDEMO;
INPUT ID YEAR_T0 FLAG_2008 FLAG_2009 FLAG_2010 FLAG_2011 FLAG_2012 FLAG_2013 FLAG_2014 /*DESIRED FLAGS*/ FLAG_2Y FLAG_3Y;
DATALINES;
1 2008 0 0 0 0 0 1 . 0 0
1 2009 0 0 0 0 0 1 . 0 0
1 2010 0 0 0 0 0 1 . 0 1
1 2011 0 0 0 0 0 1 . 1 1
1 2012 0 0 0 0 0 1 . 1 1
1 2013 0 0 0 0 0 1 . 1 1
2 2010 0 0 0 0 0 . 1 0 0
2 2011 0 0 0 0 0 . 1 0 1
2 2012 0 0 0 0 0 . 1 1 1
2 2013 0 0 0 0 0 . 1 1 1
2 2014 0 0 0 0 0 . 1 1 1
3 2009 0 1 . . . . . 1 1
3 2010 0 1 . . . . . 1 1
;
RUN;

 

I need to create the /*DESIRED FLAGS*/ FLAG_2Y and FLAG_3Y (I inserted them manually to better explain what I want); these flags are equal to the maximum between FLAG_(n) and FLAG_(n+2) o FLAG_(n+3) for each YEAR_T0=n. So for example, if YEAR_T0=2008 i want FLAG_2Y to be the maximum between flag_2008, flag_2009,flag_2010.

I tried several combination of %do and array but I can't figure out the correct sintax or logic.

Examples of attempt:

 

First Attempt

 

%macro flag_year_v1(t0,t2);
data output;
set input;
if YEAR_T0=&T0. then flag_2y=max(FLAG_&t0.-FLAG_&t2.);
run;

%mend flag_year_v1;

%flag_year_v1(2008,2010);
%flag_year_v1(2009,2011);
%flag_year_v1(2010,2012);
%flag_year_v1(2011,2013);
%flag_year_v1(2012,2014);
%flag_year_v1(2013,2014);
%flag_year_v1(2014,2014);

 

Second Attempt

 

%MACRO FLAG_2Y_3Y(START,STOP);
%DO i=(&start.) %TO (&stop.);
%DO j=(&start.+2) %TO (&stop.);
%DO h=(&start.+3) %TO (&stop.);

DATA output;
SET input;
ARRAY FLAG_Y(*) FLAG_&i.-FLAG_&h.;
flag_default_2y=max(FLAG_Y(&i.)-FLAG_Y(&j.));
flag_default_3y=max(FLAG_Y(&i.)-FLAG_Y(&h.));
%END;
%END;
%END;
run;

%MEND FLAG_2Y_3Y;

%FLAG_2Y_3Y(2008,2014);

 

As much as i try, I can't obtain my desired results, and I would like to write a code similar to the second but I am not super proficient with iteratives loop.

Hope somebody can help me Smiley Frustrated

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assign the flag variables to an array indexed by year.  Append 3 extra dummy variables to that array (thereby extending the upper bound of the array index to 3 plus the highest expected value of YEAR_T0.  Then it's simple:

 

DATA FLAGDEMO;
INPUT ID YEAR_T0 FLAG_2008 FLAG_2009 FLAG_2010 FLAG_2011 FLAG_2012 FLAG_2013 FLAG_2014 /*DESIRED FLAGS*/ FLAG_2Y FLAG_3Y;
DATALINES;
1 2008 0 0 0 0 0 1 . 0 0
1 2009 0 0 0 0 0 1 . 0 0
1 2010 0 0 0 0 0 1 . 0 1
1 2011 0 0 0 0 0 1 . 1 1
1 2012 0 0 0 0 0 1 . 1 1
1 2013 0 0 0 0 0 1 . 1 1
2 2010 0 0 0 0 0 . 1 0 0
2 2011 0 0 0 0 0 . 1 0 1
2 2012 0 0 0 0 0 . 1 1 1
2 2013 0 0 0 0 0 . 1 1 1
2 2014 0 0 0 0 0 . 1 1 1
3 2009 0 1 . . . . . 1 1
3 2010 0 1 . . . . . 1 1
;
RUN;
data want (drop=dummy);
  set flagdemo;
  array flg{2008:2017} flag_2008-flag_2014 dummy dummy dummy;
  f2y=max(flg{year_t0},flg{year_t0+1},flg{year_t0+2});
  f3y=max(f2y,flg{year_t0+3});
run;

The "trick" here is to set the lower and upper bounds of the index for array flg (2008 and 2017 in this case).  And since the extra dummy variables are all missing, they don't modify the result of the max functions.

 

 

By the way, this program does not reproduce the last record of your sample output, which I suspect is in error.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

11 REPLIES 11
gamotte
Rhodochrosite | Level 12

Hello,

 

Use the keyword "of" when you reref to a set of variables :

max(of FLAG_&t0.-FLAG_&t2.)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You can use arrays for this, no need macro which just complicates things.

data want;
  set have;
  array f flag_:;
  do i=1 to dim(f);
    if vname(f{i})=cats("FLAG_",year_to) then flag_default_2y=max(f{i},f{i+1},f{i+2});
  end;
run;

This is not tested, as you have not provided any test data in a datastep, and looking at it again, then you would also need to check if there are elements available for the max, i.e. if the last element then you do not have +1 and +2, so what would you do in those instances?

 

This would of course - as always - be far simpler if you modelled you data in a usable way and avoid Excel thinking.  

ID  YEAR_TO  YEAR  FLAG

...

 

With this, you can simply compare if year_to <= year <= year_to+2.  Much simpler coding and logic.

HeyLyla90
Calcite | Level 5

Hi RW9,

thank you for your answer.

I run your code with my data (i edited my answer providing the test data) and it partially work.

The problem is that my real data do not have the two final FLAG_2Y and FLAG_3Y so when SAS compile for i=2013, it doesn't find the i+2 variable because these variables stops at FLAG_2014 and provide the error "ARRAY subscript out of range".

 

I solved the issue by writing:

data want_2y;
  set FLAGDEMO;
  array f flag_:;
  do i=1 TO 5;
    if vname(f{i})=cats("FLAG_",year_t0) then flag_default_2y=max(f{i},f{i+1},f{i+2});
  end;
    if vname(f{6})=cats("FLAG_",year_t0) then flag_default_2y=max(f{6},f{7});
	if vname(f{7})=cats("FLAG_",year_t0) then flag_default_2y=f{7};
run;

However, I'm sure there is a more efficient way to write this, so that if the number of years should increase, it would be not necessary to manually insert the numbers. Of course writing it as a macro, but since I am not that expert about dos and arrays I wonder if there is a more straightforward solution.

 

Otherwise, my original data looked like this:

DATA FLAGDEMO;
INPUT ID YEAR_T0 FLAG_t0;
DATALINES;
1 2008 0
1 2009 0
1 2010 0
1 2011 0
1 2012 0
1 2013 1
2 2010 0
2 2011 0
2 2012 0
2 2013 0
2 2014 1
3 2009 1
;
RUN;

What would you have done in this case to calculate the same flags? I transposed this table in what I posted because I thought it would have been easier, but any better suggestion would approciated 🙂

ballardw
Super User

@HeyLyla90 wrote:

Hi RW9,

 

I solved the issue by writing:

However, I'm sure there is a more efficient way to write this, so that if the number of years should increase, it would be not necessary to manually insert the numbers. Of course writing it as a macro, but since I am not that expert about dos and arrays I wonder if there is a more straightforward solution.


Yes there is. It is called data normalization. Instead of having actual data in your variable names (the year) create a data set that has a variable to hold that information and one record per value.

Is that data example the way you read an external file or did you create flag_2008 etc in a previous step?

 

 

HeyLyla90
Calcite | Level 5

From the data I provided in the reply, I did:

 

proc transpose data =FLAGDEMO out =FLAGDEMO2 prefix = FLAG_;
	by ID YEAR_T0;
	id YEAR_T0;
	var FLAG_T0;
run;

Then joined FLAGDEMO2 with FLAGDEMO to have the matrix I initially posted, but the original data is the second one!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

From your original data (note I corrected it to avoid shouting):

data flagdemo;
  input id year_t0 flag_t0;
datalines;
1 2008 0
1 2009 0
1 2010 0
1 2011 0
1 2012 0
1 2013 1
2 2010 0
2 2011 0
2 2012 0
2 2013 0
2 2014 1
3 2009 1
;
run;

You haven't posted what you want as output, so I will just show an output routine which will for each row of the above, take the next two rows max:

data want;
  set flagdemo;
  if lag(id)=id and lag2(id)=id then flag_2y=max(lag(flag_t0),lag2(flag_t0));
run;

You will note how flag_2y is only created where there are two previous results.  There are many ways of doing this part, its merely to show you that working with normalised data is far simpler.  If you can provide what you want as output (as your first required output doesn't make sense, only one 2yr flag for many years) then I can look on Monday.

 

HeyLyla90
Calcite | Level 5

My desired output is always the one I posted from the beginning, that would be:

 

 

data flagdemo;
input id year_t0 flag_t0 wanted_flag_2y;
datalines;
1 2008 0 0
1 2009 0 0
1 2010 0 0
1 2011 0 1
1 2012 0 1
1 2013 1 1
2 2010 0 0
2 2011 0 0
2 2012 0 1
2 2013 0 1
2 2014 1 1
3 2009 1 1
;
run;

 

 

Your code gets different result(*), what I need to do is always to obtain a flag that for each id indicates the maximum of flag_t0 in the two and three years following the t0 so for 2008 the maximum among 2008, 2009 and 2010 and so on.

I realize I have still an excel mindset, but I really try to do my best as a beginner and I know there are a lot of functions out there for me to discover 🙂

 

(*)Results with your last code:

data want;
input id year_t0 flag_t0 wanted_flag_2y;
datalines
1 2008 0 .
1 2009 0 .
1 2010 0 .
1 2011 0 0
1 2012 0 0
1 2013 1 0
2 2010 0 .
2 2011 0 .
2 2012 0 1
2 2013 0 1
2 2014 1 0
3 2009 1 .
;
run;
Tom
Super User Tom
Super User

Looking ahead is harder than looking backwards since you need to examine observations that you have not yet read from the disk.

But you can do it by adding extra SET statements.

data want ;
  set flagdemo end=eof1 ;
  if eof1 then call missing(id1);
  else set flagdemo (firstobs=2 keep=id flag_t0 rename=(id=id1 flag_t0=flag_t1)) end=eof2;
  if eof2 then call missing(id2);
  else set flagdemo (firstobs=3 keep=id flag_t0 rename=(id=id2 flag_t0=flag_t2));
  if id ne id1 then call missing(flag_t1);
  if id ne id2 then call missing(flag_t2);

  flag_2y = flag_t0 or flag_t1 or flag_t2 ;
run;

You could also use PROC SQL to join the table with itself.

proc sql ;
create table want as
  select a.*,max(b.flag_t0) as flag_2y
  from flagdemo a
  left join flagdemo b
  on a.id = b.id and b.year_t0 between a.year_t0 and a.year_t0+2
  group by 1,2,3
  order by 1,2
;
quit;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, something like:

data flagdemo;
  input id year_t0 flag_t0;
datalines;
1 2008 0
1 2009 0
1 2010 0
1 2011 0
1 2012 0
1 2013 1
2 2010 0
2 2011 0
2 2012 0
2 2013 0
2 2014 1
3 2009 1
;
run;

proc sql;
  create table want as
  select a.*,
         (select max(flag_t0) from (select * from flagdemo where id=a.id and a.year_t0 <= year_t0 <= a.year_t0 + 2) group by id) as flag_2y
  from   flagdemo a;
quit;

Or you could use the array code I presented earlier.

mkeintz
PROC Star

Assign the flag variables to an array indexed by year.  Append 3 extra dummy variables to that array (thereby extending the upper bound of the array index to 3 plus the highest expected value of YEAR_T0.  Then it's simple:

 

DATA FLAGDEMO;
INPUT ID YEAR_T0 FLAG_2008 FLAG_2009 FLAG_2010 FLAG_2011 FLAG_2012 FLAG_2013 FLAG_2014 /*DESIRED FLAGS*/ FLAG_2Y FLAG_3Y;
DATALINES;
1 2008 0 0 0 0 0 1 . 0 0
1 2009 0 0 0 0 0 1 . 0 0
1 2010 0 0 0 0 0 1 . 0 1
1 2011 0 0 0 0 0 1 . 1 1
1 2012 0 0 0 0 0 1 . 1 1
1 2013 0 0 0 0 0 1 . 1 1
2 2010 0 0 0 0 0 . 1 0 0
2 2011 0 0 0 0 0 . 1 0 1
2 2012 0 0 0 0 0 . 1 1 1
2 2013 0 0 0 0 0 . 1 1 1
2 2014 0 0 0 0 0 . 1 1 1
3 2009 0 1 . . . . . 1 1
3 2010 0 1 . . . . . 1 1
;
RUN;
data want (drop=dummy);
  set flagdemo;
  array flg{2008:2017} flag_2008-flag_2014 dummy dummy dummy;
  f2y=max(flg{year_t0},flg{year_t0+1},flg{year_t0+2});
  f3y=max(f2y,flg{year_t0+3});
run;

The "trick" here is to set the lower and upper bounds of the index for array flg (2008 and 2017 in this case).  And since the extra dummy variables are all missing, they don't modify the result of the max functions.

 

 

By the way, this program does not reproduce the last record of your sample output, which I suspect is in error.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
HeyLyla90
Calcite | Level 5
Yes it is, I also though of this solution, thank you!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1416 views
  • 0 likes
  • 6 in conversation