BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
saweheh
Fluorite | Level 6

Can you help simplify the following if-else statements with a do loop or something? The reason I'm asking is because I have up to 60 statements and it's quite labour intensive if I do this manually. Thanks for your help again. Here is the updated full code I used including the macro. I'm trying to do the following.

From Table1:

Datesample_IDStartEnd
28-Mar-1718/05/2017.
20-Mar-17210/03/20171/06/2017
20-Mar-17215/11/201816/04/2020
20-Mar-1721/12/2020.
22-Apr-16314/04/20164/01/2017
22-Apr-16327/02/201722/03/2019
24-May-16417/11/2020.
To Table 2:
Datesample_IDStartEndM0M1M2M3M4M5
28-Mar-1718/05/2017.111111
20-Mar-17210/03/20171/06/2017100000
20-Mar-17215/11/201816/04/2020000000
20-Mar-1721/12/2020.000000
22-Apr-16314/04/20164/01/2017111111
22-Apr-16327/02/201722/03/2019000000
24-May-16417/11/2020.000000
%Let M0 =date+28; /*this date derived from the previous table*/
%Let M1 =date+56;
%Let M2 =date+84;
%Let M3 =date+112;
%Let M4 =date+140;
%Let M5 =date+168;

data want;
	set have;
		if (End < &M1 and End ^= .)  or Start > &M1 then M0 = 0; else M0=1;
		if (End < &M2 and End ^= .)  or Start > &M2 then M1 = 0; else M1=1;
		if (End < &M3 and End ^= .)  or Start > &M3 then M2 = 0; else M2=1;
		if (End < &M4 and End ^= .)  or Start > &M4 then M3 = 0; else M3=1;
		if (End < &M5 and End ^= .)  or Start > &M5 then M4 = 0; else M4=1;
	
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I


@saweheh wrote:

that's correct. i still can't get it right in the code below.

 

data have; 
infile datalines expandtabs; 
input (ID Date Start End) ($); 
datalines; 
1	28-Mar-17	28-Apr-17	28-Jun-17 
1	28-Mar-17	28-Apr-17	28-Jun-17 
1	28-Mar-17	28-Apr-17	28-Jun-17 
2	20-Mar-17	28-May-17	28-Jul-17 
2	20-Mar-17	28-May-17	28-Jul-17 
3	22-Apr-16	28-Jun-17	28-Jun-17 
3	22-Apr-16	28-Jun-17	28-Jun-17 
4	22-May-16	28-Jul-17	28-Jul-17 
4	22-May-16	28-Jul-17	28-Jul-17 
4	22-May-16	28-Jul-17	28-Jul-17 
; 
 
Data want; 
   set have; 
  /* your variables*/ 
   array v (*) m0 - m5; 
   /* the limits. Assumes these are numeric as you have cleverly hidden the values in macro variables. Note, the*/ 
   array t(5) _temporary_ (m1=Date+28 m2=Date+56 m3=Date+84 m4=Date+112 m5=Date+140); 
   do i=1 to dim(v); 
      if (End < t[i] and End ^= .)  or Start > t[i] then V[i] = 0; else V[i] =1; 
   end; 
   drop i; 
run;

I was attempting to reduce your code, which used those M0, M1 etc variables. I.E "simplifying multiple if-else statements". A request to "simplify" code usually means that the code that is the target of simplifying actually works.

So you did not actually request code to "simplify".

Additionally there is almost no way that character date values work for any sort of computation. And if

%Let M0 =date+28;

 Then the values compared would have been : Start > date+28. There was no numeric variable "date" to add to and would have generated a bunch of errors.

 

data have; 
infile datalines expandtabs; 
input Date :date9. ID $ Start:ddmmyy10. End:ddmmyy10.; 
format date start end date9.;
datalines; 
28-Mar-17 1 8/05/2017	.
20-Mar-17 2 10/03/2017	1/06/2017
20-Mar-17 2 15/11/2018	16/04/2020
20-Mar-17 2 1/12/2020	.
22-Apr-16 3 14/04/2016	4/01/2017
22-Apr-16 3 27/02/2017	22/03/2019
24-May-16 4 17/11/2020	.
;
 
Data want; 
   set have; 
  /* your variables*/ 
   array v (*) m0 - m4; 
   /* the limits. Assumes these are numeric as you have cleverly hidden the values in macro variables. Note, the*/ 
   array t(5) _temporary_ (28,56,84,112,140); 
   do i=1 to dim(v); 
      if (End < (date+ t[i]) and End ^= .)  or Start >(date+ t[i]) then V[i] = 0; else V[i] =1; 
   end; 
   drop i; 
run;

/* or if you have 60 of these apparent 28 day intervals*/
data want2;
   set have;
   array v (*) m0 - m59; 
   do i=1 to dim(v); 
      if (End < (date+(28*i)) and End ^= .)  or Start > (date+(28*i)) then V[i] = 0; else V[i] =1; 
   end;
run;

I don't know what you might be doing in that "macro" you mention but it is not needed for this part of the issue I think.

 

Now you have to justify why M0 is 1 for that first record. End is missing so the first part of the comparison is false and Start is much more than 28 days after the date.

 

I don't know why you so drastically changed your "have" data, multiple records with the same value, no missing "end", no starts before the "date".

View solution in original post

11 REPLIES 11
jimbarbour
Meteorite | Level 14

You can generate the IF statements with a macro like this:

%MACRO	Generate_IFs(Iterations);
	%local	i;
	%DO	i					=	1	%TO	&Iterations;
		IF	(END < &&M&i AND NOT MISING(End)) OR	Start > &&M&i THEN M&i = 0; ELSE M&i = 1;
	%END;
%MEND	Generate_IFs;

 

We can test the Macro by temporarily removing the semi-colons from the IF statement.  We have to put the semi-colons back if we want executable code, like so:

%PUT	NOTE:  %Generate_IFs(10);

 

The variables and values for M1 ... Mn need to be set before we test the macro.  We can set up a little macro to create some test values:

%MACRO	Dummy_M_Values(Iterations);
	%local	i;
	%DO	i					=	1	%TO	&Iterations;
		%GLOBAL	M&i;
		%LET	M&i			=	%EVAL(&i +  3);
		%PUT	NOTE:  &i.. &&M&i;
	%END;
%MEND	Dummy_M_Values;

%Dummy_M_Values(10);

 

Running a test (with the semi-colons removed from the IF statement) yields:

NOTE:  IF (END < 4 AND NOT MISING(End)) OR Start > 4 THEN M1 = 0 ELSE M1 = 1    IF (END < 5 AND NOT MISING(End)) OR Start > 
5 THEN M2 = 0 ELSE M2 = 1    IF (END < 6 AND NOT MISING(End)) OR Start > 6 THEN M3 = 0 ELSE M3 = 1    IF (END < 7 AND NOT 
MISING(End)) OR Start > 7 THEN M4 = 0 ELSE M4 = 1    IF (END < 8 AND NOT MISING(End)) OR Start > 8 THEN M5 = 0 ELSE M5 = 1   
 IF (END < 9 AND NOT MISING(End)) OR Start > 9 THEN M6 = 0 ELSE M6 = 1    IF (END < 10 AND NOT MISING(End)) OR Start > 10 
THEN M7 = 0 ELSE M7 = 1    IF (END < 11 AND NOT MISING(End)) OR Start > 11 THEN M8 = 0 ELSE M8 = 1    IF (END < 12 AND NOT 
MISING(End)) OR Start > 12 THEN M9 = 0 ELSE M9 = 1    IF (END < 13 AND NOT MISING(End)) OR Start > 13 THEN M10 = 0 ELSE M10 
= 1

The Iterations argument of the macro will determine how many iterations of M there are.  In my example, I used 10, so there will be M1 through M10.

 

Jim

ballardw
Super User

When you want to do essentially the same thing to multiple variables the answer is quite often ARRAY processing.

There's a lot of stuff I'm not going to reiterate from the documentation . The array Name is the variable on the Array statement. It cannot be the name of an existing variable. If you already have variables you can list them as shown.

_temporary_ means the variables created by the array are not kept after the end of the data step. The values in () are the initial values. If you have character values there are more things to specify such as $ and length and the initial values would be in quotes. You reference the element of an array with the array name and an integer index value in parentheses. I use the [ ] as it is allowed and makes it easier to see this is an array and not some other parentheses.

It is up to you to make sure the values in the temporary array align in order for use with the other array(s).

The DIM function returns the number of elements in an array. This makes array coding very flexible as just adding the other 55 variables to the array definition of V means there are now 60. The _temporary_ array since it is not using existing variables needs to have the array definition resized to 60 and then list the values.

Example:

Data want;
   set have;
  /* your variables*/
   array v (*) m0 - m5;
   /* the limits. Assumes these are numeric as you have cleverly hidden the values in macro variables. Note, the*/
   array t(5) _temporary_ (&m1 &m2 &m3 &m4 &m5);
   do i=1 to dim(v);
      if (End < t[i] and End ^= .)  or Start > t[i] then V[i] = 0; else V[i] =1;
   end;
   drop i;
end;
ChrisNZ
Tourmaline | Level 20
if (End < &M1 and End ^= .)  or Start > &M1 then M0 = 0; else M0 =1;

can be written as

M0 = ifn( . < END < &M1 or START > &M1 , 0 , 1);

or even

M0 = ^( . < END < &M1 or START > &M1 );

^ means NOT

Kurt_Bremser
Super User

Let's state your intention:

You create a series of offsets from date (which increase by 28 for every loop instance), and then set a boolean value for each offset, depending on start and end.

I guess that your "up to 60 statements" refer to the extent of the above loop?

 

If the above meets your requirement, you do not need any macro coding, just an array of target variables and a DO loop.

saweheh
Fluorite | Level 6

that's correct. i still can't get it right in the code below.

 

data have; 
infile datalines expandtabs; 
input (ID Date Start End) ($); 
datalines; 
1	28-Mar-17	28-Apr-17	28-Jun-17 
1	28-Mar-17	28-Apr-17	28-Jun-17 
1	28-Mar-17	28-Apr-17	28-Jun-17 
2	20-Mar-17	28-May-17	28-Jul-17 
2	20-Mar-17	28-May-17	28-Jul-17 
3	22-Apr-16	28-Jun-17	28-Jun-17 
3	22-Apr-16	28-Jun-17	28-Jun-17 
4	22-May-16	28-Jul-17	28-Jul-17 
4	22-May-16	28-Jul-17	28-Jul-17 
4	22-May-16	28-Jul-17	28-Jul-17 
; 
 
Data want; 
   set have; 
  /* your variables*/ 
   array v (*) m0 - m5; 
   /* the limits. Assumes these are numeric as you have cleverly hidden the values in macro variables. Note, the*/ 
   array t(5) _temporary_ (m1=Date+28 m2=Date+56 m3=Date+84 m4=Date+112 m5=Date+140); 
   do i=1 to dim(v); 
      if (End < t[i] and End ^= .)  or Start > t[i] then V[i] = 0; else V[i] =1; 
   end; 
   drop i; 
run;
Tom
Super User Tom
Super User

Initial values of an array are just numbers, not expression.  And they are INITIAL values, so they only really apply to the first iteration of the data step, so they cannot be different for each observation.

 

Instead use a _TEMPORARY_ array to store the offsets so that you can now use a DO to loop over the elements in the array.

 

Sounds like your input (the dynamic part) is the list of intervals.  So unless you want to have a tens of thousands of intervals just put the list into a single macro variable.

%let offsets=28 56 84 112 140;

You can then count how many there are:

%let n=%sysfunc(countw(&offsets,%str( )));

And then use that number to generate the SAS code you need.

SAS will evaluate boolean expression to 1 (TRUE) or 0 (FALSE).  So just assign the value of your condition to the new variable

data want; 
   set have; 
   array offset [&n] _temporary_ (&offsets);
   array m [&n] ;
   do index=1 to &n;
     m[index] = start < sum(date,offset[index]) and ( missing(end) or end >= sum(date,offset[index]));
  end;
run;
ballardw
Super User

I


@saweheh wrote:

that's correct. i still can't get it right in the code below.

 

data have; 
infile datalines expandtabs; 
input (ID Date Start End) ($); 
datalines; 
1	28-Mar-17	28-Apr-17	28-Jun-17 
1	28-Mar-17	28-Apr-17	28-Jun-17 
1	28-Mar-17	28-Apr-17	28-Jun-17 
2	20-Mar-17	28-May-17	28-Jul-17 
2	20-Mar-17	28-May-17	28-Jul-17 
3	22-Apr-16	28-Jun-17	28-Jun-17 
3	22-Apr-16	28-Jun-17	28-Jun-17 
4	22-May-16	28-Jul-17	28-Jul-17 
4	22-May-16	28-Jul-17	28-Jul-17 
4	22-May-16	28-Jul-17	28-Jul-17 
; 
 
Data want; 
   set have; 
  /* your variables*/ 
   array v (*) m0 - m5; 
   /* the limits. Assumes these are numeric as you have cleverly hidden the values in macro variables. Note, the*/ 
   array t(5) _temporary_ (m1=Date+28 m2=Date+56 m3=Date+84 m4=Date+112 m5=Date+140); 
   do i=1 to dim(v); 
      if (End < t[i] and End ^= .)  or Start > t[i] then V[i] = 0; else V[i] =1; 
   end; 
   drop i; 
run;

I was attempting to reduce your code, which used those M0, M1 etc variables. I.E "simplifying multiple if-else statements". A request to "simplify" code usually means that the code that is the target of simplifying actually works.

So you did not actually request code to "simplify".

Additionally there is almost no way that character date values work for any sort of computation. And if

%Let M0 =date+28;

 Then the values compared would have been : Start > date+28. There was no numeric variable "date" to add to and would have generated a bunch of errors.

 

data have; 
infile datalines expandtabs; 
input Date :date9. ID $ Start:ddmmyy10. End:ddmmyy10.; 
format date start end date9.;
datalines; 
28-Mar-17 1 8/05/2017	.
20-Mar-17 2 10/03/2017	1/06/2017
20-Mar-17 2 15/11/2018	16/04/2020
20-Mar-17 2 1/12/2020	.
22-Apr-16 3 14/04/2016	4/01/2017
22-Apr-16 3 27/02/2017	22/03/2019
24-May-16 4 17/11/2020	.
;
 
Data want; 
   set have; 
  /* your variables*/ 
   array v (*) m0 - m4; 
   /* the limits. Assumes these are numeric as you have cleverly hidden the values in macro variables. Note, the*/ 
   array t(5) _temporary_ (28,56,84,112,140); 
   do i=1 to dim(v); 
      if (End < (date+ t[i]) and End ^= .)  or Start >(date+ t[i]) then V[i] = 0; else V[i] =1; 
   end; 
   drop i; 
run;

/* or if you have 60 of these apparent 28 day intervals*/
data want2;
   set have;
   array v (*) m0 - m59; 
   do i=1 to dim(v); 
      if (End < (date+(28*i)) and End ^= .)  or Start > (date+(28*i)) then V[i] = 0; else V[i] =1; 
   end;
run;

I don't know what you might be doing in that "macro" you mention but it is not needed for this part of the issue I think.

 

Now you have to justify why M0 is 1 for that first record. End is missing so the first part of the comparison is false and Start is much more than 28 days after the date.

 

I don't know why you so drastically changed your "have" data, multiple records with the same value, no missing "end", no starts before the "date".

ChrisNZ
Tourmaline | Level 20

Adding an array and a loop might be seen by some as "complexify" compared to a list of tests.

Fewer lines of code, but harder to read.

So simplify is probably not the best term here. 

saweheh
Fluorite | Level 6

You're right Chris. I shouldn't have used the word "Simplify". My intention/aim as a new coder is to learn the advanced method to replace repetitive tasks as this.

The question is how long would it take to advance to this level as a coder? Where do I start from here?

saweheh
Fluorite | Level 6

Thank you so much. It's so amazing you managed to resolve this with a few lines of code. I can now replicate your code in my project. 

It'll take me a while to explain myself how your code works as I'm new to all these coding syntax and symbols.

 

Thanks again for your help.

ChrisNZ
Tourmaline | Level 20

> It'll take me a while to explain myself how your code works

It's not simplified at all then is it? Simple means easy to understand and maintain. Sometimes more compact is not the preferred outcome. Code legibility should be a top priority.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2300 views
  • 5 likes
  • 6 in conversation