BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dancer_on_data
Obsidian | Level 7

Hello everyone, recently, I have encountered a problem that involves numerous if-else statements. Given the sheer volume of my data, if I were to follow this route, I would have spent days or even weeks keying in if else statements. I am wondering if there is a way to iterate this process through a combinantion of array and macro techniques. An example is shown below:

 

Let's say there is a data set named "account" that contains 100 observations and 11 variables.

 

Account_ID M1 M2 M3 M4 M5 M6 M7 M8 M9 M10

A00001       C    M   M   M   M   M   M   M   M  P

A00002       C    P   C    M   M   M   M   M   M  M

A00003       P    P   M    C   M   M   M   M   M  M

.

.

.

A00100      C    M   C    M   M   M   M   M   M  M

 

What I tried to do is illustated in my codes below:

 

Data Account_Final;

Set Account;

if M1="C" and M2="M" and M3="M" and M4="M" then P1=3;

else P1=0;

if M2="C" and M3="M" and M4="M" and M5="M" then P2=3;

else P2=0;

if M3="C" and M4="M" and M5="M" and M6="M" then P3=3;

else P3=0;

if M4="C" and M5="M" and M6="M" and M7="M" then P4=3;

else P4=0;

.

.

.

if M7="C" and M8="M" and M9="M" and M10="M" then P7=3;

else P7=0;

 

/***codes below are slightly different from those above***/

 

if M1="C" and M2="M" and M3="M" and M4="M"  M5="M" then P8=4;

else P8=0;

.

.

.

if M6="C" and M7="M" and M8="M" and M9="M"  M10then P14=4;

else P7=0;

 

/*** the "if statement" expands by increment of one "and info" until it reaches the last variable "M10" in the data set***/

 

if M1="C" and M2="M" and M3="M" and M4="M"  and M5="M" and  ... and M10="M" then PX=X;

else PX=0;

run;

 

As you can see, I have to write a lot of if else statements to accomplish my goal, but if the variable goes from 11 to 100, I am not able to write the lines of codes like I did above. I am wondering if there is another way for me to efficiently accomplish my goal with way fewer codes.

 

Thanks so much.

 

Best,

1 ACCEPTED SOLUTION

Accepted Solutions
Dancer_on_data
Obsidian | Level 7

Hi SASkiwi, sorry about this delayed reply, it worked out perfectly except some missing values, I changed the codes a bit to make it look like exactly what i wanted.Thanks so much for you help. I will copy this idea and try to expand my variables from M1-M10 to M1-M100.Smiley LOL

data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M   M    M   M   M   M   M   M  P
A00002 C P   C    M   M   M   M   M   M  M
A00003 P P   M    C   M   M   M   M   M  M
;
run;

data want;
set have;
array m(10) m1-m10;
array p(13)  p1-p13 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
	if index(text, 'CMMM') then p(i) = 3;
	else p(i)=0;
end;

do I= 1 to 6;
        text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
	if index(text, 'CMMMM') then p(i +7) = 4;
	else p(i+7)=0;
end;
drop i text;
run;

Best,

 

View solution in original post

21 REPLIES 21
Reeza
Super User

What's the logic behind your rules?

Dancer_on_data
Obsidian | Level 7

Hi Reeza, thanks for your reply, let's think of M1- M10 as the first month all the way to the 10th month, and C stands for Core, M for Mid and P for Premier.

 

My logic is to figure out two things: 1, How many accounts moved from Core to Mid, and they also must stay in Mid for at least three consecutive months, that means "CMMM" is the minimum criteria to consider.2, Once it qualifies for consideration, I also want to figure out how many months those qualified accounts have stayed in Mid?Such as "CMMMMM"should be interpreted as 5, because 1. It qualifies for consideration when it moves from Core to Mid. 2. It has stayed in Mid for 5 months.

 

I hope I explained myself clearly to you, please let me know if you have any questions on it.

 

Thanks a lot,

 

Best,

Reeza
Super User

You don't need to post the same thing twice 🙂

 

A slight modification to my code. Instead of all the IF's I count the number of consecutive M, after the initial C. If you want, you can delete 3 from the number to account for the initial requirement. 

 

data want;
set have;
array m(10) m1-m10;
array p(10) p1-p10 (10*0);
    consecutiveM=0;
	text=catt(of m1-m10);
	x= index(text, 'CMMM') ;
	if x>0 then p(x)=1;
	
	if x>0 then do i=x+1 to dim(m);
        if char(text, i)='M' then consecutiveM+1;
        else leave;
	end;
	
	drop x i;
run;
Reeza
Super User

If you can have multiple occurences of 'CMMM' then you need to modify this code to loop through once you first find your X.

Otherwise you can concatenate the results and then search for the string CMMM. The index will be the first position and you can set the P variable to 1.

 

data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M   M    M   M   M   M   M   M  P
A00002 C P   C    M   M   M   M   M   M  M
A00003 P P   M    C   M   M   M   M   M  M
;
run;

data want;
set have;
array m(10) m1-m10;
array p(10) p1-p10 (10*0);
	text=catt(of m1-m10);
	x= index(text, 'CMMM') ;
	if x>0 then p(x)=1;
run;

 

EDIT: re-reading this I don't think I got your rule, which is why you may want to explain it 🙂 

Dancer_on_data
Obsidian | Level 7

Hi Reeza, I am very new to this forum, I just want to be respectful to everyone who responded to me. Clearly I see this is not the right way to interact with anyone in this forum, thanks for the heads up.

 

I ran your codes, it has done 50% of what I wanted, the desired result I would like to see is:

 

Account_ID P1 P2 P3 P4 P5 P6 P7

A00001        3   0    0   0    0   0   0

A00002        0   0    3   0    0   0   0

A00003        0   0    0   3    0   0   0

 

After I got these result, and I would just want to figure out the maximum and minimum months each qualified account has stayed by max/min (of p1-p7).

 

Thanks

 

 

SASKiwi
PROC Star

I've borrowed @Reeza's code to do just the "3" scenario. This can be enhanced to do the other combinations.

 

data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M   M    M   M   M   M   M   M  P
A00002 C P   C    M   M   M   M   M   M  M
A00003 P P   M    C   M   M   M   M   M  M
;
run;

data want;
set have;
array m(10) m1-m10;
array p(10) p1-p10 (10*0);
do I = 1 to 7;
	text=catt(m(i), m(i+1), m(i+2), m(i+3);
	if index(text, 'CMMM') then p(i) = 3;
end;
run;
Dancer_on_data
Obsidian | Level 7

Hi SASkiwi, thanks for your reply, I probably didn't explain my thoughts clearly in my codes previously.

 

In my previous dataset, let's think of M1- M10 as the first month to the10th month, and C stands for Core, M for Mid and P for Premier.

 

My logic is to figure out two things: 1, How many accounts moved from Core to Mid, and they also must stay in Mid for at least three consecutive months, that means "CMMM" is the minimum criteria to consider. 2, Once it qualifies for consideration, I also want to figure out how many months those qualified accounts have stayed in Mid? Such as "CMMMMM" should be interpreted as 5, becasue 1. It qualifies for consideration when moving from Core to Mid. 2. It has stayed in Mid for 5 months.

 

I hope I explained myself clearly to you this time, please let me know if you have any questions.

 

Thanks a lot,

 

Best,

SASKiwi
PROC Star

Thanks for the detailed explanation.

 

Can you try @Reeza and my example code to see if gives the right result for the 3 consecutive months or not? If it does then we can try the second part.

Dancer_on_data
Obsidian | Level 7

Hi SASkiwi, I ran your codes, and below are the results I got.

 

I think I am not getting the resuts as I expected, let's look at one example by focusing on the values for P1 variable. What I expected to see is P1 for first observation is 3 ( becasue its "CMMM" , and 0 for second observation (for it is "CPCM" and 0 for third observation (for it is "PPMC").On the contrary, I see 3 for all three observations.

 

Thanks.

 

Note: Sorry, for some reason, I am not able to line the P variables  correctly with its respective value in the table.

                                                                                                                                                                           P1       P2         P3    

1A00001     CMMMMMMMMP        3     0      0               
2A00002    CPCMMMMMMM        3     0     3           
3A00003    PPMCMMMMMM        3     0     3     
Reeza
Super User

I didn't bother with the indicator variables just the total count. Based on your description of the problem it seems to make sense. 

 

You could easily store the values into an array for a running count but that doesn't seem very useful. 

ChrisNZ
Tourmaline | Level 20

Regular expressions easily bring this kind of flexibility.

 

 

data HAVE;
  input (X1-X10) (: $1.);
cards;
P C M M M M M M M P
P C M M C M M M M P
P C M M C M M C M P
run;
data WANT; set HAVE; array P [10]; RE=prxparse('/CM{3,}/'); %* This means match a C followed by a least 3 Ms; call prxsubstr(RE,cat(of X1-X10), POS, LEN); %* and return the match position and length; if POS then P[POS]=LEN-1; run;
proc print noobs; var P1-P10; run;

 

 

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
. 7 . . . . . . . .
. . . . 4 . . . . .
. . . . . . . . . .

 


 

Exhaustive list of regular expressions supported by SAS in:
http://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

 

Dancer_on_data
Obsidian | Level 7

Hi SASkiwi, I drew on your insights, and added some other codes to finally create the result i desired, however,  i have to ran several times to arrive at the final result, please see my codes below:

data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M   M    M   M   M   M   M   M  P
A00002 C P   C    M   M   M   M   M   M  M
A00003 P P   M    C   M   M   M   M   M  M
;
run;

data want;
set have;
array m(10) m1-m10;
array p(7)  p1-p7 ;
do I = 1 to 7;
	text=catt(m(i), m(i+1), m(i+2), m(i+3));
	if index(text, 'CMMM') then p(i) = 3;
	else p(i)=0;
end;
drop i text;
run;

data want1;
set want;
array m(10) m1-m10;
array p(6)  p8-p13 ;
do I= 1 to 6;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
	if index(text, 'CMMMM') then p(i) = 4;
	else p(i)=0;
end;
drop i text;
run;

data want2;
set want1;
array m(10) m1-m10;
array p(5)  p14-p18 ;
do I= 1 to 5;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5));
	if index(text, 'CMMMMM') then p(i) = 5;
	else p(i)=0;
end;
drop i text;
run;

Data want3;
set want2;
array m(10) m1-m10;
array p(4)  p19-p22 ;
do I= 1 to 4;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5),m(i+6));
	if index(text, 'CMMMMMM') then p(i) = 6;
	else p(i)=0;
end;
drop i text;
run;

Data want4;
set want3;
array m(10) m1-m10;
array p(3)  p23-p25 ;
do I= 1 to 3;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5),m(i+6),m(i+7));
	if index(text, 'CMMMMMMM') then p(i) = 7;
	else p(i)=0;
end;
drop i text;
run;

Data want_final;
set want4;
array m(10) m1-m10;
array p(2)  p26-p27 ;
do I= 1 to 2;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5),m(i+6),m(i+7),m(1+8));
	if index(text, 'CMMMMMMMM') then p(i) = 8;
	else p(i)=0;
end;
drop i text;
Max=max(of p1-p27);
Min=min(of p1-p27);
run;

 I am wondering if there is a "magic wand" that you can help me wave  that turns them into a series of  coherent steps to produce the final result with one fell swoop.Thanks!

SASKiwi
PROC Star

Well this might be one way of combining 2 of the rules as an example. Please test it.

data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M   M    M   M   M   M   M   M  P
A00002 C P   C    M   M   M   M   M   M  M
A00003 P P   M    C   M   M   M   M   M  M
;
run;

data want;
set have;
array m(10) m1-m10;
array p(13)  p1-p13 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
	if index(text, 'CMMM') then p(i) = 3;
	else p(i)=0;
end;

do I= 1 to 6;

        text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
	if p(i) = 3 and index(text, 'CMMMM') then p(i +7) = 4;
	else p(i)=0;
end;

drop i text;
run;

  

Dancer_on_data
Obsidian | Level 7

Hi SASkiwi, sorry about this delayed reply, it worked out perfectly except some missing values, I changed the codes a bit to make it look like exactly what i wanted.Thanks so much for you help. I will copy this idea and try to expand my variables from M1-M10 to M1-M100.Smiley LOL

data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M   M    M   M   M   M   M   M  P
A00002 C P   C    M   M   M   M   M   M  M
A00003 P P   M    C   M   M   M   M   M  M
;
run;

data want;
set have;
array m(10) m1-m10;
array p(13)  p1-p13 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
	if index(text, 'CMMM') then p(i) = 3;
	else p(i)=0;
end;

do I= 1 to 6;
        text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
	if index(text, 'CMMMM') then p(i +7) = 4;
	else p(i+7)=0;
end;
drop i text;
run;

Best,

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1491 views
  • 6 likes
  • 4 in conversation