turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to streamline this data generating process?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 07:32 PM - edited 04-08-2016 07:43 PM

Hello everyone, recently, I have encountered a problem that involves numerous if-else statements. Given the sheer volume of my data, if I were to follow this route, I would have spent days or even weeks keying in if else statements. I am wondering if there is a way to iterate this process through a combinantion of array and macro techniques. An example is shown below:

Let's say there is a data set named "account" that contains 100 observations and 11 variables.

Account_ID M1 M2 M3 M4 M5 M6 M7 M8 M9 M10

A00001 C M M M M M M M M P

A00002 C P C M M M M M M M

A00003 P P M C M M M M M M

.

.

.

A00100 C M C M M M M M M M

What I tried to do is illustated in my codes below:

Data Account_Final;

Set Account;

if M1="C" and M2="M" and M3="M" and M4="M" then P1=3;

else P1=0;

if M2="C" and M3="M" and M4="M" and M5="M" then P2=3;

else P2=0;

if M3="C" and M4="M" and M5="M" and M6="M" then P3=3;

else P3=0;

if M4="C" and M5="M" and M6="M" and M7="M" then P4=3;

else P4=0;

.

.

.

if M7="C" and M8="M" and M9="M" and M10="M" then P7=3;

else P7=0;

/***codes below are slightly different from those above***/

if M1="C" and M2="M" and M3="M" and M4="M" M5="M" then P8=4;

else P8=0;

.

.

.

if M6="C" and M7="M" and M8="M" and M9="M" M10then P14=4;

else P7=0;

/*** the "if statement" expands by increment of one "and info" until it reaches the last variable "M10" in the data set***/

if M1="C" and M2="M" and M3="M" and M4="M" and M5="M" and ... and M10="M" then PX=X;

else PX=0;

run;

As you can see, I have to write a lot of if else statements to accomplish my goal, but if the variable goes from 11 to 100, I am not able to write the lines of codes like I did above. I am wondering if there is another way for me to efficiently accomplish my goal with way fewer codes.

Thanks so much.

Best,

Accepted Solutions

Solution

04-10-2016
09:24 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-10-2016 09:18 PM

Hi SASkiwi, sorry about this delayed reply, it worked out perfectly except some missing values, I changed the codes a bit to make it look like exactly what i wanted.Thanks so much for you help. I will copy this idea and try to expand my variables from M1-M10 to M1-M100.

```
data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M M M M M M M M P
A00002 C P C M M M M M M M
A00003 P P M C M M M M M M
;
run;
data want;
set have;
array m(10) m1-m10;
array p(13) p1-p13 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
if index(text, 'CMMM') then p(i) = 3;
else p(i)=0;
end;
do I= 1 to 6;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
if index(text, 'CMMMM') then p(i +7) = 4;
else p(i+7)=0;
end;
drop i text;
run;
```

Best,

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 08:46 PM

What's the logic behind your rules?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 11:29 PM - edited 04-08-2016 11:41 PM

Hi Reeza, thanks for your reply, let's think of M1- M10 as the first month all the way to the 10th month, and C stands for Core, M for Mid and P for Premier.

My logic is to figure out two things: 1, How many accounts moved from Core to Mid, and they also must stay in Mid for at least three consecutive months, that means "CMMM" is the minimum criteria to consider.2, Once it qualifies for consideration, I also want to figure out how many months those qualified accounts have stayed in Mid?Such as "CMMMMM"should be interpreted as 5, because 1. It qualifies for consideration when it moves from Core to Mid. 2. It has stayed in Mid for 5 months.

I hope I explained myself clearly to you, please let me know if you have any questions on it.

Thanks a lot,

Best,

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 11:54 PM

You don't need to post the same thing twice

A slight modification to my code. Instead of all the IF's I count the number of consecutive M, after the initial C. If you want, you can delete 3 from the number to account for the initial requirement.

```
data want;
set have;
array m(10) m1-m10;
array p(10) p1-p10 (10*0);
consecutiveM=0;
text=catt(of m1-m10);
x= index(text, 'CMMM') ;
if x>0 then p(x)=1;
if x>0 then do i=x+1 to dim(m);
if char(text, i)='M' then consecutiveM+1;
else leave;
end;
drop x i;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 08:56 PM - edited 04-08-2016 08:57 PM

If you can have multiple occurences of 'CMMM' then you need to modify this code to loop through once you first find your X.

Otherwise you can concatenate the results and then search for the string CMMM. The index will be the first position and you can set the P variable to 1.

```
data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M M M M M M M M P
A00002 C P C M M M M M M M
A00003 P P M C M M M M M M
;
run;
data want;
set have;
array m(10) m1-m10;
array p(10) p1-p10 (10*0);
text=catt(of m1-m10);
x= index(text, 'CMMM') ;
if x>0 then p(x)=1;
run;
```

EDIT: re-reading this I don't think I got your rule, which is why you may want to explain it

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2016 12:36 AM

Hi Reeza, I am very new to this forum, I just want to be respectful to everyone who responded to me. Clearly I see this is not the right way to interact with anyone in this forum, thanks for the heads up.

I ran your codes, it has done 50% of what I wanted, the desired result I would like to see is:

Account_ID P1 P2 P3 P4 P5 P6 P7

A00001 3 0 0 0 0 0 0

A00002 0 0 3 0 0 0 0

A00003 0 0 0 3 0 0 0

After I got these result, and I would just want to figure out the maximum and minimum months each qualified account has stayed by max/min (of p1-p7).

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 10:48 PM

I've borrowed @Reeza's code to do just the "3" scenario. This can be enhanced to do the other combinations.

```
data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M M M M M M M M P
A00002 C P C M M M M M M M
A00003 P P M C M M M M M M
;
run;
data want;
set have;
array m(10) m1-m10;
array p(10) p1-p10 (10*0);
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3);
if index(text, 'CMMM') then p(i) = 3;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 11:38 PM - edited 04-08-2016 11:42 PM

Hi SASkiwi, thanks for your reply, I probably didn't explain my thoughts clearly in my codes previously.

In my previous dataset, let's think of M1- M10 as the first month to the10th month, and C stands for Core, M for Mid and P for Premier.

My logic is to figure out two things: 1, How many accounts moved from Core to Mid, and they also must stay in Mid for at least three consecutive months, that means "CMMM" is the minimum criteria to consider. 2, Once it qualifies for consideration, I also want to figure out how many months those qualified accounts have stayed in Mid? Such as "CMMMMM" should be interpreted as 5, becasue 1. It qualifies for consideration when moving from Core to Mid. 2. It has stayed in Mid for 5 months.

I hope I explained myself clearly to you this time, please let me know if you have any questions.

Thanks a lot,

Best,

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-08-2016 11:48 PM

Thanks for the detailed explanation.

Can you try @Reeza and my example code to see if gives the right result for the 3 consecutive months or not? If it does then we can try the second part.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2016 12:04 AM - edited 04-09-2016 12:08 AM

Hi SASkiwi, I ran your codes, and below are the results I got.

I think I am not getting the resuts as I expected, let's look at one example by focusing on the values for P1 variable. What I expected to see is P1 for first observation is 3 ( becasue its "CMMM" , and 0 for second observation (for it is "CPCM" and 0 for third observation (for it is "PPMC").On the contrary, I see 3 for all three observations.

Thanks.

Note: Sorry, for some reason, I am not able to line the P variables correctly with its respective value in the table.

P1 P2 P3

1 | A00001 | C | M | M | M | M | M | M | M | M | P | 3 | 0 | 0 |

2 | A00002 | C | P | C | M | M | M | M | M | M | M | 3 | 0 | 3 |

3 | A00003 | P | P | M | C | M | M | M | M | M | M | 3 | 0 | 3 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2016 12:15 AM

I didn't bother with the indicator variables just the total count. Based on your description of the problem it seems to make sense.

You could easily store the values into an array for a running count but that doesn't seem very useful.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2016 12:16 AM - edited 04-10-2016 03:02 AM

Regular expressions easily bring this kind of flexibility.

```
data HAVE;
input (X1-X10) (: $1.);
cards;
P C M M M M M M M P
P C M M C M M M M P
P C M M C M M C M P
run;
```

data WANT;
set HAVE;
array P [10];
RE=prxparse('/CM{3,}/'); %* This means match a C followed by a least 3 Ms;
call prxsubstr(RE,cat(of X1-X10), POS, LEN); %* and return the match position and length;
if POS then P[POS]=LEN-1;
run;

proc print noobs;
var P1-P10;
run;

P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 |
---|---|---|---|---|---|---|---|---|---|

. | 7 | . | . | . | . | . | . | . | . |

. | . | . | . | 4 | . | . | . | . | . |

. | . | . | . | . | . | . | . | . | . |

Exhaustive list of regular expressions supported by SAS in:

http://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2016 01:21 AM

Hi SASkiwi, I drew on your insights, and added some other codes to finally create the result i desired, however, i have to ran several times to arrive at the final result, please see my codes below:

```
data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M M M M M M M M P
A00002 C P C M M M M M M M
A00003 P P M C M M M M M M
;
run;
data want;
set have;
array m(10) m1-m10;
array p(7) p1-p7 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
if index(text, 'CMMM') then p(i) = 3;
else p(i)=0;
end;
drop i text;
run;
data want1;
set want;
array m(10) m1-m10;
array p(6) p8-p13 ;
do I= 1 to 6;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
if index(text, 'CMMMM') then p(i) = 4;
else p(i)=0;
end;
drop i text;
run;
data want2;
set want1;
array m(10) m1-m10;
array p(5) p14-p18 ;
do I= 1 to 5;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5));
if index(text, 'CMMMMM') then p(i) = 5;
else p(i)=0;
end;
drop i text;
run;
Data want3;
set want2;
array m(10) m1-m10;
array p(4) p19-p22 ;
do I= 1 to 4;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5),m(i+6));
if index(text, 'CMMMMMM') then p(i) = 6;
else p(i)=0;
end;
drop i text;
run;
Data want4;
set want3;
array m(10) m1-m10;
array p(3) p23-p25 ;
do I= 1 to 3;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5),m(i+6),m(i+7));
if index(text, 'CMMMMMMM') then p(i) = 7;
else p(i)=0;
end;
drop i text;
run;
Data want_final;
set want4;
array m(10) m1-m10;
array p(2) p26-p27 ;
do I= 1 to 2;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4), m(i+5),m(i+6),m(i+7),m(1+8));
if index(text, 'CMMMMMMMM') then p(i) = 8;
else p(i)=0;
end;
drop i text;
Max=max(of p1-p27);
Min=min(of p1-p27);
run;
```

I am wondering if there is a "magic wand" that you can help me wave that turns them into a series of coherent steps to produce the final result with one fell swoop.Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2016 02:01 AM - edited 04-09-2016 02:04 AM

Well this might be one way of combining 2 of the rules as an example. Please test it.

```
data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M M M M M M M M P
A00002 C P C M M M M M M M
A00003 P P M C M M M M M M
;
run;
data want;
set have;
array m(10) m1-m10;
array p(13) p1-p13 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
if index(text, 'CMMM') then p(i) = 3;
else p(i)=0;
end;
do I= 1 to 6;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
if p(i) = 3 and index(text, 'CMMMM') then p(i +7) = 4;
else p(i)=0;
end;
drop i text;
run;
```

Solution

04-10-2016
09:24 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-10-2016 09:18 PM

Hi SASkiwi, sorry about this delayed reply, it worked out perfectly except some missing values, I changed the codes a bit to make it look like exactly what i wanted.Thanks so much for you help. I will copy this idea and try to expand my variables from M1-M10 to M1-M100.

```
data have;
informat account_id $8. m1-m10 $1.;
input Account_ID $ M1-M10 $;
cards;
A00001 C M M M M M M M M P
A00002 C P C M M M M M M M
A00003 P P M C M M M M M M
;
run;
data want;
set have;
array m(10) m1-m10;
array p(13) p1-p13 ;
do I = 1 to 7;
text=catt(m(i), m(i+1), m(i+2), m(i+3));
if index(text, 'CMMM') then p(i) = 3;
else p(i)=0;
end;
do I= 1 to 6;
text=catt(m(i), m(i+1), m(i+2), m(i+3), m(i+4));
if index(text, 'CMMMM') then p(i +7) = 4;
else p(i+7)=0;
end;
drop i text;
run;
```

Best,