Dear All
I'm using SAS university edition. I'll outline my problem by first outlining my data. I've used the array command to create a variable for each week (in 2008) displaying whether an observation lives in place A or B.
As my dataset is confidentiel I've created a quick look-a-like-version here.
As you see from the picture each observation can possibly move back and forth.
I'm wondering whether it's possible, using the array command, to create new variables for each sequence respectively and in that variable for the specific sequence report the length of the sequence. In the example above there would at most be 3 new variables (lets call them sequence1, sequence2 and sequence3). For the first observation sequence1 = 2, sequence2 = 3 and sequence3= 2. For the second observation sequence1 = 8, sequence2 = 0 and sequence3= 0.
So far i've only succeed in capturing the length of the first sequence by the following code
data newdata (compress=binary);
set olddata;
array weeks (52) $2;
endB=.;
periodB=.;
if indicatorB = 1 then /* where indicatorB shows whether or not B is present within any of the weeks variables */
do i=startB to 52 by 1 UNTIL(spell(i) ne 'B'); /* where startB is a variable telling me in what week we first observe B (between 1-52) */
if weeks(i) in ('B') then endB = i;
end;
periodB=endB-startB+1;
run;
Hope you can help me update my code so I can capture the different lengths of multiple sequences
Kind regards
Marius
HI @mariusmorthorst Good afternoon, If you could post the sample of what you HAVE and what you WANT as plain text, a lazy person like me would be interested to copy paste the data on to my sas software and do some program/testing. Thank you!
hi @novinosrin
Thanks for your quick reply
Is the following sufficient
What i have
ID | week 1 | week 2 | week 3 | week 4 | week 5 | week 6 | week 7 | week 8 |
1 | A | A | B | B | B | A | A | A |
2 | B | B | B | B | B | B | B | B |
3 | B | B | A | A | A | A | A | A |
4 | A | A | A | A | B | B | B | B |
ID | week 1 | week 2 | week 3 | week 4 | week 5 | week 6 | week 7 | week 8 | sequence1 | sequence2 | sequence3 | number of sequence |
1 | A | A | B | B | B | A | A | A | 2 | 3 | 3 | 3 |
2 | B | B | B | B | B | B | B | B | 8 | 0 | 0 | 1 |
3 | B | B | A | A | A | A | A | A | 2 | 6 | 0 | 2 |
4 | A | A | A | A | B | B | B | B | 4 | 4 | 0 | 2 |
data have;
input ID (week1 week2 week3 week4 week5 week6 week7 week8) ($);
datalines;
1 A A B B B A A A
2 B B B B B B B B
3 B B A A A A A A
4 A A A A B B B B
;
data want;
set have;
array t(*) week:;
array s(*) seq1-seq3 ;
i=0;
_count=1;
do _n_=2 to dim(t);
if t(_n_)=t(_n_-1) then _count+1;
else do;i=i+1;s(i)=_count;_count=1;continue;end;
if _n_=dim(t) then do;i=i+1;s(i)=_count;end;
end;
number_of_sequence=n(of s(*));
drop _: i;
run;
Yes you can use arrays. I like your thinking and at the same time I concur with other et al's recommendations too. Have fun and welcome to SAS forum. Cheers!
I would recommend transposing your data to a long format, then the problem is trivial.
use the approach outlined here:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
With the modification of the NOTSORTED option on your BY statement so that the order is correct.
by id week value NOTSORTED;
if first.value then seq+1;
You can easily do this with arrays as well, but it's harder because you'll have to account for almost all weeks, so you can create seq1-seq8 to hold the values but I see issues with using this later on.
Untested because no data
data want;
set have;
array week(8) week1-week8;
array seq(8);
index=1;
do i=2 to dim(week);
if week(i)=week(i-1) then seq(index)=seq(index)+1;
else do;
index+1;
seq(index) = 1;
end;
end;
run;
This is very easy to do with SAS once your data is formatted properly. The proper format is long as opposed to wide, with columns ID, Week, and Place. With that format the sequence lengths will be given by :
proc sort data=have; by ID week; run;
data want;
do seqLength = 1 by 1 until(last.place);
set have; by id place notsorted;
end;
run;
How many sequences do you want? All? Just the first 3? It's possible that the number ranges from 1 to 52 for one ID.
Wouldn't sequence3 equal 3 (not 2) for the first ID?
Thanks for your quick reply@Astounding
I would like as many sequences as the observation with the most changes has. As i have 52 weeks in the original dataset, this would in theory be the maximum. The data i put up was only a small part used to illustrate the problem at hand.
And yes your right for the first observation the second sequence should equal 3
@mariusmorthorst wrote:
Dear All
I'm using SAS university edition. I'll outline my problem by first outlining my data. I've used the array command to create a variable for each week (in 2008) displaying whether an observation lives in place A or B.
I'm wondering whether it's possible, using the array command, to create new variables for each sequence respectively and in that variable for the specific sequence report the length of the sequence. In the example above there would at most be 3 new variables (lets call them sequence1, sequence2 and sequence3). For the first observation sequence1 = 2, sequence2 = 3 and sequence3= 2. For the second observation sequence1 = 8, sequence2 = 0 and sequence3= 0.
What rule assigns values to those sequence variables? It isn't obvious to me. Why is sequence 3=2 for the first record, I might think 3 was more appropriate for the shown data.
And with SAS 9.4 I get an error from spell(I).
If you are counting sequential recurrence of the same value are you allowing for up to 52 sequence variables? (Assuming that you only have 52 weeks (365 or 366 days in a year yields 52 and a fraction weeks so 53 may be needed).
data have;
infile cards expandtabs;
input ID (week1 week2 week3 week4 week5 week6 week7 week8) ($);
datalines;
1 A A B B B A A A
2 B B B B B B B B
3 B B A A A A A A
4 A A A A B B B B
;
proc transpose data=have out=temp;
by id;
var week:;
run;
proc summary data=temp;
by id col1 notsorted;
output out=temp1 ;
run;
proc transpose data=temp1 out=temp2 prefix=seq;
by id;
var _freq_;
run;
data temp3;
set temp2;
total=n(of seq:);
drop _name_;
run;
data want;
merge have temp3;
by id;
run;
proc print;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.