BookmarkSubscribeRSS Feed
mariusmorthorst
Calcite | Level 5

Dear All 

I'm using SAS university edition. I'll outline my problem by first outlining my data.  I've used the array command to create a variable for each week (in 2008) displaying whether an observation lives in place A or B. 


As my dataset is confidentiel I've created a quick look-a-like-version here. 

As you see from the picture each observation can possibly move back and forth. 

Skærmbillede 2018-03-21 20.29.31.png

I'm wondering whether it's possible, using the array command, to create new variables for each sequence respectively and in that variable for the specific sequence report the length of the sequence.  In the example above there would at most be 3 new variables (lets call them sequence1, sequence2 and sequence3). For the first observation sequence1 = 2, sequence2 = 3 and sequence3= 2. For the second observation  sequence1 = 8, sequence2 = 0 and sequence3= 0. 

So far i've only succeed in capturing the length of the first sequence by the following code

data newdata (compress=binary);
set olddata;
array weeks (52) $2;

endB=.;
periodB=.;

if indicatorB = 1 then /* where indicatorB shows whether or not B is present within any of the weeks variables */
do i=startB to 52 by 1 UNTIL(spell(i) ne 'B'); /* where startB is a variable telling me in what week we first observe B (between 1-52) */
if weeks(i) in ('B') then endB = i;
end;
periodB=endB-startB+1;
run;

Hope you can help me update my code so  I can capture the different lengths of multiple sequences 

Kind regards

Marius 

9 REPLIES 9
novinosrin
Tourmaline | Level 20

HI @mariusmorthorst  Good afternoon, If you could post the sample of what you HAVE and what you WANT as plain text, a lazy person like me would be interested to copy paste the data on to my sas software and do some program/testing. Thank you!

mariusmorthorst
Calcite | Level 5

hi @novinosrin 

Thanks for your quick reply

Is the following sufficient 

What i have 

 

IDweek 1 week 2 week 3 week 4week 5week 6week 7week 8
1AABBBAAA
2BBBBBBBB
3BBAAAAAA
4AAAABBBB
 
 
What i want
IDweek 1 week 2 week 3 week 4week 5week 6week 7week 8sequence1sequence2sequence3number of sequence
1AABBBAAA2333
2BBBBBBBB8001
3BBAAAAAA2602
4AAAABBBB4402
 
 
 
novinosrin
Tourmaline | Level 20
data have;
input ID	(week1 	week2 	week3 	week4	week5	week6	week7	week8) ($);
datalines;
1	A	A	B	B	B	A	A	A
2	B	B	B	B	B	B	B	B
3	B	B	A	A	A	A	A	A
4	A	A	A	A	B	B	B	B
;


data want;
set have;
array t(*) week:;
array s(*) seq1-seq3 ;
i=0;
_count=1;
do _n_=2 to dim(t);
if t(_n_)=t(_n_-1) then _count+1;
else  do;i=i+1;s(i)=_count;_count=1;continue;end;
if _n_=dim(t) then do;i=i+1;s(i)=_count;end;
end;
number_of_sequence=n(of s(*));
drop _: i;
run;

Yes you can use arrays. I like your thinking and at the same time I concur with other et al's recommendations too. Have fun and welcome to SAS forum. Cheers! 

Reeza
Super User

I would recommend transposing your data to a long format, then the problem is trivial.


use the approach outlined here:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

With the modification of the NOTSORTED option on your BY statement so that the order is correct.

 

by id week value NOTSORTED;

if first.value then seq+1;

You can easily do this with arrays as well, but it's harder because you'll have to account for almost all weeks, so you can create seq1-seq8 to hold the values but I see issues with using this later on. 

 

Untested because no data

 

data want;
set have;
array week(8) week1-week8;
array seq(8);

index=1;
do i=2 to dim(week);
if week(i)=week(i-1) then seq(index)=seq(index)+1;
else do;
index+1;
seq(index) = 1;
end;
end;
run;
PGStats
Opal | Level 21

This is very easy to do with SAS once your data is formatted properly. The proper format is long as opposed to wide, with columns ID, Week, and Place. With that format the sequence lengths will be given by :

 

proc sort data=have; by ID week; run;

data want;
do seqLength = 1 by 1 until(last.place);
    set have; by id place notsorted;
    end;
run;
PG
Astounding
PROC Star

How many sequences do you want?  All?  Just the first 3?  It's possible that the number ranges from 1 to 52 for one ID.

 

Wouldn't sequence3 equal 3 (not 2) for the first ID?

mariusmorthorst
Calcite | Level 5

Thanks for your quick reply@Astounding

I would like as many sequences as the observation with the most changes has. As i have 52 weeks in the original dataset, this would in theory be the maximum. The data i put up was only a small part used to illustrate the problem at hand. 

And yes your right for the first observation the second sequence should equal 3 

ballardw
Super User

@mariusmorthorst wrote:

Dear All 

I'm using SAS university edition. I'll outline my problem by first outlining my data.  I've used the array command to create a variable for each week (in 2008) displaying whether an observation lives in place A or B. 

 

I'm wondering whether it's possible, using the array command, to create new variables for each sequence respectively and in that variable for the specific sequence report the length of the sequence.  In the example above there would at most be 3 new variables (lets call them sequence1, sequence2 and sequence3). For the first observation sequence1 = 2, sequence2 = 3 and sequence3= 2. For the second observation  sequence1 = 8, sequence2 = 0 and sequence3= 0. 


What rule assigns values to those sequence variables? It isn't obvious to me. Why is sequence 3=2 for the first record, I might think 3 was more appropriate for the shown data.

 

And with SAS 9.4 I get an error from spell(I).

 

If you are counting sequential recurrence of the same value are you allowing for up to 52 sequence variables? (Assuming that you only have 52 weeks (365 or 366 days in a year yields 52 and a fraction weeks so 53 may be needed).

Ksharp
Super User
data have;
infile cards expandtabs;
input ID	(week1 	week2 	week3 	week4	week5	week6	week7	week8) ($);
datalines;
1	A	A	B	B	B	A	A	A
2	B	B	B	B	B	B	B	B
3	B	B	A	A	A	A	A	A
4	A	A	A	A	B	B	B	B
;
proc transpose data=have out=temp;
by id;
var week:;
run;

proc summary data=temp;
by id col1 notsorted;
output out=temp1 ;
run;

proc transpose data=temp1 out=temp2 prefix=seq;
by id;
var _freq_;
run;

data temp3;
 set temp2;
 total=n(of seq:);
 drop _name_;
run;

data want;
 merge have temp3;
 by id;
run;
proc print;run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1517 views
  • 0 likes
  • 7 in conversation