DATA Step, Macro, Functions and more

Can I use arrays to capture "length" of multiple sequences of variables?

Reply
New Contributor
Posts: 4

Can I use arrays to capture "length" of multiple sequences of variables?

[ Edited ]

Dear All 

I'm using SAS university edition. I'll outline my problem by first outlining my data.  I've used the array command to create a variable for each week (in 2008) displaying whether an observation lives in place A or B. 


As my dataset is confidentiel I've created a quick look-a-like-version here. 

As you see from the picture each observation can possibly move back and forth. 

Skærmbillede 2018-03-21 20.29.31.png

I'm wondering whether it's possible, using the array command, to create new variables for each sequence respectively and in that variable for the specific sequence report the length of the sequence.  In the example above there would at most be 3 new variables (lets call them sequence1, sequence2 and sequence3). For the first observation sequence1 = 2, sequence2 = 3 and sequence3= 2. For the second observation  sequence1 = 8, sequence2 = 0 and sequence3= 0. 

So far i've only succeed in capturing the length of the first sequence by the following code

data newdata (compress=binary);
set olddata;
array weeks (52) $2;

endB=.;
periodB=.;

if indicatorB = 1 then /* where indicatorB shows whether or not B is present within any of the weeks variables */
do i=startB to 52 by 1 UNTIL(spell(i) ne 'B'); /* where startB is a variable telling me in what week we first observe B (between 1-52) */
if weeks(i) in ('B') then endB = i;
end;
periodB=endB-startB+1;
run;

Hope you can help me update my code so  I can capture the different lengths of multiple sequences 

Kind regards

Marius 

PROC Star
Posts: 1,282

Re: Can I use arrays to capture "length" of multiple sequences of variables?

[ Edited ]
Posted in reply to mariusmorthorst

HI @mariusmorthorst  Good afternoon, If you could post the sample of what you HAVE and what you WANT as plain text, a lazy person like me would be interested to copy paste the data on to my sas software and do some program/testing. Thank you!

New Contributor
Posts: 4

Re: Can I use arrays to capture "length" of multiple sequences of variables?

[ Edited ]
Posted in reply to novinosrin

hi @novinosrin 

Thanks for your quick reply

Is the following sufficient 

What i have 

 

IDweek 1 week 2 week 3 week 4week 5week 6week 7week 8
1AABBBAAA
2BBBBBBBB
3BBAAAAAA
4AAAABBBB
 
 
What i want
IDweek 1 week 2 week 3 week 4week 5week 6week 7week 8sequence1sequence2sequence3number of sequence
1AABBBAAA2333
2BBBBBBBB8001
3BBAAAAAA2602
4AAAABBBB4402
 
 
 
PROC Star
Posts: 1,282

Re: Can I use arrays to capture "length" of multiple sequences of variables?

[ Edited ]
Posted in reply to mariusmorthorst
data have;
input ID	(week1 	week2 	week3 	week4	week5	week6	week7	week8) ($);
datalines;
1	A	A	B	B	B	A	A	A
2	B	B	B	B	B	B	B	B
3	B	B	A	A	A	A	A	A
4	A	A	A	A	B	B	B	B
;


data want;
set have;
array t(*) week:;
array s(*) seq1-seq3 ;
i=0;
_count=1;
do _n_=2 to dim(t);
if t(_n_)=t(_n_-1) then _count+1;
else  do;i=i+1;s(i)=_count;_count=1;continue;end;
if _n_=dim(t) then do;i=i+1;s(i)=_count;end;
end;
number_of_sequence=n(of s(*));
drop _: i;
run;

Yes you can use arrays. I like your thinking and at the same time I concur with other et al's recommendations too. Have fun and welcome to SAS forum. Cheers! 

Super User
Posts: 22,818

Re: Can I use arrays to capture "length" of multiple sequences of variables?

Posted in reply to mariusmorthorst

I would recommend transposing your data to a long format, then the problem is trivial.


use the approach outlined here:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

With the modification of the NOTSORTED option on your BY statement so that the order is correct.

 

by id week value NOTSORTED;

if first.value then seq+1;

You can easily do this with arrays as well, but it's harder because you'll have to account for almost all weeks, so you can create seq1-seq8 to hold the values but I see issues with using this later on. 

 

Untested because no data

 

data want;
set have;
array week(8) week1-week8;
array seq(8);

index=1;
do i=2 to dim(week);
if week(i)=week(i-1) then seq(index)=seq(index)+1;
else do;
index+1;
seq(index) = 1;
end;
end;
run;
Esteemed Advisor
Posts: 5,390

Re: Can I use arrays to capture "length" of multiple sequences of variables?

[ Edited ]
Posted in reply to mariusmorthorst

This is very easy to do with SAS once your data is formatted properly. The proper format is long as opposed to wide, with columns ID, Week, and Place. With that format the sequence lengths will be given by :

 

proc sort data=have; by ID week; run;

data want;
do seqLength = 1 by 1 until(last.place);
    set have; by id place notsorted;
    end;
run;
PG
Super User
Posts: 6,522

Re: Can I use arrays to capture "length" of multiple sequences of variables?

Posted in reply to mariusmorthorst

How many sequences do you want?  All?  Just the first 3?  It's possible that the number ranges from 1 to 52 for one ID.

 

Wouldn't sequence3 equal 3 (not 2) for the first ID?

New Contributor
Posts: 4

Re: Can I use arrays to capture "length" of multiple sequences of variables?

Posted in reply to Astounding

Thanks for your quick reply@Astounding

I would like as many sequences as the observation with the most changes has. As i have 52 weeks in the original dataset, this would in theory be the maximum. The data i put up was only a small part used to illustrate the problem at hand. 

And yes your right for the first observation the second sequence should equal 3 

Super User
Posts: 12,994

Re: Can I use arrays to capture "length" of multiple sequences of variables?

[ Edited ]
Posted in reply to mariusmorthorst

@mariusmorthorst wrote:

Dear All 

I'm using SAS university edition. I'll outline my problem by first outlining my data.  I've used the array command to create a variable for each week (in 2008) displaying whether an observation lives in place A or B. 

 

I'm wondering whether it's possible, using the array command, to create new variables for each sequence respectively and in that variable for the specific sequence report the length of the sequence.  In the example above there would at most be 3 new variables (lets call them sequence1, sequence2 and sequence3). For the first observation sequence1 = 2, sequence2 = 3 and sequence3= 2. For the second observation  sequence1 = 8, sequence2 = 0 and sequence3= 0. 


What rule assigns values to those sequence variables? It isn't obvious to me. Why is sequence 3=2 for the first record, I might think 3 was more appropriate for the shown data.

 

And with SAS 9.4 I get an error from spell(I).

 

If you are counting sequential recurrence of the same value are you allowing for up to 52 sequence variables? (Assuming that you only have 52 weeks (365 or 366 days in a year yields 52 and a fraction weeks so 53 may be needed).

Super User
Posts: 10,609

Re: Can I use arrays to capture "length" of multiple sequences of variables?

Posted in reply to mariusmorthorst
data have;
infile cards expandtabs;
input ID	(week1 	week2 	week3 	week4	week5	week6	week7	week8) ($);
datalines;
1	A	A	B	B	B	A	A	A
2	B	B	B	B	B	B	B	B
3	B	B	A	A	A	A	A	A
4	A	A	A	A	B	B	B	B
;
proc transpose data=have out=temp;
by id;
var week:;
run;

proc summary data=temp;
by id col1 notsorted;
output out=temp1 ;
run;

proc transpose data=temp1 out=temp2 prefix=seq;
by id;
var _freq_;
run;

data temp3;
 set temp2;
 total=n(of seq:);
 drop _name_;
run;

data want;
 merge have temp3;
 by id;
run;
proc print;run;
Ask a Question
Discussion stats
  • 9 replies
  • 112 views
  • 0 likes
  • 7 in conversation