Hi All,
I have a dataset as follows -
data test;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
I am trying to create 12 columns - 1month, 2month,....,12month which looks into the time gap between
the change when it occurs from 1 to . (or 0)
So, essentially a table like below:
id | 1m | 2m | 3m | 4m | 5m | 6m | 7m | 8m | 9m | 10m | 11m | 12m | Total |
xx | 1 | 1 | 1 | 3 | |||||||||
yy | 0 | ||||||||||||
pp | 1 | 1 | 2 | ||||||||||
2 | 1 | 1 | 4 | ||||||||||
rr | 1 | 1 |
Total is INDEX in table Test above.
Kind regards
SK
data have; INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11; datalines; xx 3 1 1 1 1 . . . . . 1 . . yy 0 1 1 1 1 1 1 1 1 1 1 1 1 pp 2 1 1 1 1 1 1 1 1 1 . 1 1 qq 4 1 . . . 1 1 1 1 1 . 1 1 rr 1 1 1 1 1 1 1 1 1 1 1 . . ; run; data temp; set have; array x{*} Apr12--May11; do i=dim(x) to 1 by -1; value=x{i};output; end; keep id value; run; data temp1; set temp; by id notsorted; retain found; if first.id then found=0; if value ne lag(value) and id=lag(id) then found=1; if found; run; proc summary data=temp1; by id value notsorted; output out=temp2(keep=id _freq_) ; run; data month; do month=1 to 12; output; end; run; proc sql; create table temp3 as select a.*,count from (select * from (select distinct id from have),month) as a left join (select *,count(*) as count from temp2 group by id,_freq_) as b on a.id=b.id and a.month=b._freq_; quit; proc transpose data=temp3 out=want(drop=_name_) prefix=month; by id; var count; id month; run;
I think you need to be a bit more specific about your comparison as it looks like you may also be doing a count of run length of . as well. Is that correct?.
Also in a minor note 1month is not a valid SAS variable name. Month1 to Month12 will work much better in general.
This seems to do what your are requesting:
data test;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
array m Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
array c Month1-Month12;
count=1;
do i = 1 to (dim(m)-1);
if m[i]=m[i+1] then count=sum(Count,1);
else if i>1 then do;
c[count]=sum(c[count],1);
count=1;
end;
else if i=1 then do;
c[1]=1;
count=1;
end;
end;
total=max(sum(of c(*)),0);
drop i count;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
@toconero wrote:
Can you explain Ballardw how it run? is it possible to make two steps?.
Thank you
If the base data set already exists then replace the informat and input with a SET statement referencing the existing data and remove the datalines block.
The code is basically looking at two values, for example the first and second. If they are the same we add to a counter of "sameness", if they differ than the existing count is the value to increment in the Month variable (a lousy name, RunLength would be more indicative of what it contains). After updating that we need to reset the count to one as we have started a new sequence.
A shorter version of Ballardw can be made as:
data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
data want;
set have;
total = 0;
array k[*] Apr12 -- May11;
do i = 1 to dim(k) - 1;
if k[i] ^= k[i+1] then total + 1;
end;
drop i;
run;
proc print data = want;
run;
Sorry guys, for not being clear. The process below will only give the total #of changes per id. I am trying to obtain a matrix containing ID and 12 variables which are binary (1/0) - 1 month, 2 months,...., 12 months. So, "2 months = 1" would mean that it took 2 months for the status to change. So, for ID 'xx' , if you look at the SAS table 'have' created from the below procedure, there are following changes -
1. in Jul11- as the STATUS was . (0) between Jul11, so "1 month" = 1
2. in Aug11 - from 1 in Jul11 to . (0) in Aug11, so "1 month" = 1
3. in Jan12 - from . (0) in Aug11 to 1 in Jan12, so "5 month" = 1 as the status changed after 5 months
In total as the STATUS changed three times, so INDEX = 3.
Likelwise I am thinking of an array but I do not know a way of creating such a matrix in SAS.
KR
SK
Did attempt to run either of the proposed solutions?
BTW with 12 input values you only can have 11 changes.
data have; INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11; datalines; xx 3 1 1 1 1 . . . . . 1 . . yy 0 1 1 1 1 1 1 1 1 1 1 1 1 pp 2 1 1 1 1 1 1 1 1 1 . 1 1 qq 4 1 . . . 1 1 1 1 1 . 1 1 rr 1 1 1 1 1 1 1 1 1 1 1 . . ; run; data temp; set have; array x{*} Apr12--May11; do i=dim(x) to 1 by -1; value=x{i};output; end; keep id value; run; data temp1; set temp; by id notsorted; retain found; if first.id then found=0; if value ne lag(value) and id=lag(id) then found=1; if found; run; proc summary data=temp1; by id value notsorted; output out=temp2(keep=id _freq_) ; run; data month; do month=1 to 12; output; end; run; proc sql; create table temp3 as select a.*,count from (select * from (select distinct id from have),month) as a left join (select *,count(*) as count from temp2 group by id,_freq_) as b on a.id=b.id and a.month=b._freq_; quit; proc transpose data=temp3 out=want(drop=_name_) prefix=month; by id; var count; id month; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.