Hi All,
I have a dataset as follows -
data test;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
I am trying to create 12 columns - 1month, 2month,....,12month which looks into the time gap between
the change when it occurs from 1 to . (or 0)
So, essentially a table like below:
id | 1m | 2m | 3m | 4m | 5m | 6m | 7m | 8m | 9m | 10m | 11m | 12m | Total |
xx | 1 | 1 | 1 | 3 | |||||||||
yy | 0 | ||||||||||||
pp | 1 | 1 | 2 | ||||||||||
2 | 1 | 1 | 4 | ||||||||||
rr | 1 | 1 |
Total is INDEX in table Test above.
Kind regards
SK
data have; INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11; datalines; xx 3 1 1 1 1 . . . . . 1 . . yy 0 1 1 1 1 1 1 1 1 1 1 1 1 pp 2 1 1 1 1 1 1 1 1 1 . 1 1 qq 4 1 . . . 1 1 1 1 1 . 1 1 rr 1 1 1 1 1 1 1 1 1 1 1 . . ; run; data temp; set have; array x{*} Apr12--May11; do i=dim(x) to 1 by -1; value=x{i};output; end; keep id value; run; data temp1; set temp; by id notsorted; retain found; if first.id then found=0; if value ne lag(value) and id=lag(id) then found=1; if found; run; proc summary data=temp1; by id value notsorted; output out=temp2(keep=id _freq_) ; run; data month; do month=1 to 12; output; end; run; proc sql; create table temp3 as select a.*,count from (select * from (select distinct id from have),month) as a left join (select *,count(*) as count from temp2 group by id,_freq_) as b on a.id=b.id and a.month=b._freq_; quit; proc transpose data=temp3 out=want(drop=_name_) prefix=month; by id; var count; id month; run;
I think you need to be a bit more specific about your comparison as it looks like you may also be doing a count of run length of . as well. Is that correct?.
Also in a minor note 1month is not a valid SAS variable name. Month1 to Month12 will work much better in general.
This seems to do what your are requesting:
data test;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
array m Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
array c Month1-Month12;
count=1;
do i = 1 to (dim(m)-1);
if m[i]=m[i+1] then count=sum(Count,1);
else if i>1 then do;
c[count]=sum(c[count],1);
count=1;
end;
else if i=1 then do;
c[1]=1;
count=1;
end;
end;
total=max(sum(of c(*)),0);
drop i count;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
@toconero wrote:
Can you explain Ballardw how it run? is it possible to make two steps?.
Thank you
If the base data set already exists then replace the informat and input with a SET statement referencing the existing data and remove the datalines block.
The code is basically looking at two values, for example the first and second. If they are the same we add to a counter of "sameness", if they differ than the existing count is the value to increment in the Month variable (a lousy name, RunLength would be more indicative of what it contains). After updating that we need to reset the count to one as we have started a new sequence.
A shorter version of Ballardw can be made as:
data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
data want;
set have;
total = 0;
array k[*] Apr12 -- May11;
do i = 1 to dim(k) - 1;
if k[i] ^= k[i+1] then total + 1;
end;
drop i;
run;
proc print data = want;
run;
Sorry guys, for not being clear. The process below will only give the total #of changes per id. I am trying to obtain a matrix containing ID and 12 variables which are binary (1/0) - 1 month, 2 months,...., 12 months. So, "2 months = 1" would mean that it took 2 months for the status to change. So, for ID 'xx' , if you look at the SAS table 'have' created from the below procedure, there are following changes -
1. in Jul11- as the STATUS was . (0) between Jul11, so "1 month" = 1
2. in Aug11 - from 1 in Jul11 to . (0) in Aug11, so "1 month" = 1
3. in Jan12 - from . (0) in Aug11 to 1 in Jan12, so "5 month" = 1 as the status changed after 5 months
In total as the STATUS changed three times, so INDEX = 3.
Likelwise I am thinking of an array but I do not know a way of creating such a matrix in SAS.
KR
SK
Did attempt to run either of the proposed solutions?
BTW with 12 input values you only can have 11 changes.
data have; INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11; datalines; xx 3 1 1 1 1 . . . . . 1 . . yy 0 1 1 1 1 1 1 1 1 1 1 1 1 pp 2 1 1 1 1 1 1 1 1 1 . 1 1 qq 4 1 . . . 1 1 1 1 1 . 1 1 rr 1 1 1 1 1 1 1 1 1 1 1 . . ; run; data temp; set have; array x{*} Apr12--May11; do i=dim(x) to 1 by -1; value=x{i};output; end; keep id value; run; data temp1; set temp; by id notsorted; retain found; if first.id then found=0; if value ne lag(value) and id=lag(id) then found=1; if found; run; proc summary data=temp1; by id value notsorted; output out=temp2(keep=id _freq_) ; run; data month; do month=1 to 12; output; end; run; proc sql; create table temp3 as select a.*,count from (select * from (select distinct id from have),month) as a left join (select *,count(*) as count from temp2 group by id,_freq_) as b on a.id=b.id and a.month=b._freq_; quit; proc transpose data=temp3 out=want(drop=_name_) prefix=month; by id; var count; id month; run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.