DATA Step, Macro, Functions and more

Data Summary

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Data Summary

Hi All,

 

I have a dataset as follows -

data test;

INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;

datalines;

xx 3 1 1 1 1 . . . . . 1 . .

yy 0 1 1 1 1 1 1 1 1 1 1 1 1

pp 2 1 1 1 1 1 1 1 1 1 . 1 1

qq 4 1 . . . 1 1 1 1 1 . 1 1

rr 1 1 1 1 1 1 1 1 1 1 1 . .

;

run;

 

I am trying to create 12 columns - 1month, 2month,....,12month which looks into the time gap between

the change when it occurs from 1 to . (or 0)

 

So, essentially a table like below:

 

 

id 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 11m 12m Total
xx 1     1 1               3
yy                         0
pp 1               1       2
qq 2     1 1               4
rr                   1     1

 

 

Total is INDEX in table Test above.

 

Kind regards

 

SK

 


Accepted Solutions
Solution
‎07-27-2016 12:45 AM
Super User
Posts: 9,681

Re: Data Summary


data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;

data temp;
 set have;
 array x{*} Apr12--May11;
 do i=dim(x) to 1 by -1;
  value=x{i};output;
 end;
 keep id value;
run;
data temp1;
 set temp;
 by id notsorted;
 retain found;
 if first.id then found=0;
 if value ne lag(value) and id=lag(id) then found=1;
 if found;
run;
proc summary data=temp1;
 by id value notsorted;
 output out=temp2(keep=id _freq_) ;
run;
data month;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table temp3 as
 select a.*,count
  from (select * from (select distinct id from have),month) as a
   left join 
    (select *,count(*) as count from temp2 group by id,_freq_) as b
      on a.id=b.id and a.month=b._freq_;
quit;
proc transpose data=temp3 out=want(drop=_name_) prefix=month;
 by id;
 var count;
 id month;
run;


View solution in original post


All Replies
Super User
Posts: 10,500

Re: Data Summary

[ Edited ]

I think you need to be a bit more specific about your comparison as it looks like you may also be doing a count of run length of . as well. Is that correct?.

 

Also in a minor note 1month is not a valid SAS variable name. Month1 to Month12 will work much better in general.

 

This seems to do what your are requesting:

data test;
   INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
   array m Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
   array c Month1-Month12;

   count=1;
   do i = 1 to (dim(m)-1);
      if m[i]=m[i+1] then count=sum(Count,1);
      else if i>1 then do;
         c[count]=sum(c[count],1);
         count=1;
      end;
      else if i=1 then do;
         c[1]=1;
         count=1;
      end;
   end;
   total=max(sum(of c(*)),0);
   drop i count;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
Occasional Contributor
Posts: 8

Re: Data Summary

Can you explain Ballardw how it run? is it possible to make two steps?.

Thank you
Super User
Posts: 10,500

Re: Data Summary


toconero wrote:
Can you explain Ballardw how it run? is it possible to make two steps?.

Thank you

If the base data set already exists then replace the informat and input with a SET statement referencing the existing data and remove the datalines block.

 

The code is basically looking at two values, for example the first and second. If they are the same we add to a counter of "sameness", if they differ than the existing count is the value to increment in the Month variable (a lousy name, RunLength would be more indicative of what it contains). After updating that we need to reset the count to one as we have started a new sequence.

Super Contributor
Posts: 254

Re: Data Summary

A shorter version of Ballardw can be made as:

data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;

data want;
   set have;
   total = 0;
   array k[*] Apr12 -- May11;
   do i = 1 to dim(k) - 1;
      if k[i] ^= k[i+1] then total + 1;
   end;
drop i;
run;

proc print data = want;
run;
Contributor
Posts: 50

Re: Data Summary

Sorry guys, for not being clear. The process below will only give the total #of changes per id. I am trying to obtain a matrix containing ID and 12 variables which are binary (1/0) - 1 month, 2 months,...., 12 months. So, "2 months = 1" would mean that it took 2 months for the status to change. So, for ID 'xx' , if you look at the SAS table 'have' created from the below procedure, there are following changes -

 

1. in Jul11- as the STATUS was . (0) between Jul11, so "1 month" = 1

2. in Aug11 - from 1 in Jul11 to . (0) in Aug11, so "1 month" = 1

3. in Jan12 - from . (0) in Aug11 to 1 in Jan12, so "5 month" = 1 as the status changed after 5 months

 

In total as the STATUS changed three times, so INDEX = 3.

 

Likelwise I am thinking of an array but I do not know a way of creating such a matrix in SAS.

 

KR

SK

Super User
Posts: 10,500

Re: Data Summary

Did attempt to run either of the proposed solutions?

BTW with 12 input values you only can have 11 changes.

Solution
‎07-27-2016 12:45 AM
Super User
Posts: 9,681

Re: Data Summary


data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;

data temp;
 set have;
 array x{*} Apr12--May11;
 do i=dim(x) to 1 by -1;
  value=x{i};output;
 end;
 keep id value;
run;
data temp1;
 set temp;
 by id notsorted;
 retain found;
 if first.id then found=0;
 if value ne lag(value) and id=lag(id) then found=1;
 if found;
run;
proc summary data=temp1;
 by id value notsorted;
 output out=temp2(keep=id _freq_) ;
run;
data month;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table temp3 as
 select a.*,count
  from (select * from (select distinct id from have),month) as a
   left join 
    (select *,count(*) as count from temp2 group by id,_freq_) as b
      on a.id=b.id and a.month=b._freq_;
quit;
proc transpose data=temp3 out=want(drop=_name_) prefix=month;
 by id;
 var count;
 id month;
run;


☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 346 views
  • 5 likes
  • 5 in conversation