BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Siddharth123
Obsidian | Level 7

Hi All,

 

I have a dataset as follows -

data test;

INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;

datalines;

xx 3 1 1 1 1 . . . . . 1 . .

yy 0 1 1 1 1 1 1 1 1 1 1 1 1

pp 2 1 1 1 1 1 1 1 1 1 . 1 1

qq 4 1 . . . 1 1 1 1 1 . 1 1

rr 1 1 1 1 1 1 1 1 1 1 1 . .

;

run;

 

I am trying to create 12 columns - 1month, 2month,....,12month which looks into the time gap between

the change when it occurs from 1 to . (or 0)

 

So, essentially a table like below:

 

 

id 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 11m 12m Total
xx 1     1 1               3
yy                         0
pp 1               1       2
qq 2     1 1               4
rr                   1     1

 

 

Total is INDEX in table Test above.

 

Kind regards

 

SK

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;

data temp;
 set have;
 array x{*} Apr12--May11;
 do i=dim(x) to 1 by -1;
  value=x{i};output;
 end;
 keep id value;
run;
data temp1;
 set temp;
 by id notsorted;
 retain found;
 if first.id then found=0;
 if value ne lag(value) and id=lag(id) then found=1;
 if found;
run;
proc summary data=temp1;
 by id value notsorted;
 output out=temp2(keep=id _freq_) ;
run;
data month;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table temp3 as
 select a.*,count
  from (select * from (select distinct id from have),month) as a
   left join 
    (select *,count(*) as count from temp2 group by id,_freq_) as b
      on a.id=b.id and a.month=b._freq_;
quit;
proc transpose data=temp3 out=want(drop=_name_) prefix=month;
 by id;
 var count;
 id month;
run;


View solution in original post

7 REPLIES 7
ballardw
Super User

I think you need to be a bit more specific about your comparison as it looks like you may also be doing a count of run length of . as well. Is that correct?.

 

Also in a minor note 1month is not a valid SAS variable name. Month1 to Month12 will work much better in general.

 

This seems to do what your are requesting:

data test;
   INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
   array m Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
   array c Month1-Month12;

   count=1;
   do i = 1 to (dim(m)-1);
      if m[i]=m[i+1] then count=sum(Count,1);
      else if i>1 then do;
         c[count]=sum(c[count],1);
         count=1;
      end;
      else if i=1 then do;
         c[1]=1;
         count=1;
      end;
   end;
   total=max(sum(of c(*)),0);
   drop i count;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;
toconero
Fluorite | Level 6
Can you explain Ballardw how it run? is it possible to make two steps?.

Thank you
ballardw
Super User

@toconero wrote:
Can you explain Ballardw how it run? is it possible to make two steps?.

Thank you

If the base data set already exists then replace the informat and input with a SET statement referencing the existing data and remove the datalines block.

 

The code is basically looking at two values, for example the first and second. If they are the same we add to a counter of "sameness", if they differ than the existing count is the value to increment in the Month variable (a lousy name, RunLength would be more indicative of what it contains). After updating that we need to reset the count to one as we have started a new sequence.

KachiM
Rhodochrosite | Level 12

A shorter version of Ballardw can be made as:

data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;

data want;
   set have;
   total = 0;
   array k[*] Apr12 -- May11;
   do i = 1 to dim(k) - 1;
      if k[i] ^= k[i+1] then total + 1;
   end;
drop i;
run;

proc print data = want;
run;
Siddharth123
Obsidian | Level 7

Sorry guys, for not being clear. The process below will only give the total #of changes per id. I am trying to obtain a matrix containing ID and 12 variables which are binary (1/0) - 1 month, 2 months,...., 12 months. So, "2 months = 1" would mean that it took 2 months for the status to change. So, for ID 'xx' , if you look at the SAS table 'have' created from the below procedure, there are following changes -

 

1. in Jul11- as the STATUS was . (0) between Jul11, so "1 month" = 1

2. in Aug11 - from 1 in Jul11 to . (0) in Aug11, so "1 month" = 1

3. in Jan12 - from . (0) in Aug11 to 1 in Jan12, so "5 month" = 1 as the status changed after 5 months

 

In total as the STATUS changed three times, so INDEX = 3.

 

Likelwise I am thinking of an array but I do not know a way of creating such a matrix in SAS.

 

KR

SK

ballardw
Super User

Did attempt to run either of the proposed solutions?

BTW with 12 input values you only can have 11 changes.

Ksharp
Super User

data have;
INPUT id $2. INDEX Apr12 Mar12 Feb12 Jan12 Dec11 Nov11 Oct11 Sep11 Aug11 Jul11 Jun11 May11;
datalines;
xx 3 1 1 1 1 . . . . . 1 . .
yy 0 1 1 1 1 1 1 1 1 1 1 1 1
pp 2 1 1 1 1 1 1 1 1 1 . 1 1
qq 4 1 . . . 1 1 1 1 1 . 1 1
rr 1 1 1 1 1 1 1 1 1 1 1 . .
;
run;

data temp;
 set have;
 array x{*} Apr12--May11;
 do i=dim(x) to 1 by -1;
  value=x{i};output;
 end;
 keep id value;
run;
data temp1;
 set temp;
 by id notsorted;
 retain found;
 if first.id then found=0;
 if value ne lag(value) and id=lag(id) then found=1;
 if found;
run;
proc summary data=temp1;
 by id value notsorted;
 output out=temp2(keep=id _freq_) ;
run;
data month;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table temp3 as
 select a.*,count
  from (select * from (select distinct id from have),month) as a
   left join 
    (select *,count(*) as count from temp2 group by id,_freq_) as b
      on a.id=b.id and a.month=b._freq_;
quit;
proc transpose data=temp3 out=want(drop=_name_) prefix=month;
 by id;
 var count;
 id month;
run;


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1434 views
  • 5 likes
  • 5 in conversation