Desktop productivity for business analysts and programmers

Count number of changes in variables over time

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Count number of changes in variables over time

Hi All

This is my first time posting so please excuse me if i have missed anything.

 

I use enterprise guide and i have a data set that looks like this below. With thousands of IDs. I want to add an extra variable(ENTRIES) that sums the number of times the ID entered over time. the dates go back many year and will continue going forwards.

 

Any help will be appreciated.

Thanks

KKS2

IDJul-17Aug-17Sep-17Oct-17Nov-17Dec-17Entries      
1111..111.1 (entered in Sept17)   
11121.11.13 (entered in July17 then Sept17 and then dec17)
1113....111 (entered in Nov17)   
1114.111111 (entered in Aug17)   
11151.11112      
11161.1...2      
11171...1.2      
11181 ..1.2      
1119.1.11.2      
11201.11.13      
1121.11...1      

 


Accepted Solutions
Solution
‎01-31-2018 06:28 AM
Super User
Super User
Posts: 9,839

Re: Count number of changes in variables over time

To note, post test data in the form of a datastep, follow this post if unsure:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Use the code window which is the {i} above the post are to post these things.

 

There are two main ways to approach this.  The first is to stick with the model you have now for your data - to do this you would then employ arrays.  Array all the dated variables, then loop over that array turning on a flag when 1 is encountered and turning it off and adding 1 to entries when . or last, maybe something like:

data want;
  set have;
  array dates jul17--dec17;
  onflag=0;
  entries=0;
  do i=1 to dim(dates);
    if dates{i}=1 and onflag=0 then onflag=1;
    if dates{i}=. and onflag=1 then do;
      onflag=0;
      entries=sum(entries,1);
    end;
    if i=dim(dates) and onflag=1 then sum(entries,1);
  end;
run;

This will work, but as always, thinking in Excel terms, and having data transposed across the page is not by any means the best way to work with the data (or in fact from a resource or storage point of view).  Which leads onto the second method, normalising your data, so you have a layout like:

ID     DATE   FLAG

1111  Jul17    .

1111  Aug17  .

...

 

This type of data is far simpler an easier to program with and can still create your Excel transposed output with a simple call to proc transpose.  You can also effectively shrink your data down using this method, in your given case Jul17 has an entry for 1111, even though it doesn't capture anything in the above, we can filter down and only keep the information we actually need:

ID     DATE   FLAG

1111  Sep17 1 

1111  Oct17  1

...

In fact we can then go further than this and by a datastep with a by group select just the start and stop dates eliminating all the data in between to end with something like:
ID     START   END

1111 Sep17    Nov17

1112 Jul17      Jul17

1112 Sep17    Oct17

...

Thus drastically shrinking the data, and making things like counting number of entries, days within entries and such like far simpler.

View solution in original post


All Replies
Solution
‎01-31-2018 06:28 AM
Super User
Super User
Posts: 9,839

Re: Count number of changes in variables over time

To note, post test data in the form of a datastep, follow this post if unsure:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Use the code window which is the {i} above the post are to post these things.

 

There are two main ways to approach this.  The first is to stick with the model you have now for your data - to do this you would then employ arrays.  Array all the dated variables, then loop over that array turning on a flag when 1 is encountered and turning it off and adding 1 to entries when . or last, maybe something like:

data want;
  set have;
  array dates jul17--dec17;
  onflag=0;
  entries=0;
  do i=1 to dim(dates);
    if dates{i}=1 and onflag=0 then onflag=1;
    if dates{i}=. and onflag=1 then do;
      onflag=0;
      entries=sum(entries,1);
    end;
    if i=dim(dates) and onflag=1 then sum(entries,1);
  end;
run;

This will work, but as always, thinking in Excel terms, and having data transposed across the page is not by any means the best way to work with the data (or in fact from a resource or storage point of view).  Which leads onto the second method, normalising your data, so you have a layout like:

ID     DATE   FLAG

1111  Jul17    .

1111  Aug17  .

...

 

This type of data is far simpler an easier to program with and can still create your Excel transposed output with a simple call to proc transpose.  You can also effectively shrink your data down using this method, in your given case Jul17 has an entry for 1111, even though it doesn't capture anything in the above, we can filter down and only keep the information we actually need:

ID     DATE   FLAG

1111  Sep17 1 

1111  Oct17  1

...

In fact we can then go further than this and by a datastep with a by group select just the start and stop dates eliminating all the data in between to end with something like:
ID     START   END

1111 Sep17    Nov17

1112 Jul17      Jul17

1112 Sep17    Oct17

...

Thus drastically shrinking the data, and making things like counting number of entries, days within entries and such like far simpler.

New Contributor
Posts: 3

Re: Count number of changes in variables over time

Thanks RW9. I will try this code.

i do have the data in this format:

ID     DATE   FLAG

1111  Sep17 1 

1111  Oct17  1

 

so if there is a code for this format, that will also be helpful.

Super User
Super User
Posts: 9,839

Re: Count number of changes in variables over time

Simplest if you have that is:

data want;
  set have;
  by id;
  if first.id or last.id then output;
  else if date ne intnx('month',lag(date),1) then output;
run;

That should then give you any non-consecutive dates (note I haven't tested as not typing in the test data), i.e. distinct periods.

Occasional Contributor
Posts: 11

Re: Count number of changes in variables over time

you can use sum statement,

example as below,

 

data t1;
input id jul_17 Aug_17 sep_17 oct_17 nov_17 dec_17;
datalines;
1111 . . 1 1 1 1
1112 . . . . . 1
1113 1 1 . 1 . .
1114 . . . . 1 1
1115 1 . . . 1 .
1116 1 . . . 1 .
;
run;

 

 

data t2;
set t1;
ENTRIES= sum(of jul_17--Dec_17);
run;

 

accept it if it works..

Super User
Posts: 10,849

Re: Count number of changes in variables over time


data t1;
input id jul_17 Aug_17 sep_17 oct_17 nov_17 dec_17;
datalines;
1111 . . 1 1 1 1
1112 . . . . . 1
1113 1 1 . 1 . .
1114 . . . . 1 1
1115 1 . . . 1 .
1116 1 . . . 1 .
;
run;

data want;
 set t1;
 n=countw(cat(of jul_17--dec_17),'.');
run;
New Contributor
Posts: 3

Re: Count number of changes in variables over time

Thank you. this also works!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 154 views
  • 3 likes
  • 4 in conversation