BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KKS2
Fluorite | Level 6

Hi All

This is my first time posting so please excuse me if i have missed anything.

 

I use enterprise guide and i have a data set that looks like this below. With thousands of IDs. I want to add an extra variable(ENTRIES) that sums the number of times the ID entered over time. the dates go back many year and will continue going forwards.

 

Any help will be appreciated.

Thanks

KKS2

IDJul-17Aug-17Sep-17Oct-17Nov-17Dec-17Entries      
1111..111.1 (entered in Sept17)   
11121.11.13 (entered in July17 then Sept17 and then dec17)
1113....111 (entered in Nov17)   
1114.111111 (entered in Aug17)   
11151.11112      
11161.1...2      
11171...1.2      
11181 ..1.2      
1119.1.11.2      
11201.11.13      
1121.11...1      

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

To note, post test data in the form of a datastep, follow this post if unsure:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Use the code window which is the {i} above the post are to post these things.

 

There are two main ways to approach this.  The first is to stick with the model you have now for your data - to do this you would then employ arrays.  Array all the dated variables, then loop over that array turning on a flag when 1 is encountered and turning it off and adding 1 to entries when . or last, maybe something like:

data want;
  set have;
  array dates jul17--dec17;
  onflag=0;
  entries=0;
  do i=1 to dim(dates);
    if dates{i}=1 and onflag=0 then onflag=1;
    if dates{i}=. and onflag=1 then do;
      onflag=0;
      entries=sum(entries,1);
    end;
    if i=dim(dates) and onflag=1 then sum(entries,1);
  end;
run;

This will work, but as always, thinking in Excel terms, and having data transposed across the page is not by any means the best way to work with the data (or in fact from a resource or storage point of view).  Which leads onto the second method, normalising your data, so you have a layout like:

ID     DATE   FLAG

1111  Jul17    .

1111  Aug17  .

...

 

This type of data is far simpler an easier to program with and can still create your Excel transposed output with a simple call to proc transpose.  You can also effectively shrink your data down using this method, in your given case Jul17 has an entry for 1111, even though it doesn't capture anything in the above, we can filter down and only keep the information we actually need:

ID     DATE   FLAG

1111  Sep17 1 

1111  Oct17  1

...

In fact we can then go further than this and by a datastep with a by group select just the start and stop dates eliminating all the data in between to end with something like:
ID     START   END

1111 Sep17    Nov17

1112 Jul17      Jul17

1112 Sep17    Oct17

...

Thus drastically shrinking the data, and making things like counting number of entries, days within entries and such like far simpler.

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

To note, post test data in the form of a datastep, follow this post if unsure:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Use the code window which is the {i} above the post are to post these things.

 

There are two main ways to approach this.  The first is to stick with the model you have now for your data - to do this you would then employ arrays.  Array all the dated variables, then loop over that array turning on a flag when 1 is encountered and turning it off and adding 1 to entries when . or last, maybe something like:

data want;
  set have;
  array dates jul17--dec17;
  onflag=0;
  entries=0;
  do i=1 to dim(dates);
    if dates{i}=1 and onflag=0 then onflag=1;
    if dates{i}=. and onflag=1 then do;
      onflag=0;
      entries=sum(entries,1);
    end;
    if i=dim(dates) and onflag=1 then sum(entries,1);
  end;
run;

This will work, but as always, thinking in Excel terms, and having data transposed across the page is not by any means the best way to work with the data (or in fact from a resource or storage point of view).  Which leads onto the second method, normalising your data, so you have a layout like:

ID     DATE   FLAG

1111  Jul17    .

1111  Aug17  .

...

 

This type of data is far simpler an easier to program with and can still create your Excel transposed output with a simple call to proc transpose.  You can also effectively shrink your data down using this method, in your given case Jul17 has an entry for 1111, even though it doesn't capture anything in the above, we can filter down and only keep the information we actually need:

ID     DATE   FLAG

1111  Sep17 1 

1111  Oct17  1

...

In fact we can then go further than this and by a datastep with a by group select just the start and stop dates eliminating all the data in between to end with something like:
ID     START   END

1111 Sep17    Nov17

1112 Jul17      Jul17

1112 Sep17    Oct17

...

Thus drastically shrinking the data, and making things like counting number of entries, days within entries and such like far simpler.

KKS2
Fluorite | Level 6

Thanks RW9. I will try this code.

i do have the data in this format:

ID     DATE   FLAG

1111  Sep17 1 

1111  Oct17  1

 

so if there is a code for this format, that will also be helpful.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Simplest if you have that is:

data want;
  set have;
  by id;
  if first.id or last.id then output;
  else if date ne intnx('month',lag(date),1) then output;
run;

That should then give you any non-consecutive dates (note I haven't tested as not typing in the test data), i.e. distinct periods.

Gangi
Obsidian | Level 7

you can use sum statement,

example as below,

 

data t1;
input id jul_17 Aug_17 sep_17 oct_17 nov_17 dec_17;
datalines;
1111 . . 1 1 1 1
1112 . . . . . 1
1113 1 1 . 1 . .
1114 . . . . 1 1
1115 1 . . . 1 .
1116 1 . . . 1 .
;
run;

 

 

data t2;
set t1;
ENTRIES= sum(of jul_17--Dec_17);
run;

 

accept it if it works..

Ksharp
Super User

data t1;
input id jul_17 Aug_17 sep_17 oct_17 nov_17 dec_17;
datalines;
1111 . . 1 1 1 1
1112 . . . . . 1
1113 1 1 . 1 . .
1114 . . . . 1 1
1115 1 . . . 1 .
1116 1 . . . 1 .
;
run;

data want;
 set t1;
 n=countw(cat(of jul_17--dec_17),'.');
run;
KKS2
Fluorite | Level 6

Thank you. this also works!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1597 views
  • 3 likes
  • 4 in conversation