BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Coder_12
Calcite | Level 5

Hi there.


I've got stuck with my SAS code.

 

My aim: To find the average gap of a row (but only a gap where there is either a year before or after the gap) So if someone joins for 2 years, leaves then comes back 2 years later it will count a 2 year gap. However if someone joins and stays for 3 year and then leaves they should not be counted if that's all they did.  

 

I've managed to get to a point the code will either include to much or not enough. I can make it either include all gaps (even if they have never been in before, so essentially just all 0's) or only when there's a specified amount of years either side of it. 

 

data my_data;
input KEY YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
datalines;
1 0 0 1 0 0 1 0 1 0 0 1 0 0
2 1 1 0 1 1 0 1 0 1 1 0 1 1
3 0 0 0 0 1 0 0 0 1 0 0 1 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1
;

data gaps_in_year;
set my_data;
array year_array YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
gap_counter = 0;
total_gap_length = 0;

do i = 2 to dim(year_array)-1;
if year_array[i-1] = 1 and year_array[i] = 0 and year_array[i+1] = 1 then do;
gap_counter + 1;
end;
else if gap_counter > 0 then do;
total_gap_length + gap_counter;
gap_counter = 0;
end;
end;

/* Calculate average gap length for the current row */
if gap_counter > 0 then do;
total_gap_length + gap_counter;
end;

drop i ;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You can

  1. Concatenate all the 1's and 0's in the YEAR vars into a single character variable (_strng).
  2. Trim leading and trailing 0's from _strng, so that all remaining zeroes must be between 1's.
  3. Then 
    1. Treating 1's as word separators, count the number of "words" (NGAPS)
    2. Get total number of zeroes (compress out the 1's, get the length -->TOTAL_GAP_LENGTH).
  4. Calculate the average gap length.

 

data my_data;
input KEY YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
datalines;
1 0 0 1 0 0 1 0 1 0 0 1 0 0
2 1 1 0 1 1 0 1 0 1 1 0 1 1
3 0 0 0 0 1 0 0 0 1 0 0 1 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1
;


data want (drop=_:);
  set my_data;
  length _strng  $14;     *Assign a length at least one byte longer than N of YEAR variables *;

  _strng=cats(of year:);

  _strng=substr(_strng,findc(_strng,'1'));   *trim leading zeroes;
  _last1=findc(_strng,'1',-14);              *Find RIGHTMOST '1', note negative 3rd parameter;
  _strng=substr(_strng,1,_last1);            *trim trailing zeroes*;

  ngaps=countw(_strng,'1')-1;
  total_gap_length=lengthn(compress(_strng,'1'));
  if ngaps>0 then average_gap_length=total_gap_length/ngaps;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
Astounding
PROC Star
For me, there is still a bit of guesswork as to what you need. So here is an approach for you to consider.

Start at the beginning of the array and find the first 1. For example, if the first 1 is in YEAR99, the variable First1 would be 3.
Then count backwards from the end of the array. For example, if the first 1 when counting backwards is in YEAR06, then Last1 would be 10.
Finally, count the number of 0 values between First1 and Last1.
Does this approach come close to doing what you need?
mkeintz
PROC Star

You can

  1. Concatenate all the 1's and 0's in the YEAR vars into a single character variable (_strng).
  2. Trim leading and trailing 0's from _strng, so that all remaining zeroes must be between 1's.
  3. Then 
    1. Treating 1's as word separators, count the number of "words" (NGAPS)
    2. Get total number of zeroes (compress out the 1's, get the length -->TOTAL_GAP_LENGTH).
  4. Calculate the average gap length.

 

data my_data;
input KEY YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
datalines;
1 0 0 1 0 0 1 0 1 0 0 1 0 0
2 1 1 0 1 1 0 1 0 1 1 0 1 1
3 0 0 0 0 1 0 0 0 1 0 0 1 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1
;


data want (drop=_:);
  set my_data;
  length _strng  $14;     *Assign a length at least one byte longer than N of YEAR variables *;

  _strng=cats(of year:);

  _strng=substr(_strng,findc(_strng,'1'));   *trim leading zeroes;
  _last1=findc(_strng,'1',-14);              *Find RIGHTMOST '1', note negative 3rd parameter;
  _strng=substr(_strng,1,_last1);            *trim trailing zeroes*;

  ngaps=countw(_strng,'1')-1;
  total_gap_length=lengthn(compress(_strng,'1'));
  if ngaps>0 then average_gap_length=total_gap_length/ngaps;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 402 views
  • 3 likes
  • 3 in conversation