Hi there.
I've got stuck with my SAS code.
My aim: To find the average gap of a row (but only a gap where there is either a year before or after the gap) So if someone joins for 2 years, leaves then comes back 2 years later it will count a 2 year gap. However if someone joins and stays for 3 year and then leaves they should not be counted if that's all they did.
I've managed to get to a point the code will either include to much or not enough. I can make it either include all gaps (even if they have never been in before, so essentially just all 0's) or only when there's a specified amount of years either side of it.
data my_data;
input KEY YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
datalines;
1 0 0 1 0 0 1 0 1 0 0 1 0 0
2 1 1 0 1 1 0 1 0 1 1 0 1 1
3 0 0 0 0 1 0 0 0 1 0 0 1 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1
;
data gaps_in_year;
set my_data;
array year_array YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
gap_counter = 0;
total_gap_length = 0;
do i = 2 to dim(year_array)-1;
if year_array[i-1] = 1 and year_array[i] = 0 and year_array[i+1] = 1 then do;
gap_counter + 1;
end;
else if gap_counter > 0 then do;
total_gap_length + gap_counter;
gap_counter = 0;
end;
end;
/* Calculate average gap length for the current row */
if gap_counter > 0 then do;
total_gap_length + gap_counter;
end;
drop i ;
run;
You can
data my_data;
input KEY YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
datalines;
1 0 0 1 0 0 1 0 1 0 0 1 0 0
2 1 1 0 1 1 0 1 0 1 1 0 1 1
3 0 0 0 0 1 0 0 0 1 0 0 1 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1
;
data want (drop=_:);
set my_data;
length _strng $14; *Assign a length at least one byte longer than N of YEAR variables *;
_strng=cats(of year:);
_strng=substr(_strng,findc(_strng,'1')); *trim leading zeroes;
_last1=findc(_strng,'1',-14); *Find RIGHTMOST '1', note negative 3rd parameter;
_strng=substr(_strng,1,_last1); *trim trailing zeroes*;
ngaps=countw(_strng,'1')-1;
total_gap_length=lengthn(compress(_strng,'1'));
if ngaps>0 then average_gap_length=total_gap_length/ngaps;
run;
You can
data my_data;
input KEY YEAR97 YEAR98 YEAR99 YEAR00 YEAR01 YEAR02 YEAR03 YEAR04 YEAR05 YEAR06 YEAR07 YEAR08 YEAR09;
datalines;
1 0 0 1 0 0 1 0 1 0 0 1 0 0
2 1 1 0 1 1 0 1 0 1 1 0 1 1
3 0 0 0 0 1 0 0 0 1 0 0 1 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1
;
data want (drop=_:);
set my_data;
length _strng $14; *Assign a length at least one byte longer than N of YEAR variables *;
_strng=cats(of year:);
_strng=substr(_strng,findc(_strng,'1')); *trim leading zeroes;
_last1=findc(_strng,'1',-14); *Find RIGHTMOST '1', note negative 3rd parameter;
_strng=substr(_strng,1,_last1); *trim trailing zeroes*;
ngaps=countw(_strng,'1')-1;
total_gap_length=lengthn(compress(_strng,'1'));
if ngaps>0 then average_gap_length=total_gap_length/ngaps;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.