BookmarkSubscribeRSS Feed
aayeshasingh
Calcite | Level 5

HI all,

    I have a variable which shows the years of experience and it is like this:

YrsofEXP

5.00

5 years

5 yrs

3 in September

Less than a year

2 1/3

10+

12 and above

8.

10 years 

8 plus years

 

I want result with years. For example;

YrsofEXP

5

5

5

3

1

1

2

But, I want result just in years. How do I remove all those alphabets? My total observation is around 900.

I would appreciate your help. Thank you much.

1 REPLY 1
Patrick
Opal | Level 21

Looks like you're dealing with a free-text field so there isn't much else you can do than to analyse the text and then write custom logic to convert the data into a numerical value.

Good thing is that you're only dealing with 900 obs so things should be manageable. 

Below code could give you a start. Just run it, look what remains with the Proc Freq report and extend the program logic until there is no leftover.

data have;
  infile datalines truncover;
  input YrsofEXP $20.;
  datalines;
5.00
5 years
5 yrs
3 in September
Less than a year
2 1/3
10+
12 and above
8.
10 years 
8 plus years
;

data want remainder;
  set have;
  if not missing(YrsofEXP) then
    do;
      YrsofEXP_num=input(compress(YrsofEXP,'./','kd'), ?? best32.);
      /** here add more logic to read the remaining patterns into a SAS numerical variable **/
/*      if missing(YrsofEXP_num) then YrsofEXP_num= <some logic>;*/
      /**  **/
    end;
  if missing(YrsofEXP) then output want;
  else if cmiss(YrsofEXP_num,YrsofEXP)=0 then output want;
  else output remainder;
run;

data remaining_patterns;
  set remainder;
  length patter $20;
  pattern=prxchange('s/\d/d/oi',-1,YrsofEXP);
run;
title 'Not yet converted patterns';
proc freq data=pattern;
  table pattern /nocum nopercent;
run;
title;

 Capture.JPG