SAS Programming

JeffreyLowe · Posted 07-29-2020 09:08 AM

Could sure use some help on this problem please. I have a data set where there are instances of duplicate job codes where one instance the job title is abbreviated and the other instance the job title is complete. I need to keep the observation with the complete job title and ignore the observation with the abbreviated job title. As you can see with some observations the abbreviated form is first but in other cases the complete form is first.

Shmuel · Posted 07-29-2020 01:45 PM

It is easy to check both job_code and BU_level_1 - updated code:

proc sort data=have out=temp;
    by job_code bu_level_1;
run;
data want;
 set temp;
  by Job_code bu_level_1;
      length title $50;  /* max expected length */
      retain title;
      if first.bu_level_1 then title = Job_title; 
      else  if length(Job_Title) > length(title) then title = job_title;
      if last.bu_level_1 then do;
         Job_title = title;
         output;
      end;
run;

View solution in original post

Shmuel · Posted 07-29-2020 09:37 AM

I assume that:

- the only difference between observations of the same job_ID is the job title,

- the complete title is the longest one

- the file is sorted by job_id

then use next code to save wanted result:

data want;
 set have;
  by Job_code;
      length title $50;  /* max expected length */
      retain title;
      if first.job_code then title = Job_title; 
      else  if length(Job_Title) > length(title) then title = job_title;
      if last.Job_code then do;
         Job_title = title;
         output;
      end;
run;

JeffreyLowe · Posted 07-29-2020 10:21 AM

Your solution worked great, except for one thing....my fault I totally forgot to mention it.....in these examples, the job_code and job_title are duplicates but the BU_Level_01 are different, so in these cases I would need to keep both. So sorry for the confusion totally my fault for omitting this info. Other than that the code you provided works perfectly! Thank you!

Shmuel · Posted 07-29-2020 01:45 PM

It is easy to check both job_code and BU_level_1 - updated code:

proc sort data=have out=temp;
    by job_code bu_level_1;
run;
data want;
 set temp;
  by Job_code bu_level_1;
      length title $50;  /* max expected length */
      retain title;
      if first.bu_level_1 then title = Job_title; 
      else  if length(Job_Title) > length(title) then title = job_title;
      if last.bu_level_1 then do;
         Job_title = title;
         output;
      end;
run;

JeffreyLowe · Posted 07-29-2020 02:20 PM

Perfect! Thank you so much Shmuel works like a charm!

mkeintz · Posted 07-29-2020 09:45 AM

So by "abbreviation" you don't mean things like "ltd" (for limited) or "dir." (for director). Your examples seem to be such that one title is a sequence of characters that falls within the other title.

If so, and your data are sorted by jobcode:

data want;  
  set have;
  by jobcode;
  merge have have (firstobs=2 rename=(job_title=nxt_job_title));
  if (first.jobcode=0 and length(job_title)>length(lag(job_title)))
  or (last.jobcode=0  and length(job_title)>length(nxt_job_title))
  or (first.jobcode=1 and last.jobcode=1);
run;

The subsetting if is satisfied by any of these three conditions

The record-in-hand is the second for this jobcode and its jobtitle length is greater than the preceding (first for this jobcode) jobtitle.
The record-in-hand is the first for this jobcode and its jobtitle length is greater than the following (second for this jobcode).
The record-in-hand is the only one for this jobcode.

Note this program assumes you never have more than 2 records per jobcode.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

JeffreyLowe · Posted 07-29-2020 10:25 AM

Thank you for your response, yes things like "Dir" would be considered a duplicate if there were a second observation with the same job_code but "Director" spelled out as seen here:

100041 Learning & Development Dir
100041 Learning & Development Director

This is a duplicate, the first needs dropped and the second is the observation I want to keep.

SAS Programming

Dropping Some Observations

Re: Dropping Some Observations

Re: Dropping Some Observations

Re: Dropping Some Observations

Re: Dropping Some Observations

Re: Dropping Some Observations

Re: Dropping Some Observations

Re: Dropping Some Observations

DROP CERTAIN VARIABLES

optional drop table

dropping last observations

[ SAS 활용 노하우] 데이터 생성 / LENGTH / LABEL / DROP / 조건문

Count Unique Observations

Follow Us

What is...

SAS Programming

Register Today!

SAS Training: Just a Click Away

Follow Us

What is...