BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JeffreyLowe
Obsidian | Level 7

Could sure use some help on this problem please. I have a data set where there are instances of duplicate job codes where one instance the job title is abbreviated and the other instance the job title is complete. I need to keep the observation with the complete job title and ignore the observation with the abbreviated job title. As you can see with some observations the abbreviated form is first but in other cases the complete form is first.

 

2020-07-29_8-30-39.jpg

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

It is easy to check both job_code and BU_level_1  - updated code:

proc sort data=have out=temp;
    by job_code bu_level_1;
run;
data want;
 set temp;
  by Job_code bu_level_1;
      length title $50;  /* max expected length */
      retain title;
      if first.bu_level_1 then title = Job_title; 
      else  if length(Job_Title) > length(title) then title = job_title;
      if last.bu_level_1 then do;
         Job_title = title;
         output;
      end;
run;

View solution in original post

6 REPLIES 6
Shmuel
Garnet | Level 18

I assume that:

   - the only difference between observations of the same job_ID is the job title,

   - the complete title is the longest one

   - the file is sorted by job_id

then use next code to save wanted result:

data want;
 set have;
  by Job_code;
      length title $50;  /* max expected length */
      retain title;
      if first.job_code then title = Job_title; 
      else  if length(Job_Title) > length(title) then title = job_title;
      if last.Job_code then do;
         Job_title = title;
         output;
      end;
run;
JeffreyLowe
Obsidian | Level 7

Your solution worked great, except for one thing....my fault I totally forgot to mention it.....in these examples, the job_code and job_title are duplicates but the BU_Level_01 are different, so in these cases I would need to keep both. So sorry for the confusion totally my fault for omitting this info. Other than that the code you provided works perfectly! Thank you!

 

2020-07-29_10-13-15.jpg

Shmuel
Garnet | Level 18

It is easy to check both job_code and BU_level_1  - updated code:

proc sort data=have out=temp;
    by job_code bu_level_1;
run;
data want;
 set temp;
  by Job_code bu_level_1;
      length title $50;  /* max expected length */
      retain title;
      if first.bu_level_1 then title = Job_title; 
      else  if length(Job_Title) > length(title) then title = job_title;
      if last.bu_level_1 then do;
         Job_title = title;
         output;
      end;
run;
JeffreyLowe
Obsidian | Level 7

Perfect! Thank you so much Shmuel works like a charm!

mkeintz
PROC Star

So by "abbreviation" you don't mean things like "ltd" (for limited) or "dir." (for director).  Your examples seem to be such that one title is a sequence of characters that falls within the other title.

 

If so, and your data are sorted by jobcode:

 

data want;  
  set have;
  by jobcode;
  merge have have (firstobs=2 rename=(job_title=nxt_job_title));
if (first.jobcode=0 and length(job_title)>length(lag(job_title))) or (last.jobcode=0 and length(job_title)>length(nxt_job_title)) or (first.jobcode=1 and last.jobcode=1); run;

The subsetting if is satisfied by any of these three conditions

  1. The record-in-hand is the second for this jobcode and its jobtitle length is greater than the preceding (first for this jobcode) jobtitle.
  2. The record-in-hand is the first for this jobcode and its jobtitle length is greater than the following (second for this jobcode).
  3. The record-in-hand is the only one for this jobcode.

Note this program assumes you never have more than 2 records per jobcode.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JeffreyLowe
Obsidian | Level 7

Thank you for your response, yes things like "Dir" would be considered a duplicate if there were a second observation with the same job_code but "Director" spelled out as seen here:

 

100041 Learning & Development Dir
100041 Learning & Development Director

 

This is a duplicate, the first needs dropped and the second is the observation I want to keep.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 815 views
  • 1 like
  • 3 in conversation