BookmarkSubscribeRSS Feed
CharlotteCain
Quartz | Level 8

Hey SAS Specialists,

Hope you folks are having a great weekend. I would appreciate if you can provide me the SAS Datastep or Proc SQL logic for the below requirement

I have a Dataset that has the column JOB_DESIGNATION as seen below, and I need logic to get the code column

JOB_DESIGNATIONCODE(i need this column in output)
senior managementsm
middle managementmm
lower managementlm
finance managerfm
sales managersm
marketing managermarmm
strategy consultants.consult_temp
communications managercomm
technical enginnert.eng
support engineert.support_eng
logisticsLGS
supply chain managerscm
brand managerbrm
location managerlocm

I thought of something like 

do= i to _n_;

if job_desgnation= "&senior management" then code="&sm";

will this work?   Anyway, the point is in every 3 months i.e quarter the job_designation and code values and numbers are subject to change respectively as our employees are constantly restructured and the need of the hour is to automate the report in a way to pull from the tables directly without having the need of a human intervention to give us an update.I hope that makes sense.  I'd appreciate a great solution. Thanks   

P.S The above is the sample, of course there are many more Job designations, which means it's fairly a big size dataset                               

17 REPLIES 17
PGStats
Opal | Level 21

You could keep the list of job designations and codes somewhere convenient and easy to edit and have SAS read and transform it into a character format. This is how I would do it:

/* Read in the codes */

data codes;
length job $32 code $12;
input job & code;
datalines;
senior management  sm
middle management  mm
lower management  lm
finance manager  fm
sales manager  sm
marketing manager  marmm
strategy consultant  s.consult_temp
communications manager  comm
technical enginner  t.eng
support engineer  t.support_eng
logistics  LGS
supply chain manager  scm
brand manager  brm
location manager  locm
;

/* Create a format list. Character formats are case sensitive so generate low, upper and proper case versions of the job designations.

Create a view instead of a dataset, since it is going to be used only once. */

data fmt / view=fmt;
set codes end=other;
fmtname = "jobCodes";
type = "C";
label = code;
start = lowcase(job);
output;
start = upcase(job);
output;
start = propcase(job);
output;
run;

/* Transform the list into a SAS character format */

proc format cntlin=fmt; run;

/* Test the format */

data test;
length title $32;
format title $jobCodes.;
title = "technical enginner";
run;

proc print; run;

PG

PG
CharlotteCain
Quartz | Level 8

Hi PG sir,

Many thanks for the spontaneous response. I understand that in your 1st step that you are creating the dataset codes and then using that as an input file. Well, my input dataset only has the JOB_DESIGNATION column and the CODE column is actually the requirement in the output and is not a part of the input file. So bascially,looking up values of JOB_DESIGNATION such as senior management should generate the code for each JOB.

For example, if job_designation="senior management" then code=sm;

else if ........continues till eof.........../*as we all know multiple if then else statements will ridicule performance drastically and not a good practice*/ and also the values of JOB_DESIGNATION is not static as they may change.For instance in this release, it is senior management and going forward(3 months later) that may change to "Top management" or "Higher management" or anything of your educated guess.

Hence, I thought of the idea of looping it. Like do i=1 to _n_; here _n_ because we don't know how many different JOB_DESIGNATION we may have as the dataset can be large and then apply conditions inside the loop, however my idea is not working or I am not good enough to get the logic right. Also I thought of an idea using an array within the loop, a temporary array,

Please accept my apologies if I didn't explain the question well enough. Many Thanks.

Cheers from England,

Charlotte

RichardinOz
Quartz | Level 8

Charlotte

From the examples you have given there does not seem to be any straightforward way of deducing the 'code' from the job designation. Also, you have assigned 'sm' to more than one position.  As for the shifting sands of fashionable job naming, none of us can guess where that might go next so that part of the problem is insoluble. Normally with a code you would have a short alpha numeric string unique to each description.  I would suggest you could have say 5 characters: up to the first 3 could be the initial letters of designation words and the rest a sequence number to avoid duplication.  So your example would become


senior management                SM001

middle management                MM001

lower management                LM001

finance manager                    FM001

sales manager                        SM002

marketing manager                MM002

strategy consultant            SC001

communications manager        CM001

technical enginner            TE001

support engineer                SE001

logistics                                L0001

supply chain manager            SCM01

brand manager                        BM001

location manager                LM002


Here is code  (untested) which should achieve the result


proc sort data = designations ;

     by job_name ;

run ;


data job_code ;

     set designations ;

     length k $1

               prefix $3

               code $5

               ;

     retain seq 0 ;

     by job_name ;

     if first.job_name ;

     prefix = '000'

     do i = 1 to 3 ;

          k = upcase (scan (job_name, i)) ;

          if k > ' ' then

          substr (prefix, i, 1) = k ;

     end ;

     prev = lag (prefix) ;

     if prefix = prev then seq + 1 ;

     else seq = 1 ;

     code = prefix || put (seq, z2.) ;

     keep job_name code ;

run ;


There is a problem with this approach:  supposing a new designation is encountered you cannot simply re-run the code because existing codes might be shifted to new positions.  Code for an automatic update could be quite complex; it might be easier to deal with updates manually as they are encountered.

A way round this would be to make the sequence number equal to _N_, the row number; then any new code would take the next highest sequence number.  You might need to allocate more than 2 sequence numerals.


Gidday from Oz

Richard


CharlotteCain
Quartz | Level 8

Hey Richard,

Thanks very much indeed for that nice creative approach and your time for solving the problem. I apologise for having not looked into 1 or more Typos as I wrote my question, however i can tell in your response that you clearly had a well educated guess.

Please pardon me on the Code typos. Well,for example  the CODE column values should actually be like:

COMPName-SM-BD-CEO meaning  (Company name, SM for Senior management, BD for Board of Directors, CEO Chief executive officer)

SM-BD-CFO ((company name ,SM for Senior management, BD for Board of Directors, CEO Chief financial officer)

SM-BD-COO (company name,SM for Senior management, BD for Board of Directors, CEO Chief operating officer)

likewise it gets so granular in the segmentation process and the idea is to list entire codes within all segments consolidated  in one single report. I have no clue why they have not chosen alphanumeric codes. So for example SAS experts like you would be given code like companyname-TD-IM-TA (TD is Technology department, IM is Information management and TA is technical analyst. So can you alter the sas code slightly to this kind of values as I'm afraid the numeric prefix wouldn't work? . So imagine if there are over 700 employees, there would have to be 700 corresponding codes.

The above relational logic isn't defined in any of the input tables however I learned it having noticed those excel sheets in earlier manual reports.

Basically, the objective is to have a simple descriptive report on our employees categorizing them to certain groups(segments) using CODES and then of course perhaps use them in relation to other dimensions and measures, which are quite straight forward and so I never bothered to bring that up in the discussion.

I completely understand and acknowledge the possibility of an automatic update is pretty complex. Until recently, the work has been done manually in excel sheets and for whatever reason my big bosses seem to wanna automate many such processes directly from the existing oracle tables instead of manually working on excel sheets.

Hope sun shines down under,

Charlotte

.

RichardinOz
Quartz | Level 8

Charlotte

Regarding the sun - as we are heading into winter now we have about as much chance of seeing it as in the UK this time of the year!

I have the feeling you are being asked to rectify a defect in organisational planning (nothing new in that, except usually technical staff are the last to be consulted).  Normally I would expect job codes to be assigned by HR as part of setting up the job descriptions, and these codes would be entered into your database; and it would be the task of the reporting analyst to translate those job codes back into job titles.  In this case the titles are already available but someone now feels the need to inflict codes on the organisation.

Job and department codes really only serve two useful purposes:  saving space in a computer database (not in your case - the database contains the full title) and saving space in financial articles.  Hence everybody recognises codes such as CEO, CFO, COO and the like and the become status symbols for the elite, like military decorations (DSO, VC etc).  The trouble is that lower down the organisational pyramid there are potentially overlapping codes for quite dissimilar roles.  SA, for example, can be Systems Architect, Senior Analyst, or Sales Assistant.  Hence the idea of attaching company-department codes to the job codes to distinguish between duplicates.  This is making your codes unwieldy. 

By the way, if job codes were properly implemented in your organisation there would be a database table translating the codes into full titles, and a procedure for updating those tables; and you could use the methods others have suggested to expand codes into descriptions as required.

However, if you have no choice to implement codes in the format you have described I would hope you have a source table with the following columns: Company, Department, Level, Position.  I changed the order as I think it is more logical to have Company-BD-SM-CFO.

Proc Sort data = Job_Titles ;

  By Company Department Level Position ;

Run ;

Data Job_Codes ;

Set Job_Titles ;

By Company Department Level Position ;

Length Cmp Dpt Lvl Pos $3

Cmp0 Dpt0 Lvl0 Pos0 $3

job_code $16

k $1 ;

Retain Cmp Dpt Lvl

Cmp0 Dpt0 Lvl0 Pos0 ;

If first.Company then

Do ;

Cmp0 = Cmp ;

Cmp = ' ' ;

Do i = 1 to 3;

k = Upcase (Scan (Company, i)) ; /* Initial letter */

Substr (Cmp, i, 1) = k ; /* Stuff it into the company code */

End ;

If Cmp = Cmp0 then put _All_ ; /* unintended duplicates*/

If first.Department …

If first.Level …

If first.Position …

job_code = catx ('-', cmp, dpt, lvl, pos) ;

Drop  Cmp Dpt Lvl

Cmp0 Dpt0 Lvl0 Pos0

i k

;

Run ;

KiaOra (I'm really a displaced Kiwi)

Richard

ISW (Itinerant SAS Wrangler)

CharlotteCain
Quartz | Level 8

Hey Richard,

First off, I can't thank you enough for the amazing effort.I am in Awe with your extremely thoughtful ideas and the approach you choose to inscribe the codes. It is so overwhelming that considering my beginner experience compared to your level of superior expertise gives me a blonde moment even before the sun begins to shine in this grey England. Smiley Happy

Anyways, you are indeed so close to even guess it to perfection how and why the CODES are being structured. Damn right!! as you have explained it..As a matter of fact I had a meeting with the bosses short while ago to seek further clarification and I was told that the CODE consists of 4 parts and in very few certain cases just 3. So here you go <Companyname>_<Department>_.<Level>_.<Position>

<Companyname> is always XYZ so doesn't matter if that is hard coded. I hope they don't change company names too often (laughs)

<Department> is either Finance, Technology, Marketing etc etc. This needs to be a parameter as it is subject to change and that too very often crazy enough they may call any fancy names unlike olden days eliquency.

At the moment, the mapping file for the CODE corresponding to JOB_DESIGNATION  is not present in the oracle source tables and the situation has been escalated to get the information needed in. However until then the challenge is proceed with how it is at present assuming the necessary mapping file(in other words relating new fields) won't exist for sometime. I guess that's also the reason that drove me to seek help from SAS communities as I presumed not so straight forward..

Thanks so much once again,

Charlotte

RichardinOz
Quartz | Level 8

Charlotte

I thought after I sent my last response I had not thought out the department/level: as you have indicated the departments are probably single words.  I suspect that level information is what may be missing in some departments, so you can substitute ANY, ALL, NUL, or XXX or whatever because you will need a placeholder in your code.

First get from your Oracle system a complete list of Departments and management levels in each.  If you have the list in a spreadsheet with those 2 columns (all levels for each department) you can decide how department and level are to be encoded and expand the table to 4 columns to include codes for department and level:

Department     Management_Level     Dept     Lvl

Finance          Senior Management     FIN      SNR

Finance          Middle Management     FIN     MID

Finance          Lower Management     FIN      LWR

Technology     Senior Management     TXG      SNR

Technology     Middle Management    TXG     MID

Technology     Lower Management      TXG     LWR

Marketing     Senior Management     MKT      SNR

... etc

Research                                        RES     XXX

... etc

Import or create this table into SAS as Departments.


Next get an extract from Oracle of all positions, with columns for Department, Level, and Position.  I'll assume that's called Extract.  Now we can join these two files to bring the codes for Dept and Lvl together with the position descriptions, and get it all sorted.


Proc SQL ;

     Create Table Bigfile as

          Select  Distinct  

                    X.Department

               ,     X.Level

               ,     X.Position

               ,     D.Dept

               ,      Case

                         When X.Level IS NULL

                              then 'XXX'

                         Else D.Lvl    

                      End        As Lvl

          From     Extract     X

               Left Join

                       Departments D

                On     D.Department     =     X.Department

               And     D.Management_Level = X.Level

          Order by   4, 5

          ;

Quit ;


Dump that table into Excel and look for mismatches in Department names or Management levels; if necessary update your Department Table and run the SQL again.  If all is good, you can proceed to the next step which is a cut down version of my previous attempt.  NB use dashes or underscores as required.

Data Job_Codes ;

     Set Bigfile ;

     By Dept Lvl Position ;

     Length Pos Pos0 $3

          job_code $16

          k $1 ;

     Retain     Cmp    'XYZ'

                     Pos0     ' '

                    ;

     Do i = 1 to 3;

          k = ' ' ;

          k = Upcase (Scan (Position, i)) ; /* Initial letter */

          Substr (Pos, i, 1) = k ; /* Stuff it into the Position code */

     End ;

     Pos0 = Lag (Pos) ;

     If Pos = Pos0 then _ERROR_ ; /* unintended duplicates*/

     job_code = catx ('-', cmp, dpt, lvl, pos) ;

     Drop  Pos0 i k ;

Run ;

The _ERROR_ statement is intended to catch you eye if there are any duplicate position codes within a department and level.

I haven't tested this code so if you have any problems please provide an extract from the log.

Richard

Message was edited by: Richard Carson (missing End statement in SQL)

CharlotteCain
Quartz | Level 8

Hi Richard,

I do get your approach. However the oracle tables do not have information on positions and levels as you have listed to extract, which i guess is completely my fault in not describing what is exactly in the oracle table.  OK, well in oracle table named description_detail there are only 2 columns with values as shown below

JOB_DESIGNATION(in propcase)  DEPARTMENT_FLAG

Senior Manager                                    FIN

Manager                                              ALL  /* managers can be for many departments*/

Senior Analyst                                      TEC

.Marketing Manager.                                MKT

and so on..../* this is all that exists in the oracle tables, therefore extracting information from oracle tables more than the above will not work, nor it is straight forward to simply read the excel sheet stuff into SAS for the sake of doing it in SAS*- Perhaps the only way is hard coding? Well then, that wouldn't solve the problem*/

The explanation or logic for the CODE is not defined or present in detail in the Oracle tables and that is the real pain. However the information does exist in the excel sheets from where i was quoting in my previous messages.

CODE we need as mentioned is Companyname/*hard code?*/>_<Department>/*extract from department_flag?_.<Level>/*excel mapping?_.<Position/*again excel mapping*/>...where I can see that we have company name (easy), department code from the department_flag from the above. For level and Position it seems the only is way to map from the excel sheets.

what i see in excel sheets is column1 JOB_DESIGNATION 2.DEPARTMENT_FLAG 3. LEVEL_CODE 4. POSITION_CODE 5.display_order 6.Job_code(our needed one)

I know it is so weird!!lol Doesn't make much sense to me either. Either it is too vague or confusing Smiley Sad Sorry about that

Charlotte

RichardinOz
Quartz | Level 8

Charlotte

These seem to be the facts

  1. Oracle has a (nearly?) complete set of job descriptions and department codes, but no level information
  2. Your Excel spreadsheet contains an outdated or incomplete set of codes for company, department, level and position, but crucially not the full job description that would match what is in Oracle (or does it?)

You simply do not have enough information to proceed with an automated derivation of job codes that could be made consistent with what is already in Excel (and what I suspect is being used downstream in reporting).  The only possible join is on Department code, if indeed that is consistent, and that does not have enough granularity to match existing position codes with Job Designations. 

You could set aside the excel version and just produce new codes from the Oracle extract, treating the Level information as missing.  Then try and match the new version with the excel table, using it to update Level information for those jobs where the new job code matches the existing excel position code.  I'm guessing there will be mismatches which will cause havoc with downstream processing, which might be alleviated by keeping everyone informed.

If the spreadsheet did contain job descriptions (as in Oracle) you could make some further progress and be more consistent with existing codes.  You could use existing position codes where they matched job descriptions in Oracle, and code of the type I have suggested to create new ones as required.  You could fill in Level information where it exists, and leave the rest blank.  The rest would have to wait for a downstream project.

It may not be too big a job to manually match Oracle job designations to the existing codes.  A company of 700 is unlikely to have 700 unique job designations, so if you sort both tables by department code and job or position and put them side by side many matches will be obvious.  It may even be quicker at this point to create new position codes for jobs that are missing form excel.  If you can do this you will have a lookup table that can convert Oracle job designation and Department to company, department code, (partially to level), and position code; which may be all you could hope for in the short term.

Richard

DBailey
Lapis Lazuli | Level 10

His code is actually creating those associations. 

data codes;
length job $32 code $12;
input job & code;
datalines;
senior management  sm
middle management  mm
lower management  lm
finance manager  fm
sales manager  sm
marketing manager  marmm
strategy consultant  s.consult_temp
communications manager  comm
technical enginner  t.eng
support engineer  t.support_eng
logistics  LGS
supply chain manager  scm
brand manager  brm
location manager  locm
;


you end up with a dataset called codes that has two columns: the job title and the job code.  Then you would use that codes dataset to do the matching.

PGStats
Opal | Level 21

Charlotte,

The question was well explained, but I didn't explain enough how my proposed solution was related to it. You were asking for some algorithm (if...then; do i=... etc) to do a simple one to one translation. From the example given it is clear to me that no simple rule can work all the time, now and in the future. You are thus stuck with having to list all associations explicitly. This, in a way simplifies the problem because the algorithm to do simple translations can be factored in two parts. The first one is a data table that gives the translations explicitly and the second part is a search algorithm that will look in that table for the appropriate entry.

The solution that I proposed rests on SAS user formats. They give you a very efficient lookup mechanism into what can be a very large table. You could keep your translation table somewhere convenient, like an Excel or Access table, adapt my code to read and transform the table into a SAS format, which you can store and reuse, and then use that format everywhere you need for SAS reporting.

PG

PG
ballardw
Super User

If you add an "other" category of "Not Coded" or similar to the format PGStats recommends you can use the formatted value of the Job_designation to find new codes that need to be added to your list.

Something like:

proc sql;

     select distinct Job_designation

     from dataset

     where put(Job_designation,$Job_designation_code.)="Not Coded";

run;

PGStats
Opal | Level 21

Or you might find more useful not to have an OTHER entry since jobs designations that are not listed in the format will simply not be translated if there is no OTHER entry. That's a very reasonable thing to do as well.

PG

PG
CharlotteCain
Quartz | Level 8

Hi PG Sir,

Thank you so much once again for the much detailed explanation. I guess that meant a lot and made me understand your approach pretty well. Whilst format is a great approach, do you think this could done using Hash? I did spend a fair amount of time researching some SUGI articles on SAS, and it seems there might be a possibility. Of course you would know better for sure. is it really possible?If so, Can you help on it at your convenient free time if you don't mind

.To be honest, I don't know how to use Hashes in this context besides the fact I know about it being highly RAM speed dependent.

I sincerely do appreciate the favour so far. Thanks so much.

Charlotte

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1110 views
  • 4 likes
  • 7 in conversation