Hey SAS Specialists,
Hope you folks are having a great weekend. I would appreciate if you can provide me the SAS Datastep or Proc SQL logic for the below requirement
I have a Dataset that has the column JOB_DESIGNATION as seen below, and I need logic to get the code column
JOB_DESIGNATION | CODE(i need this column in output) | ||
senior management | sm | ||
middle management | mm | ||
lower management | lm | ||
finance manager | fm | ||
sales manager | sm | ||
marketing manager | marmm | ||
strategy consultant | s.consult_temp | ||
communications manager | comm | ||
technical enginner | t.eng | ||
support engineer | t.support_eng | ||
logistics | LGS | ||
supply chain manager | scm | ||
brand manager | brm | ||
location manager | locm |
I thought of something like
do= i to _n_;
if job_desgnation= "&senior management" then code="&sm";
will this work? Anyway, the point is in every 3 months i.e quarter the job_designation and code values and numbers are subject to change respectively as our employees are constantly restructured and the need of the hour is to automate the report in a way to pull from the tables directly without having the need of a human intervention to give us an update.I hope that makes sense. I'd appreciate a great solution. Thanks
P.S The above is the sample, of course there are many more Job designations, which means it's fairly a big size dataset
You could keep the list of job designations and codes somewhere convenient and easy to edit and have SAS read and transform it into a character format. This is how I would do it:
/* Read in the codes */
data codes;
length job $32 code $12;
input job & code;
datalines;
senior management sm
middle management mm
lower management lm
finance manager fm
sales manager sm
marketing manager marmm
strategy consultant s.consult_temp
communications manager comm
technical enginner t.eng
support engineer t.support_eng
logistics LGS
supply chain manager scm
brand manager brm
location manager locm
;
/* Create a format list. Character formats are case sensitive so generate low, upper and proper case versions of the job designations.
Create a view instead of a dataset, since it is going to be used only once. */
data fmt / view=fmt;
set codes end=other;
fmtname = "jobCodes";
type = "C";
label = code;
start = lowcase(job);
output;
start = upcase(job);
output;
start = propcase(job);
output;
run;
/* Transform the list into a SAS character format */
proc format cntlin=fmt; run;
/* Test the format */
data test;
length title $32;
format title $jobCodes.;
title = "technical enginner";
run;
proc print; run;
PG
Hi PG sir,
Many thanks for the spontaneous response. I understand that in your 1st step that you are creating the dataset codes and then using that as an input file. Well, my input dataset only has the JOB_DESIGNATION column and the CODE column is actually the requirement in the output and is not a part of the input file. So bascially,looking up values of JOB_DESIGNATION such as senior management should generate the code for each JOB.
For example, if job_designation="senior management" then code=sm;
else if ........continues till eof.........../*as we all know multiple if then else statements will ridicule performance drastically and not a good practice*/ and also the values of JOB_DESIGNATION is not static as they may change.For instance in this release, it is senior management and going forward(3 months later) that may change to "Top management" or "Higher management" or anything of your educated guess.
Hence, I thought of the idea of looping it. Like do i=1 to _n_; here _n_ because we don't know how many different JOB_DESIGNATION we may have as the dataset can be large and then apply conditions inside the loop, however my idea is not working or I am not good enough to get the logic right. Also I thought of an idea using an array within the loop, a temporary array,
Please accept my apologies if I didn't explain the question well enough. Many Thanks.
Cheers from England,
Charlotte
Charlotte
From the examples you have given there does not seem to be any straightforward way of deducing the 'code' from the job designation. Also, you have assigned 'sm' to more than one position. As for the shifting sands of fashionable job naming, none of us can guess where that might go next so that part of the problem is insoluble. Normally with a code you would have a short alpha numeric string unique to each description. I would suggest you could have say 5 characters: up to the first 3 could be the initial letters of designation words and the rest a sequence number to avoid duplication. So your example would become
senior management SM001
middle management MM001
lower management LM001
finance manager FM001
sales manager SM002
marketing manager MM002
strategy consultant SC001
communications manager CM001
technical enginner TE001
support engineer SE001
logistics L0001
supply chain manager SCM01
brand manager BM001
location manager LM002
Here is code (untested) which should achieve the result
proc sort data = designations ;
by job_name ;
run ;
data job_code ;
set designations ;
length k $1
prefix $3
code $5
;
retain seq 0 ;
by job_name ;
if first.job_name ;
prefix = '000'
do i = 1 to 3 ;
k = upcase (scan (job_name, i)) ;
if k > ' ' then
substr (prefix, i, 1) = k ;
end ;
prev = lag (prefix) ;
if prefix = prev then seq + 1 ;
else seq = 1 ;
code = prefix || put (seq, z2.) ;
keep job_name code ;
run ;
There is a problem with this approach: supposing a new designation is encountered you cannot simply re-run the code because existing codes might be shifted to new positions. Code for an automatic update could be quite complex; it might be easier to deal with updates manually as they are encountered.
A way round this would be to make the sequence number equal to _N_, the row number; then any new code would take the next highest sequence number. You might need to allocate more than 2 sequence numerals.
Gidday from Oz
Richard
Hey Richard,
Thanks very much indeed for that nice creative approach and your time for solving the problem. I apologise for having not looked into 1 or more Typos as I wrote my question, however i can tell in your response that you clearly had a well educated guess.
Please pardon me on the Code typos. Well,for example the CODE column values should actually be like:
COMPName-SM-BD-CEO meaning (Company name, SM for Senior management, BD for Board of Directors, CEO Chief executive officer)
SM-BD-CFO ((company name ,SM for Senior management, BD for Board of Directors, CEO Chief financial officer)
SM-BD-COO (company name,SM for Senior management, BD for Board of Directors, CEO Chief operating officer)
likewise it gets so granular in the segmentation process and the idea is to list entire codes within all segments consolidated in one single report. I have no clue why they have not chosen alphanumeric codes. So for example SAS experts like you would be given code like companyname-TD-IM-TA (TD is Technology department, IM is Information management and TA is technical analyst. So can you alter the sas code slightly to this kind of values as I'm afraid the numeric prefix wouldn't work? . So imagine if there are over 700 employees, there would have to be 700 corresponding codes.
The above relational logic isn't defined in any of the input tables however I learned it having noticed those excel sheets in earlier manual reports.
Basically, the objective is to have a simple descriptive report on our employees categorizing them to certain groups(segments) using CODES and then of course perhaps use them in relation to other dimensions and measures, which are quite straight forward and so I never bothered to bring that up in the discussion.
I completely understand and acknowledge the possibility of an automatic update is pretty complex. Until recently, the work has been done manually in excel sheets and for whatever reason my big bosses seem to wanna automate many such processes directly from the existing oracle tables instead of manually working on excel sheets.
Hope sun shines down under,
Charlotte
.
Charlotte
Regarding the sun - as we are heading into winter now we have about as much chance of seeing it as in the UK this time of the year!
I have the feeling you are being asked to rectify a defect in organisational planning (nothing new in that, except usually technical staff are the last to be consulted). Normally I would expect job codes to be assigned by HR as part of setting up the job descriptions, and these codes would be entered into your database; and it would be the task of the reporting analyst to translate those job codes back into job titles. In this case the titles are already available but someone now feels the need to inflict codes on the organisation.
Job and department codes really only serve two useful purposes: saving space in a computer database (not in your case - the database contains the full title) and saving space in financial articles. Hence everybody recognises codes such as CEO, CFO, COO and the like and the become status symbols for the elite, like military decorations (DSO, VC etc). The trouble is that lower down the organisational pyramid there are potentially overlapping codes for quite dissimilar roles. SA, for example, can be Systems Architect, Senior Analyst, or Sales Assistant. Hence the idea of attaching company-department codes to the job codes to distinguish between duplicates. This is making your codes unwieldy.
By the way, if job codes were properly implemented in your organisation there would be a database table translating the codes into full titles, and a procedure for updating those tables; and you could use the methods others have suggested to expand codes into descriptions as required.
However, if you have no choice to implement codes in the format you have described I would hope you have a source table with the following columns: Company, Department, Level, Position. I changed the order as I think it is more logical to have Company-BD-SM-CFO.
Proc Sort data = Job_Titles ;
By Company Department Level Position ;
Run ;
Data Job_Codes ;
Set Job_Titles ;
By Company Department Level Position ;
Length Cmp Dpt Lvl Pos $3
Cmp0 Dpt0 Lvl0 Pos0 $3
job_code $16
k $1 ;
Retain Cmp Dpt Lvl
Cmp0 Dpt0 Lvl0 Pos0 ;
If first.Company then
Do ;
Cmp0 = Cmp ;
Cmp = ' ' ;
Do i = 1 to 3;
k = Upcase (Scan (Company, i)) ; /* Initial letter */
Substr (Cmp, i, 1) = k ; /* Stuff it into the company code */
End ;
If Cmp = Cmp0 then put _All_ ; /* unintended duplicates*/
If first.Department …
If first.Level …
If first.Position …
job_code = catx ('-', cmp, dpt, lvl, pos) ;
Drop Cmp Dpt Lvl
Cmp0 Dpt0 Lvl0 Pos0
i k
;
Run ;
KiaOra (I'm really a displaced Kiwi)
Richard
ISW (Itinerant SAS Wrangler)
Hey Richard,
First off, I can't thank you enough for the amazing effort.I am in Awe with your extremely thoughtful ideas and the approach you choose to inscribe the codes. It is so overwhelming that considering my beginner experience compared to your level of superior expertise gives me a blonde moment even before the sun begins to shine in this grey England.
Anyways, you are indeed so close to even guess it to perfection how and why the CODES are being structured. Damn right!! as you have explained it..As a matter of fact I had a meeting with the bosses short while ago to seek further clarification and I was told that the CODE consists of 4 parts and in very few certain cases just 3. So here you go <Companyname>_<Department>_.<Level>_.<Position>
<Companyname> is always XYZ so doesn't matter if that is hard coded. I hope they don't change company names too often (laughs)
<Department> is either Finance, Technology, Marketing etc etc. This needs to be a parameter as it is subject to change and that too very often crazy enough they may call any fancy names unlike olden days eliquency.
At the moment, the mapping file for the CODE corresponding to JOB_DESIGNATION is not present in the oracle source tables and the situation has been escalated to get the information needed in. However until then the challenge is proceed with how it is at present assuming the necessary mapping file(in other words relating new fields) won't exist for sometime. I guess that's also the reason that drove me to seek help from SAS communities as I presumed not so straight forward..
Thanks so much once again,
Charlotte
Charlotte
I thought after I sent my last response I had not thought out the department/level: as you have indicated the departments are probably single words. I suspect that level information is what may be missing in some departments, so you can substitute ANY, ALL, NUL, or XXX or whatever because you will need a placeholder in your code.
First get from your Oracle system a complete list of Departments and management levels in each. If you have the list in a spreadsheet with those 2 columns (all levels for each department) you can decide how department and level are to be encoded and expand the table to 4 columns to include codes for department and level:
Department Management_Level Dept Lvl
Finance Senior Management FIN SNR
Finance Middle Management FIN MID
Finance Lower Management FIN LWR
Technology Senior Management TXG SNR
Technology Middle Management TXG MID
Technology Lower Management TXG LWR
Marketing Senior Management MKT SNR
... etc
Research RES XXX
... etc
Import or create this table into SAS as Departments.
Next get an extract from Oracle of all positions, with columns for Department, Level, and Position. I'll assume that's called Extract. Now we can join these two files to bring the codes for Dept and Lvl together with the position descriptions, and get it all sorted.
Proc SQL ;
Create Table Bigfile as
Select Distinct
X.Department
, X.Level
, X.Position
, D.Dept
, Case
When X.Level IS NULL
then 'XXX'
Else D.Lvl
End As Lvl
From Extract X
Left Join
Departments D
On D.Department = X.Department
And D.Management_Level = X.Level
Order by 4, 5
;
Quit ;
Dump that table into Excel and look for mismatches in Department names or Management levels; if necessary update your Department Table and run the SQL again. If all is good, you can proceed to the next step which is a cut down version of my previous attempt. NB use dashes or underscores as required.
Data Job_Codes ;
Set Bigfile ;
By Dept Lvl Position ;
Length Pos Pos0 $3
job_code $16
k $1 ;
Retain Cmp 'XYZ'
Pos0 ' '
;
Do i = 1 to 3;
k = ' ' ;
k = Upcase (Scan (Position, i)) ; /* Initial letter */
Substr (Pos, i, 1) = k ; /* Stuff it into the Position code */
End ;
Pos0 = Lag (Pos) ;
If Pos = Pos0 then _ERROR_ ; /* unintended duplicates*/
job_code = catx ('-', cmp, dpt, lvl, pos) ;
Drop Pos0 i k ;
Run ;
The _ERROR_ statement is intended to catch you eye if there are any duplicate position codes within a department and level.
I haven't tested this code so if you have any problems please provide an extract from the log.
Richard
Message was edited by: Richard Carson (missing End statement in SQL)
Hi Richard,
I do get your approach. However the oracle tables do not have information on positions and levels as you have listed to extract, which i guess is completely my fault in not describing what is exactly in the oracle table. OK, well in oracle table named description_detail there are only 2 columns with values as shown below
JOB_DESIGNATION(in propcase) DEPARTMENT_FLAG
Senior Manager FIN
Manager ALL /* managers can be for many departments*/
Senior Analyst TEC
.Marketing Manager. MKT
and so on..../* this is all that exists in the oracle tables, therefore extracting information from oracle tables more than the above will not work, nor it is straight forward to simply read the excel sheet stuff into SAS for the sake of doing it in SAS*- Perhaps the only way is hard coding? Well then, that wouldn't solve the problem*/
The explanation or logic for the CODE is not defined or present in detail in the Oracle tables and that is the real pain. However the information does exist in the excel sheets from where i was quoting in my previous messages.
CODE we need as mentioned is Companyname/*hard code?*/>_<Department>/*extract from department_flag?_.<Level>/*excel mapping?_.<Position/*again excel mapping*/>...where I can see that we have company name (easy), department code from the department_flag from the above. For level and Position it seems the only is way to map from the excel sheets.
what i see in excel sheets is column1 JOB_DESIGNATION 2.DEPARTMENT_FLAG 3. LEVEL_CODE 4. POSITION_CODE 5.display_order 6.Job_code(our needed one)
I know it is so weird!!lol Doesn't make much sense to me either. Either it is too vague or confusing Sorry about that
Charlotte
Charlotte
These seem to be the facts
You simply do not have enough information to proceed with an automated derivation of job codes that could be made consistent with what is already in Excel (and what I suspect is being used downstream in reporting). The only possible join is on Department code, if indeed that is consistent, and that does not have enough granularity to match existing position codes with Job Designations.
You could set aside the excel version and just produce new codes from the Oracle extract, treating the Level information as missing. Then try and match the new version with the excel table, using it to update Level information for those jobs where the new job code matches the existing excel position code. I'm guessing there will be mismatches which will cause havoc with downstream processing, which might be alleviated by keeping everyone informed.
If the spreadsheet did contain job descriptions (as in Oracle) you could make some further progress and be more consistent with existing codes. You could use existing position codes where they matched job descriptions in Oracle, and code of the type I have suggested to create new ones as required. You could fill in Level information where it exists, and leave the rest blank. The rest would have to wait for a downstream project.
It may not be too big a job to manually match Oracle job designations to the existing codes. A company of 700 is unlikely to have 700 unique job designations, so if you sort both tables by department code and job or position and put them side by side many matches will be obvious. It may even be quicker at this point to create new position codes for jobs that are missing form excel. If you can do this you will have a lookup table that can convert Oracle job designation and Department to company, department code, (partially to level), and position code; which may be all you could hope for in the short term.
Richard
His code is actually creating those associations.
data codes;
length job $32 code $12;
input job & code;
datalines;
senior management sm
middle management mm
lower management lm
finance manager fm
sales manager sm
marketing manager marmm
strategy consultant s.consult_temp
communications manager comm
technical enginner t.eng
support engineer t.support_eng
logistics LGS
supply chain manager scm
brand manager brm
location manager locm
;
you end up with a dataset called codes that has two columns: the job title and the job code. Then you would use that codes dataset to do the matching.
Charlotte,
The question was well explained, but I didn't explain enough how my proposed solution was related to it. You were asking for some algorithm (if...then; do i=... etc) to do a simple one to one translation. From the example given it is clear to me that no simple rule can work all the time, now and in the future. You are thus stuck with having to list all associations explicitly. This, in a way simplifies the problem because the algorithm to do simple translations can be factored in two parts. The first one is a data table that gives the translations explicitly and the second part is a search algorithm that will look in that table for the appropriate entry.
The solution that I proposed rests on SAS user formats. They give you a very efficient lookup mechanism into what can be a very large table. You could keep your translation table somewhere convenient, like an Excel or Access table, adapt my code to read and transform the table into a SAS format, which you can store and reuse, and then use that format everywhere you need for SAS reporting.
PG
If you add an "other" category of "Not Coded" or similar to the format PGStats recommends you can use the formatted value of the Job_designation to find new codes that need to be added to your list.
Something like:
proc sql;
select distinct Job_designation
from dataset
where put(Job_designation,$Job_designation_code.)="Not Coded";
run;
Or you might find more useful not to have an OTHER entry since jobs designations that are not listed in the format will simply not be translated if there is no OTHER entry. That's a very reasonable thing to do as well.
PG
Hi PG Sir,
Thank you so much once again for the much detailed explanation. I guess that meant a lot and made me understand your approach pretty well. Whilst format is a great approach, do you think this could done using Hash? I did spend a fair amount of time researching some SUGI articles on SAS, and it seems there might be a possibility. Of course you would know better for sure. is it really possible?If so, Can you help on it at your convenient free time if you don't mind
.To be honest, I don't know how to use Hashes in this context besides the fact I know about it being highly RAM speed dependent.
I sincerely do appreciate the favour so far. Thanks so much.
Charlotte
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.