Hi all, I wanted to extract some words from the data column so that I can classified them in the new column based on some words (but not all) from the old column.
For example,
in my old column,
ZMAJCLDSC
CGO - Marine Cargo (I want to extract CMC here)
ENG - Engineering (I want to extract the word ENG here)
PAB (I want to extract the first alphabet P here)
PMD (I want to extract the first alphabet P here)
CBA (I want to extract the first alphabet C here)
IAB (I want to extract the word IAB here)
then I will classified it based on what I have extract.
ZMAJCLDSC (OLD COLUMN) lob (NEW COLUMN)
CGO - Marine Cargo cargo
ENG - Engineering ENG
PAB PA
PMD PA
CBA PA
IAB PA
I plan to use if statement based on what I extracted. here's my first two code below .But I dun know how to extract the words. What can I do with this?
if zmajcldsc='CGO - Marine Cargo' then
lob='cargo';
else if zmajcldsc='ENG - Engineering' then
lob='Engineering';
you could create a permit table that contained a column for ZMAJCLDSC valid values and a column that values translation to for the lob value.
or maintain a format for ZMAJCLDSC
At first I was thinking that a format would be the best option, but it all depends on what the logic would be.
It is not clear if you want to have "ENG - <any random text>" to be converted to ENG or if it is only "ENG - Engineering".
proc format;
value $myConv
"CGO-MarineCargo" = "CMC"
"ENG-Engineering" = "ENG"
"PAB", "PMD" = "P"
"CBA" = "C"
"IAB" = "IAB"
other = "***** Unknown ****";
run;
data have;
length ZMAJCLDSC $ 32;
input ZMAJCLDSC ;
datalines;
CGO-MarineCargo
ENG-Engineering
PAB
PMD
CBA
IAB
;;;
run;
data want;
set have;
lob = put(ZMAJCLDSC,$myConv.);
run;
I might consider a custom INFORMAT to read the data as needed with an error condition for unexpected values
proc format library=work; invalue $myConv (upcase) "CGO-MARINECARGO" = "CMC" "ENG-ENGINEERING" = "ENG" "PAB", "PMD" = "P" "CBA" = "C" "IAB" = "IAB" other = _error_; run; data have; length ZMAJCLDSC $ 32; informat ZMAJCLDSC $myConv. ; input ZMAJCLDSC ; datalines; CGO-MarineCargo ENG-Engineering PAB PMD CBA IAB somethingunexpected ;;; run;
The UPCASE option on an informat can help if data entry is somewhat inconsistent so you don't have to consider Pab PAB PaB as different values.
The _error_ for other will generate an invalid data message so you modify the INFORMAT code when an unexpected value is encountered.
If the field is sometimes expected to be blank you would add a " "= " " to the list of values.
@heffo wrote:
Thank you, I didn't know about the _error_ nor the Upcase() inside of formats!
The options UPCASE and _error_ only work on INVALUE , or when creating informats, not VALUE formats.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.