Hey Folks,
I have a dataset 1 with this particular column1 that has values like
COLUMN1
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ
I need a new dataset 2 with output as the following
COLUMN1 COLUMN2
ABC.XYZ.EFG ABC.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.NA.XYZ.EFG ABC.NA.XYZ
ABC.NA.XYZ.EFG ABC.NA.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.NA.XYZ ABC.NA.XYZ
I was of course thinking of applying certain functions like SCAN however you would notice in some some values it is ABC.NA.XYZ.EFG and some with ABC.XYZ.EFG
I think if i split them into two datasets and apply FUNCTION separately to get the needed output, it would work and then remerge back as one. However I would not want to do that. Is there a way in one shot?
This probably good enough.
Hi DN,
Many thanks for the response. I understand the logic assumes constantly .EFG as the last 3 letters. I sincerely apologise for not mentioning earlier how the data looks or changes as the rows pass by.
So in detail: data [Market research Customer segmentation modelScotland dataset composed of consolidated categories with input values being made up of SOURCE.SEGMENTATION.GROUP.SEGMENT or just SOURCE.SEGMENTATION.GROUP where source is ABC. segmentation is of 3 types 1. XYZ 2. YZX 3.ZXY. The GROUP and SEGMENT varies as you notice in the below input values unlike a consistent EFG. In the required output I want source.segmentation in a new column(output) which is ABC.XYZ, ABC.YZX and ABC.ZXY. There is also a sub segmentation category within the above 3 segmentation called NA (Not Applicable) Segmentation where the INPUT value is present like ABC.NA.XYZ, ABC.NA.YZX and ABC.NA.ZXY.
So the total number of rows is approximately 550 when i scrolled down on the VIEW table window.
INPUT(Source.segmentation.group.segment) Required OUTPUT (source.segmentation)
ABC.XYZ.SGR /*No segment in here*/ ABC.XYZ
ABC.XYZ.URB ABC.XYZ
ABC.XYZ.JKL.KPI /*with segment*/ ABC.XYZ
ABC.NA.XYZ.KBG.KBR /*NA Not Applicable category with segment*/ ABC.NA.XYZ
ABC.NA.XYZ.HTY.PLO ABC.NA.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.XYZ.EFG ABC.XYZ
ABC.NA.XYZ.EFG/*NA Not Applicable category without segment*/ ABC.NA.XYZ
ABC.NA.XYZ.EFG ABC.NA.XYZ
ABC.YZX.STG ABC.YZX
ABC.YZX.SGR ABC.YZX
ABC.YZX.SBN ABC.YZX
ABC.YZX.EFG ABC.YZX
ABC.YZX.EFG ABC.YZX
ABC.NA.YZX.EFG ABC.NA.YZX
ABC.NA.YZX.OLK ABC.NA.YZX
ABC.YZX.EFG ABC.YZX
ABC.NA.YZX.EFG ABC.NA.YZX
ABC.NA.YZX.EFG ABC.NA.YZX
ABC.NA.YZX.PIL.OUL ABC.NA.YZX
ABC.NA.YZX.ASD.LKJ ABC.NA.YZX
ABC.ZXY.EFG ABC.ZXY
ABC.ZXY.EFG ABC.ZXY
ABC.ZXY.SGR ABC.ZXY
ABC.ZXY.POL ABC.ZXY
ABC.NA.ZXY.EFG ABC.NA.ZXY
ABC.NA.ZXY.EFG ABC.NA.ZXY
ABC.NA.ZXY.PKL.AFG ABC.NA.ZXY
ABC.NA.ZXY.GTR.EBH ABC.NA.ZXY
ABC.ZXY.UJH ABC.ZXY
Would appreciate so much. Thanks
Mark
So you want the first two words, except when the second word is 'NA' you also want the third word.
SCAN function is probably what you want to use.
word1=scan(input,1,'.');
word2=scan(input,2,'.');
if word2='NA' then word3=scan(input,3,'.');
output = catx('.',word1,word2,word3);
On top of DN's suggestion, you can also use tranwrd() or a stronger one: prxchange():
data noefg;
input col1 $40.;
col2=tranwrd(col1,'.EFG','');
col3=prxchange('s/\.EFG//',-1,col1);
cards;
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ
;;;;
run;
data NewData;
set YourData;
if (countw(COLUMN1,'NA')-1)=0 then COLUMN2=substr(COLUMN1,1,7 );
else if (countw(COLUMN1,'NA')-1) ne 0 then COLUMN2=substr(COLUMN1,1, 10);
proc print noobs;
run;
COLUMN1 COLUMN2
Hi HE,
Hmm I like that approach and never really came across in for my need previously the use of Countw function. I will probably have to look into it for more details.
May I ask did you test it?
Mark
Hi Mark,
The output of my SAS code is listed in the answer. I don't know details about your data but i beleive you can develop this code if necessary.
Grettings
Hadi Eid
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.