05-30-2013 08:59 AM
I have a dataset 1 with this particular column1 that has values like
I need a new dataset 2 with output as the following
I was of course thinking of applying certain functions like SCAN however you would notice in some some values it is ABC.NA.XYZ.EFG and some with ABC.XYZ.EFG
I think if i split them into two datasets and apply FUNCTION separately to get the needed output, it would work and then remerge back as one. However I would not want to do that. Is there a way in one shot?
05-30-2013 09:13 AM
This probably good enough.
05-31-2013 04:35 AM
Many thanks for the response. I understand the logic assumes constantly .EFG as the last 3 letters. I sincerely apologise for not mentioning earlier how the data looks or changes as the rows pass by.
So in detail: data [Market research Customer segmentation modelScotland dataset composed of consolidated categories with input values being made up of SOURCE.SEGMENTATION.GROUP.SEGMENT or just SOURCE.SEGMENTATION.GROUP where source is ABC. segmentation is of 3 types 1. XYZ 2. YZX 3.ZXY. The GROUP and SEGMENT varies as you notice in the below input values unlike a consistent EFG. In the required output I want source.segmentation in a new column(output) which is ABC.XYZ, ABC.YZX and ABC.ZXY. There is also a sub segmentation category within the above 3 segmentation called NA (Not Applicable) Segmentation where the INPUT value is present like ABC.NA.XYZ, ABC.NA.YZX and ABC.NA.ZXY.
So the total number of rows is approximately 550 when i scrolled down on the VIEW table window.
INPUT(Source.segmentation.group.segment) Required OUTPUT (source.segmentation)
ABC.XYZ.SGR /*No segment in here*/ ABC.XYZ
ABC.XYZ.JKL.KPI /*with segment*/ ABC.XYZ
ABC.NA.XYZ.KBG.KBR /*NA Not Applicable category with segment*/ ABC.NA.XYZ
ABC.NA.XYZ.EFG/*NA Not Applicable category without segment*/ ABC.NA.XYZ
Would appreciate so much. Thanks
05-31-2013 07:36 AM
So you want the first two words, except when the second word is 'NA' you also want the third word.
SCAN function is probably what you want to use.
if word2='NA' then word3=scan(input,3,'.');
output = catx('.',word1,word2,word3);
05-30-2013 09:31 AM
On top of DN's suggestion, you can also use tranwrd() or a stronger one: prxchange():
input col1 $40.;
05-30-2013 10:26 AM
if (countw(COLUMN1,'NA')-1)=0 then COLUMN2=substr(COLUMN1,1,7 );
else if (countw(COLUMN1,'NA')-1) ne 0 then COLUMN2=substr(COLUMN1,1, 10);
proc print noobs;
05-31-2013 06:15 AM
Hmm I like that approach and never really came across in for my need previously the use of Countw function. I will probably have to look into it for more details.
May I ask did you test it?
05-31-2013 09:04 AM
The output of my SAS code is listed in the answer. I don't know details about your data but i beleive you can develop this code if necessary.