Need help with the right function for this label extraction plz?

Reply
Contributor
Posts: 72

Need help with the right function for this label extraction plz?

Hey Folks,

I have a  dataset 1 with this particular column1 that has values like

COLUMN1

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ

I need a new dataset 2 with output as the following

COLUMN1                      COLUMN2

ABC.XYZ.EFG                  ABC.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.NA.XYZ.EFG             ABC.NA.XYZ

ABC.NA.XYZ.EFG             ABC.NA.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.NA.XYZ                     ABC.NA.XYZ

I was of course thinking of applying certain functions like SCAN however you would notice in some some values it is ABC.NA.XYZ.EFG and some with ABC.XYZ.EFG

I think if i split them into two datasets and apply FUNCTION separately to get the needed output, it would work and then remerge back as one. However I would not want to do that. Is there a way in one shot?

Respected Advisor
Posts: 3,777

Re: Need help with the right function for this label extraction plz?

This probably good enough.

data noefg;
   input s $40.;
   l = find(s,
'.EFG',-40);
   if l then s1 = substrn(s,1,l-1);
   else s1 = s;
   cards;
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ
;;;;
   run;
Contributor
Posts: 72

Re: Need help with the right function for this label extraction plz?

Hi DN,

Many thanks for the response. I understand the logic assumes constantly .EFG as the last 3 letters. I sincerely apologise for not mentioning earlier how the data looks or changes as the rows pass by.

So in detail: data [Market research Customer segmentation modelScotland dataset composed of consolidated categories with input values being made up of SOURCE.SEGMENTATION.GROUP.SEGMENT or just SOURCE.SEGMENTATION.GROUP  where source is ABC. segmentation is of 3 types 1. XYZ 2. YZX 3.ZXY. The GROUP and SEGMENT varies as you notice in the below input values unlike a consistent EFG. In the required output I want source.segmentation in a new column(output) which is ABC.XYZ, ABC.YZX and ABC.ZXY. There is also a sub segmentation category within the above 3 segmentation called NA (Not Applicable) Segmentation where the INPUT value is present like ABC.NA.XYZ, ABC.NA.YZX and ABC.NA.ZXY.

So the total number of rows is approximately 550 when i scrolled down on the VIEW table window.

INPUT(Source.segmentation.group.segment)                                                           Required OUTPUT (source.segmentation)

ABC.XYZ.SGR /*No segment in here*/                                                                            ABC.XYZ

ABC.XYZ.URB                                                                                                             ABC.XYZ

ABC.XYZ.JKL.KPI /*with segment*/                                                                                ABC.XYZ

ABC.NA.XYZ.KBG.KBR /*NA Not Applicable category with segment*/                        ABC.NA.XYZ

ABC.NA.XYZ.HTY.PLO                                                                                                ABC.NA.XYZ

ABC.XYZ.EFG                                                                                                              ABC.XYZ

ABC.XYZ.EFG                                                                                                                ABC.XYZ

ABC.XYZ.EFG                                                                                                               ABC.XYZ

ABC.NA.XYZ.EFG/*NA Not Applicable category without segment*/                            ABC.NA.XYZ

ABC.NA.XYZ.EFG                                                                                                      ABC.NA.XYZ

ABC.YZX.STG                                                                                                            ABC.YZX

ABC.YZX.SGR                                                                                                             ABC.YZX

ABC.YZX.SBN                                                                                                             ABC.YZX

ABC.YZX.EFG                                                                                                              ABC.YZX

ABC.YZX.EFG                                                                                                                ABC.YZX

ABC.NA.YZX.EFG                                                                                                         ABC.NA.YZX

ABC.NA.YZX.OLK                                                                                                         ABC.NA.YZX

ABC.YZX.EFG                                                                                                                 ABC.YZX

ABC.NA.YZX.EFG                                                                                                         ABC.NA.YZX

ABC.NA.YZX.EFG                                                                                                       ABC.NA.YZX

ABC.NA.YZX.PIL.OUL                                                                                                 ABC.NA.YZX

ABC.NA.YZX.ASD.LKJ                                                                                                ABC.NA.YZX

ABC.ZXY.EFG                                                                                                              ABC.ZXY

ABC.ZXY.EFG                                                                                                             ABC.ZXY

ABC.ZXY.SGR                                                                                                               ABC.ZXY

ABC.ZXY.POL                                                                                                                ABC.ZXY

ABC.NA.ZXY.EFG                                                                                                        ABC.NA.ZXY

ABC.NA.ZXY.EFG                                                                                                     ABC.NA.ZXY

ABC.NA.ZXY.PKL.AFG                                                                                             ABC.NA.ZXY

ABC.NA.ZXY.GTR.EBH                                                                                           ABC.NA.ZXY

ABC.ZXY.UJH                                                                                                          ABC.ZXY

Would appreciate so much. Thanks

Mark

Super User
Super User
Posts: 6,502

Re: Need help with the right function for this label extraction plz?

So you want the first two words, except when the second word is 'NA' you also want the third word.

SCAN function is probably what you want to use.

word1=scan(input,1,'.');

word2=scan(input,2,'.');

if word2='NA' then word3=scan(input,3,'.');

output = catx('.',word1,word2,word3);

Respected Advisor
Posts: 3,124

Re: Need help with the right function for this label extraction plz?

On top of DN's suggestion, you can also use tranwrd() or a stronger one: prxchange():


data noefg;

  input col1 $40.;

col2=tranwrd(col1,'.EFG','');

col3=prxchange('s/\.EFG//',-1,col1);

  cards;

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ

;;;;

  run;

New Contributor HE
New Contributor
Posts: 4

Re: Need help with the right function for this label extraction plz?

data NewData;

set YourData;

if (countw(COLUMN1,'NA')-1)=0 then COLUMN2=substr(COLUMN1,1,7 );

else if (countw(COLUMN1,'NA')-1) ne 0 then COLUMN2=substr(COLUMN1,1, 10);

proc print noobs;

run;

   COLUMN1         COLUMN2

  1. ABC.XYZ.EFG       ABC.XYZ  
  2. ABC.XYZ.EFG       ABC.XYZ  
  3. ABC.XYZ.EFG       ABC.XYZ  
  4. ABC.NA.XYZ.EFG    ABC.NA.XYZ
  5. ABC.NA.XYZ.EFG    ABC.NA.XYZ
  6. ABC.XYZ.EFG       ABC.XYZ  
  7. ABC.XYZ.EFG       ABC.XYZ  
  8. ABC.NA.XYZ        ABC.NA.XYZ
Contributor
Posts: 72

Re: Need help with the right function for this label extraction plz?

Hi HE,

Hmm I like that approach and never really came across in for my need previously the use of Countw function. I will probably have to look into it for more details.

May I ask did you test it?

Mark

New Contributor HE
New Contributor
Posts: 4

Re: Need help with the right function for this label extraction plz?

Hi Mark,

The output of my SAS code is listed in the answer. I don't know details about your data but i beleive you can develop this code if necessary.

Grettings

Hadi Eid

Ask a Question
Discussion stats
  • 7 replies
  • 392 views
  • 9 likes
  • 5 in conversation