BookmarkSubscribeRSS Feed
MarkWik
Quartz | Level 8

Hey Folks,

I have a  dataset 1 with this particular column1 that has values like

COLUMN1

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ

I need a new dataset 2 with output as the following

COLUMN1                      COLUMN2

ABC.XYZ.EFG                  ABC.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.NA.XYZ.EFG             ABC.NA.XYZ

ABC.NA.XYZ.EFG             ABC.NA.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.XYZ.EFG                  ABC.XYZ

ABC.NA.XYZ                     ABC.NA.XYZ

I was of course thinking of applying certain functions like SCAN however you would notice in some some values it is ABC.NA.XYZ.EFG and some with ABC.XYZ.EFG

I think if i split them into two datasets and apply FUNCTION separately to get the needed output, it would work and then remerge back as one. However I would not want to do that. Is there a way in one shot?

7 REPLIES 7
data_null__
Jade | Level 19

This probably good enough.

data noefg;
   input s $40.;
   l = find(s,
'.EFG',-40);
   if l then s1 = substrn(s,1,l-1);
   else s1 = s;
   cards;
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.NA.XYZ.EFG
ABC.XYZ.EFG
ABC.XYZ.EFG
ABC.NA.XYZ
;;;;
   run;
MarkWik
Quartz | Level 8

Hi DN,

Many thanks for the response. I understand the logic assumes constantly .EFG as the last 3 letters. I sincerely apologise for not mentioning earlier how the data looks or changes as the rows pass by.

So in detail: data [Market research Customer segmentation modelScotland dataset composed of consolidated categories with input values being made up of SOURCE.SEGMENTATION.GROUP.SEGMENT or just SOURCE.SEGMENTATION.GROUP  where source is ABC. segmentation is of 3 types 1. XYZ 2. YZX 3.ZXY. The GROUP and SEGMENT varies as you notice in the below input values unlike a consistent EFG. In the required output I want source.segmentation in a new column(output) which is ABC.XYZ, ABC.YZX and ABC.ZXY. There is also a sub segmentation category within the above 3 segmentation called NA (Not Applicable) Segmentation where the INPUT value is present like ABC.NA.XYZ, ABC.NA.YZX and ABC.NA.ZXY.

So the total number of rows is approximately 550 when i scrolled down on the VIEW table window.

INPUT(Source.segmentation.group.segment)                                                           Required OUTPUT (source.segmentation)

ABC.XYZ.SGR /*No segment in here*/                                                                            ABC.XYZ

ABC.XYZ.URB                                                                                                             ABC.XYZ

ABC.XYZ.JKL.KPI /*with segment*/                                                                                ABC.XYZ

ABC.NA.XYZ.KBG.KBR /*NA Not Applicable category with segment*/                        ABC.NA.XYZ

ABC.NA.XYZ.HTY.PLO                                                                                                ABC.NA.XYZ

ABC.XYZ.EFG                                                                                                              ABC.XYZ

ABC.XYZ.EFG                                                                                                                ABC.XYZ

ABC.XYZ.EFG                                                                                                               ABC.XYZ

ABC.NA.XYZ.EFG/*NA Not Applicable category without segment*/                            ABC.NA.XYZ

ABC.NA.XYZ.EFG                                                                                                      ABC.NA.XYZ

ABC.YZX.STG                                                                                                            ABC.YZX

ABC.YZX.SGR                                                                                                             ABC.YZX

ABC.YZX.SBN                                                                                                             ABC.YZX

ABC.YZX.EFG                                                                                                              ABC.YZX

ABC.YZX.EFG                                                                                                                ABC.YZX

ABC.NA.YZX.EFG                                                                                                         ABC.NA.YZX

ABC.NA.YZX.OLK                                                                                                         ABC.NA.YZX

ABC.YZX.EFG                                                                                                                 ABC.YZX

ABC.NA.YZX.EFG                                                                                                         ABC.NA.YZX

ABC.NA.YZX.EFG                                                                                                       ABC.NA.YZX

ABC.NA.YZX.PIL.OUL                                                                                                 ABC.NA.YZX

ABC.NA.YZX.ASD.LKJ                                                                                                ABC.NA.YZX

ABC.ZXY.EFG                                                                                                              ABC.ZXY

ABC.ZXY.EFG                                                                                                             ABC.ZXY

ABC.ZXY.SGR                                                                                                               ABC.ZXY

ABC.ZXY.POL                                                                                                                ABC.ZXY

ABC.NA.ZXY.EFG                                                                                                        ABC.NA.ZXY

ABC.NA.ZXY.EFG                                                                                                     ABC.NA.ZXY

ABC.NA.ZXY.PKL.AFG                                                                                             ABC.NA.ZXY

ABC.NA.ZXY.GTR.EBH                                                                                           ABC.NA.ZXY

ABC.ZXY.UJH                                                                                                          ABC.ZXY

Would appreciate so much. Thanks

Mark

Tom
Super User Tom
Super User

So you want the first two words, except when the second word is 'NA' you also want the third word.

SCAN function is probably what you want to use.

word1=scan(input,1,'.');

word2=scan(input,2,'.');

if word2='NA' then word3=scan(input,3,'.');

output = catx('.',word1,word2,word3);

Haikuo
Onyx | Level 15

On top of DN's suggestion, you can also use tranwrd() or a stronger one: prxchange():


data noefg;

  input col1 $40.;

col2=tranwrd(col1,'.EFG','');

col3=prxchange('s/\.EFG//',-1,col1);

  cards;

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.NA.XYZ.EFG

ABC.XYZ.EFG

ABC.XYZ.EFG

ABC.NA.XYZ

;;;;

  run;

HE
Calcite | Level 5 HE
Calcite | Level 5

data NewData;

set YourData;

if (countw(COLUMN1,'NA')-1)=0 then COLUMN2=substr(COLUMN1,1,7 );

else if (countw(COLUMN1,'NA')-1) ne 0 then COLUMN2=substr(COLUMN1,1, 10);

proc print noobs;

run;

   COLUMN1         COLUMN2

  1. ABC.XYZ.EFG       ABC.XYZ  
  2. ABC.XYZ.EFG       ABC.XYZ  
  3. ABC.XYZ.EFG       ABC.XYZ  
  4. ABC.NA.XYZ.EFG    ABC.NA.XYZ
  5. ABC.NA.XYZ.EFG    ABC.NA.XYZ
  6. ABC.XYZ.EFG       ABC.XYZ  
  7. ABC.XYZ.EFG       ABC.XYZ  
  8. ABC.NA.XYZ        ABC.NA.XYZ
MarkWik
Quartz | Level 8

Hi HE,

Hmm I like that approach and never really came across in for my need previously the use of Countw function. I will probably have to look into it for more details.

May I ask did you test it?

Mark

HE
Calcite | Level 5 HE
Calcite | Level 5

Hi Mark,

The output of my SAS code is listed in the answer. I don't know details about your data but i beleive you can develop this code if necessary.

Grettings

Hadi Eid

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1310 views
  • 9 likes
  • 5 in conversation