BookmarkSubscribeRSS Feed
CynthiaWei
Obsidian | Level 7

Hi SAS Pros,

I am facing a QUESTION!

 

I am doing an almost entire character dataset. I found some texts with space after it. Specifically, I want to remove all the trailing space after text. I don't care about the leading space or the extra space in between. I have tried trim(var name), trimn(var name), strip(var name), and compress(var name). But none of them worked for my data. I don't why it doesn't.

 

Also, I am curious that why SAS cannot recognize the text with a trailing space, e.g. Egg; bread; carrot; and Egg; bread; carrot;  (there is a trailing space after). I found the texts with trailing spaces were not successfully coded.

 

Thank you so much for any help!

Best regards,

10 REPLIES 10
PaigeMiller
Diamond | Level 26

Show us the code you have tried, by pasting your code into the box that appears when you click on the "running man" icon.

 

Show us a portion of your data as SAS data step code.

 
--
Paige Miller
CynthiaWei
Obsidian | Level 7

Thank you Paige and all of you who responded me with this thread!!!

 

I think we won't know what the exact issue it is until we see the portion of the data. Enclosed is the portion of the variable with trailing space from what I understand. Actually, it could be leading spaces or tab or return/enter or whatever. You can see from the Excel file that I uploaded. If you find there are two 'identical' values, there must be one without a trailing space and one with it, such as 'Atrial Fibrillation' and 'Atrial Fibrillation '. When I created dummy variable, SAS can only recognize the first 'Atrial Fibrillation'---the one without the trailing space after text. Same issues happened with 'Chronic Alcoholism; Atherosclerotic and Hypertensive Cardiovascular Disease; Emphysema '. Because the variable doesn't have the value of this one without the trailing space, it only showed up once, but it still has a trailing space that I want to remove.

I have used the following syntax to generate the dummies, but realized that SAS doesn't recognize those ones with trailing spaces:

data have;

set want;

if significant_conditions in

('Chronic Alcoholism; Atherosclerotic and Hypertensive Cardiovascular Disease; Emphysema ',

'Cardiomegaly; Arteriosclerotic Cardiovascular Disease; Morbid Obesity ',

'Atrial Fibrillation',

'Atrial Fibrillation ',

'Cardiomegaly',

'Cardiomegaly ',

'Atherosclerotic and Hypertensive Cardiovascular Disease ',

'Atherosclerotic and Hypertensive Cardiovascular Disease') then heart_disease=1;

if significant_conditions in

('Chronic Alcoholism; Atherosclerotic and Hypertensive Cardiovascular Disease; Emphysema ',

'Cardiomegaly; Arteriosclerotic Cardiovascular Disease; Morbid Obesity ') then Alcoholism=1;

if significant_conditions in

('Chronic Alcoholism; Atherosclerotic and Hypertensive Cardiovascular Disease; Emphysema ',

'Cardiomegaly; Arteriosclerotic Cardiovascular Disease; Morbid Obesity ') then obesity=1;

if significant_conditions in

('Chronic Alcoholism; Atherosclerotic and Hypertensive Cardiovascular Disease; Emphysema ',

'Cardiomegaly; Arteriosclerotic Cardiovascular Disease; Morbid Obesity ') then respiratory_disease=1;

run;

 

 

 

 

 

 

 

 

Tom
Super User Tom
Super User

If you really need to test if 'XXX' is different than 'XXX     ' then you will probably need to change the spaces to some other character and then test.  But I cannot think of any real situation where that would be needed, and your example program does not need it. In your current code you are repeating the same string to test in the IN list. Having the same string twice in the list does not make the result of the IN operator any different. Prove it to yourself by writing a simple test program.

data have ;
  input string $20.;
cards;
xxx
abc
;

data want;
  set have;
   test1 = string  in ('xxx');
   test2 = string in ('xxx     ');
   test3 = string in ('xxx                 ');
run;

Results:

Obs    string    test1    test2    test3

 1      xxx        1        1        1
 2      abc        0        0        0

 

Please show some actual data for the variable significant_conditions in the dataset have and what result you want out. 

 

Patrick
Opal | Level 21

@CynthiaWei 

Repeating the code I've already posted here to answer your question. This code will also work for your medical data - and it will clean-up your strings by replacing any combination of characters that are not a letter, number, underscore or dash with a blank.

data have;
  infile datalines truncover dlm='|';
  input id lunch :$100.;
  datalines4;
1|egg; bread; carrot;
2|cheese;	veggie juice; cucumber;
3|noodles; three pieces	of pork ribs; lettuce 
4|some bread   stick; salmon; mango juice
5|milk; cookie; tomato;	
6|bread; carrot; egg;
;;;;

data have_long(drop=_:);
  length id 8;
  set have(rename=(id=group_id));
  length item $60;
  do _i=1 by 1;
    item=scan(lunch,_i,';');
    item=prxchange('s/[^\w-]+/ /',-1,strip(item));
    item=upcase(strip(item));
    if missing(item) then leave;
    id+1;
    output;
  end;
run;

proc format;
  value $foodcat (multilabel)
    'BREAD' = 'Carbon'
    'CARROT' = 'Fiber'
    'CHEESE' = 'Protein'
    'COOKIE' = 'Carbon'
    'CUCUMBER' = 'Fiber'
    'EGG' = 'Protein'
    'LETTUCE' = 'Fiber'
    'MANGO JUICE' = 'Fiber'
    'MANGO JUICE' = 'Juice'
    'MILK' = 'Protein'
    'NOODLES' = 'Carbon'
    'SALMON' = 'Protein'
    'SOME BREAD STICK' = 'Carbon'
    'THREE PIECES OF PORK RIBS' = 'Protein'
    'TOMATO' = 'Fiber'
    'VEGGIE JUICE' = 'Fiber'
    'VEGGIE JUICE' = 'Juice'
    ;
run;

proc summary data=have_long n nway;
  class group_id;
  class item / mlf;
  id lunch;
  format item $foodcat.;
  output out=long_n;
run; 

proc transpose data=long_n out=want(drop=_:) ;
  by group_id lunch;
  id item;
  var _freq_;
run;

proc print data=want;
run;

Patrick_0-1593385631457.png

Just create a list of distinct terms from the have_long table and then use this list to replace the format in above code. 

The multilabel option in the format allows you to assign more than one category to a single term. The category names become the SAS variable names when transposing the data so make sure to use strings that are valid to use as SAS variable names. 

proc sql;
  select distinct item
  from have_long;
quit;

 

 

Patrick
Opal | Level 21

I believe you're looking for compbl(). Solution approach to "clean-up" your data here where you're asking the same question but give us more information.

ChrisNZ
Tourmaline | Level 20

I may misunderstand how SAS stores strings.

SAS stores fixed-length character strings. If the string is shorter than the variable, its value is padded with trailing spaces. For this reason, when comparing values, SAS omits trailing spaces, thus 'A' is equal to 'A  '.

 

Reeza
Super User
Your strings actually had leading spaces not trailing if you're referring to your previous question. You can see them by printing out your values using a hex format and seeing a 20 as the first digits which is the hex code for a space.

proc freq data=have;
table item;
format item $hex32.;
run;

Tom
Super User Tom
Super User

Your question is not at all clear.  You posted this in you quesiton:

Egg; bread; carrot; 
and 
Egg; bread; carrot;

Both of those two strings separated by AND look exactly the same as you can tell when posted using a monospaced font.

ballardw
Super User

And problems like this are exactly why I suggested making a "long" set with single values and no delimiters. If your data source is so inconsistent then it really is up to you make it consistent. Or browbeat the source to fix the problems at the source.

I suspect it will be much easier to clean the data to a consistent behavior than the code around all of the garbage in long inconsistently delimited strings.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2136 views
  • 0 likes
  • 8 in conversation