BookmarkSubscribeRSS Feed
uopsouthpaw
Calcite | Level 5

I have some pretty messy text data that I need to clean for consistency (case, spaces, spelling, etc.).  Anyway, I'm using proc freq to check my progress in dealing with the entries and come up with Death showing up twice in my frequency table.  I'd appreciate any guidance.  Here is an example of where I'm going:

 

data import1_2;
set Import1 (rename='Harm Code'n = 'Harm Code: Raw'n);
'Harm Code'n = compbl(strip(left(upcase('Harm Code: Raw'n))));

...

run;

*Split observations by delimiter;

Data Import1_Harms (rename=new='Harm Code 2'n);

length new $50.;
set Import1_2;
do i=1 by 1 while(scan('Harm Code'n,i,',') ^=' ');
new=scan('Harm Code'n,i,',');
output;
end;
run;

proc freq data=Import1_Harms;
table 'Harm Code 2'n / missing;
run;

 

Harm Code 2FrequencyPercentCumulativeCumulative
FrequencyPercent
ABNORMAL BLOOD LOSS70.3770.37
ACCESS SITE COMPLICATIONS50.27130.69
CONVERSION10.0525113.33
DEATH311.6528715.24
DEATH (AAA RELATED)40.2129115.45
DEATH (AAA)60.3229715.77
DEATH (INCONCLUSIVE)20.1129915.88
DEATH (INDETERMINATE)10.0530015.93
…….    
AS110.0590247.9
BL320.1190448.01
CMP50.2790948.27
COMPLICATIONS100.5391948.81
CONVERSION10.0592048.86
CTI10.0592349.02
DEATH130.6993649.71
DEATH(UNKNOWN CAUSE)10.0595350.61

 

Death (and other values) show up twice in the frequency list.  Does this have something to do with the original format and field value?

 

Thanks,

 

Wes

2 REPLIES 2
Reeza
Super User
Use COMPRESS() to remove invisible blanks such as tabs or returns. You can use the modifiers to specify invisible blanks.
ballardw
Super User

You may only need to remove leading blanks. The sort order is apparently affected by what ever is causing your problem. A space appears before any letter in the sort order. Proc freq, and others, by default do not display the leading spaces but the order of the data is affected.

 

data example;
  length x $25 ;
  x=' Death'; output;
  x='Accident';output;
  x='Death'; output;
run;

proc freq; run;

data example2;
   set example;
   /* removes leading spaces among other things*/
   x=strip(x);
run;

proc freq data=example2;run;

other procedures allow you to use style options to reveal such:

proc tabulate data=example;
   class x;
   classlev x /style=[Asis=on];
   table x,n;
run;

ASIS=on tells the procedure not to remove the leading space for display.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 439 views
  • 2 likes
  • 3 in conversation