Data cleaning

Super Contributor
Posts: 647

(ACCESS) Vagisil
Free Vagisil
*** Vagisil ***
HRC Vagisil

The lines above are raw values of a drug name variable (other variations are possible). The task is to create a new variable, clean_drug: if the raw value contains "vagisil", then clean_drug should be "Vagisil".

The same applies to other drug names. Any help is appreciated.
Super Contributor
Posts: 3,174

Re: Data cleaning

Consider using a DATA step and the INDEXW function (or one of the many similar functions) to check each raw value against a list of strings representing your candidate core substring values.
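As a rough illustration of that suggestion, here is a minimal sketch that checks each raw value against a short list of candidate names with INDEXW. The data set names (raw, cleaned), the variable raw_drug, the loop index k, and the contents of the candidate list are all assumptions made for the example. The match is case-sensitive and uses the default blank delimiter.

```sas
/* Hypothetical input: one raw drug name per row in RAW_DRUG */
data raw;
  length raw_drug $30;
  input raw_drug $char30.;
  datalines;
(ACCESS) Vagisil
Free Vagisil
*** Vagisil ***
HRC Vagisil
;
run;

data cleaned;
  set raw;
  length clean_drug $30;
  /* Assumed candidate core names; extend this list as needed */
  array names[2] $30 _temporary_ ('Vagisil' 'Bobs');
  do k = 1 to dim(names);
    /* INDEXW returns the position of the word, or 0 if it is absent */
    if indexw(raw_drug, strip(names[k])) > 0 then clean_drug = names[k];
  end;
  drop k;
run;
```

Rows that match none of the candidates are left with a blank clean_drug, which makes them easy to list and review.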

For technical papers and references on this topic, I recommend searching the SAS support website (http://support.sas.com/) using the keywords "cleaning" and "cleansing". Alternatively, here is a suggested Google advanced search argument that yields some interesting results:

data cleaning cleansing site:sas.com

Scott Barry
SBBWorks, Inc.
Super User
Posts: 5,255

Re: Data cleaning

If this is a common problem, and it is important to have the data properly cleansed, I suggest that you look into DataFlux PowerStudio and possibly SAS Data Quality, which provide tools designed to solve these kinds of problems.

/Linus
Data never sleeps
Frequent Contributor
Posts: 120

Re: Data cleaning

Just a quick example; case is not handled.

The first macro removes known trash and the second keeps known drugs.

/* Sample raw drug names */
data test;
length a $30;
input a $30.;
datalines;
(ACCESS) Vagisil
Free Vagisil
*** Vagisil ***
HRC Vagisil
Get Bobs
;
run;

%macro clean;
/* Pipe-delimited list of known trash strings to remove */
%let clean_list = (ACCESS)|HRC|Free|***;
data test1;
set test;
length clean_drug $200;
clean_drug = a;
/* Generate one TRANWRD statement per entry in the list */
%let i = 1;
%do %while (%bquote(%scan(&clean_list,&i,|)) ne %str());
clean_drug = strip(tranwrd(clean_drug,"%scan(%quote(&clean_list),&i,|)",""));
%let i = %eval(&i + 1);
%end;
run;
%mend;
%clean;

%macro keep;
/* Pipe-delimited list of known drug names to keep */
%let keep_list = Vagisil|Bobs;
data test2;
set test;
length clean_drug $200;
/* Generate one IF statement per entry in the list */
%let i = 1;
%do %while (%bquote(%scan(&keep_list,&i,|)) ne %str());
if indexw(a,"%scan(%quote(&keep_list),&i,|)") then clean_drug = "%scan(%quote(&keep_list),&i,|)";
%let i = %eval(&i + 1);
%end;
run;
%mend;
%keep;
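Since case is not handled above, here is one possible case-insensitive variant of the "keep" approach, written as a plain DATA step rather than a macro. It is only a sketch: the data set names test_mixed and test3, the array keep_list, the index k, and the extra delimiter characters are assumptions for illustration. Both sides of the comparison are upcased, and the third INDEXW argument treats blanks, asterisks, and parentheses as word delimiters so that values such as ***Vagisil*** can still match.

```sas
/* Hypothetical mixed-case input */
data test_mixed;
  length a $30;
  input a $char30.;
  datalines;
(access) VAGISIL
Free vagisil
***Vagisil***
Get bobs
;
run;

data test3;
  set test_mixed;
  length clean_drug $30;
  /* Assumed list of known drug names, in their preferred spelling */
  array keep_list[2] $30 _temporary_ ('Vagisil' 'Bobs');
  do k = 1 to dim(keep_list);
    /* Compare in upper case; blank, *, ( and ) count as word delimiters */
    if indexw(upcase(a), upcase(strip(keep_list[k])), ' *()') > 0
      then clean_drug = keep_list[k];
  end;
  drop k;
run;
```

The cleaned value takes the spelling stored in the list, so the output is consistent regardless of how the raw value was capitalized.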
Contributor
Posts: 66

Re: Data cleaning

That's some code to help soothe your pain....
/Sorry, couldn't resist.