Hi All,
I have a very dirty data for environmental pollutants organized into 5 columns waste1 thru waste5. Demo data 'have' is shown below. The actual data has 865 rows.
I tried to append waste1 thru waste5 into one column. However, stuck right there. What I need is the data that lists the text form of chemicals into a single column with no duplicates. Please see data 'want'. I don't need any number information, i.e., 1,1,1-Trichloroethane is good enough to become 'Trichloroethane' and texts in brackets are not needed, i.e., 1,1,1-Trichloroethane (FOO1 & FOO2) can become as simple as 'Trichloroethane'.
Any hints or suggestions will be appreciated. Thanks for help in advance!
DATA HAVE;
LENGTH waste1 $600 waste2 $600 waste3 $600 waste4 $600 waste5 $600;
INFILE DATALINES DLM='#';
INPUT id waste1 $ waste2 $ waste3 $ waste4 $ waste5 $;
DATALINES;
1 # Filter Cake-Celite, Carbon, Plasticizer # (60% by wt) Toluene, Sulfonic # acid catalyst # Plasticizers, consisting of 26 TM # Trimillitate and DOA Adipate Plasticizers
2 # Chromium sludge # Paint sludge # Metals # Chlorinated solvents #
3 # Trichlorethylene (F001) # Tetrachloroethene {(PCE or "perc.") F002} # 1,1,1-Trichloroethane # Cyanide (F007) # Arsenic (D003)
4 # Chromium sludge # Paint sludge # Metals # Chlorinated solvents #
5 # Plant #1 Ester Sump: Residual organics. # 2 Ethylhexanol, alcohols, diethylene glycol, # ethylene glycol, plasticizer, adipic acid, # polyester # Plant #2 polmyer solids
6 # Cadmium, chromium, dichlorobenzene, # Tetrachloroethylene. dieldrin. # Plating and painting wastes # #
7 # Toluene # Xylene # Ethyl Benzene # #
8 # 1,1,1-Trichloroethane {TCA (F001)} # # # #
9 # Plating sludges # Starchy clay sludge # (D006, D007, D008 wastes) # Trichloroethylene, PVC sludge # Tetrachloroethylene, Vinyl chloride
10 # Chlorinated Solvents (FOO1) (FOO1) # # # #
11 # Toluene # 1-2, Dichloroethylene # 1,1,1-Trichloroethane (FOO1) (FOO2) # Trichloroethylene (TCE) #
12 # Fuel oil, gasoline, solvents # Chlorinated hydrocarbons (FOO1 & FOO2) # # #
13 # Chlorinated hydrocarbons and solvents # (FOO1 & FOO2) # # #
14 # Heavy Metals, Fe, Cr, Cu, Zn, Ni # (D006) (D007) # # #
15 # Waste solvents, phenols # Trichloroethylene (TCE) # Xylene, Dichlorobenzene (FOO1) (FOO2) # Tetrachloroethylene (PCE or "perc.") #
16 # 1,1,1-Trichloroethane (FOO1 & FOO2) # Methylene chloride # # #
17 # Tetrachloroethylene (PCE or "perc.") # Trichloroethylene (TCE) # 1,1,1-Trichloroethane (TCA) # #
18 # Copper, lead (D008) # 1,1,1 Trichloroethane (F002) # # #
19 # Methylene chloride # 1,1,1-Trichloroethane, toluene, xylene, # Ethyl benzene, 1,2-dichloropropane # Aluminum, arsenic, copper, cadmium, chromium, # Lead, 1,1- dichloroethene, ethane and zinc
20 # Inks # Solvents (toluene) (FOO5) # Ethylbenzene # Ethylacetate #
21 # 1,1,1-Trichloroethane {(TCA) (F001 waste)} # Tetrachloroethylene {(PCE or "perc.") (F001)} # # #
;
data want(drop=waste:);
length combined $3000;
set have;
combined = catx(':',waste1,waste2,waste3,waste4,waste5);
run;
data want;
length Chemicals $50;
input Chemicals $;
cards;
Filter Cake-Celite
Carbon
Plasticizer
Chromium sludge
Trichlorethylene
Chromium sludge
Plant Ester Sump
Residual organics
Ester Sump
Cadmium
Toluene
Trichloroethane
Plating sludges
Chlorinated Solvents
Toluene
Fuel oil
Gasoline
Solvents
Chlorinated hydrocarbons
Solvents
Heavy Metals
Fe
Cr
Cu
Zn
Ni
Tetrachloroethylene
Copper
Lead
Methylene chrloride
Inks
Trichloroethane
;
I need your coomments on my recent post below. I also tagged you on the post. Please, if you have time.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.