BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi All,

 

I have a very dirty data for environmental pollutants organized into 5 columns waste1 thru waste5. Demo data 'have' is shown below. The actual data has 865 rows.

 

I tried to append waste1 thru waste5 into one column. However, stuck right there. What I need is the data that lists the text form of chemicals into a single column with no duplicates. Please see data 'want'.  I don't need any number information, i.e., 1,1,1-Trichloroethane is good enough to become 'Trichloroethane' and texts in brackets are not needed, i.e., 1,1,1-Trichloroethane (FOO1 & FOO2) can become as simple as 'Trichloroethane'.  

 

Any hints or suggestions will be appreciated. Thanks for help in advance!

 

DATA HAVE;
LENGTH waste1 $600 waste2 $600 waste3 $600 waste4 $600 waste5 $600;
INFILE DATALINES DLM='#'; 
INPUT id waste1 $ waste2 $ waste3 $ waste4 $ waste5 $;
DATALINES;
1  #  Filter Cake-Celite, Carbon, Plasticizer    # (60% by wt) Toluene, Sulfonic                     # acid catalyst							  # Plasticizers, consisting of 26 TM			  # Trimillitate and DOA Adipate Plasticizers                             
2  #  Chromium sludge                            # Paint sludge                                      # Metals                                     # Chlorinated solvents                          #   
3  #  Trichlorethylene (F001)                    # Tetrachloroethene {(PCE or "perc.") F002}         # 1,1,1-Trichloroethane                      # Cyanide (F007)                                # Arsenic  (D003)
4  #  Chromium sludge                            # Paint sludge                                      # Metals                                     # Chlorinated solvents                          # 
5  #  Plant #1 Ester Sump: Residual organics.    # 2 Ethylhexanol, alcohols, diethylene glycol,      # ethylene glycol, plasticizer, adipic acid, # polyester                                     # Plant #2 polmyer solids
6  #  Cadmium, chromium, dichlorobenzene,        # Tetrachloroethylene. dieldrin.                    # Plating and painting wastes                #                                               # 
7  #  Toluene                                    # Xylene                                            # Ethyl Benzene                              #                                               # 
8  #  1,1,1-Trichloroethane {TCA (F001)}         #                                                   #                                            #                                               # 
9  #  Plating sludges                            # Starchy clay sludge                               # (D006, D007, D008 wastes)                  # Trichloroethylene, PVC sludge                 # Tetrachloroethylene, Vinyl chloride
10 #  Chlorinated Solvents (FOO1) (FOO1)         #                                                   #                                            #                                               # 
11 #  Toluene                                    # 1-2, Dichloroethylene                             # 1,1,1-Trichloroethane (FOO1) (FOO2)        # Trichloroethylene (TCE)                       # 
12 #  Fuel oil, gasoline, solvents               # Chlorinated hydrocarbons (FOO1 & FOO2)            #                                            #                                               # 
13 #  Chlorinated hydrocarbons and solvents      # (FOO1 & FOO2)                                     #                                            #                                               # 
14 #  Heavy Metals, Fe, Cr, Cu, Zn, Ni           # (D006) (D007)                                     #                                            #                                               # 
15 #  Waste solvents, phenols                    # Trichloroethylene (TCE)                           # Xylene, Dichlorobenzene (FOO1) (FOO2)      # Tetrachloroethylene (PCE or "perc.")          # 
16 #  1,1,1-Trichloroethane (FOO1 & FOO2)        # Methylene chloride                                #                                            #                                               # 
17 #  Tetrachloroethylene (PCE or "perc.")       # Trichloroethylene (TCE)                           # 1,1,1-Trichloroethane (TCA)                #                                               # 
18 #  Copper, lead (D008)                        # 1,1,1 Trichloroethane (F002)                      #                                            #                                               # 
19 #  Methylene chloride                         # 1,1,1-Trichloroethane, toluene, xylene,           # Ethyl benzene,  1,2-dichloropropane        # Aluminum, arsenic, copper, cadmium, chromium, # Lead, 1,1- dichloroethene, ethane and zinc
20 #  Inks                                       # Solvents (toluene) (FOO5)                         # Ethylbenzene                               # Ethylacetate                                  # 
21 #  1,1,1-Trichloroethane {(TCA) (F001 waste)} # Tetrachloroethylene  {(PCE or "perc.") (F001)}    #                                            #                                               #                                           
;

 data want(drop=waste:);
  length combined $3000;
  set have;
  combined = catx(':',waste1,waste2,waste3,waste4,waste5);
 run;

data want;
length Chemicals $50;
input Chemicals $;
cards;
Filter Cake-Celite
Carbon
Plasticizer
Chromium sludge
Trichlorethylene
Chromium sludge
Plant Ester Sump 
Residual organics
Ester Sump
Cadmium
Toluene
Trichloroethane 
Plating sludges
Chlorinated Solvents
Toluene
Fuel oil
Gasoline
Solvents
Chlorinated hydrocarbons
Solvents
Heavy Metals
Fe
Cr
Cu
Zn
Ni
Tetrachloroethylene
Copper
Lead
Methylene chrloride
Inks
Trichloroethane
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Step 1 - Transpose using PROC TRANSPOSE
Step 2 - Clean it - using SCAN() as necessary
Step 3 - PROC SORT with NODUPKEY.

IME You have all the components to do this on your own. Is there some step you're unsure of?

View solution in original post

3 REPLIES 3
Reeza
Super User
Step 1 - Transpose using PROC TRANSPOSE
Step 2 - Clean it - using SCAN() as necessary
Step 3 - PROC SORT with NODUPKEY.

IME You have all the components to do this on your own. Is there some step you're unsure of?
Cruise
Ammonite | Level 13

@Reeza

I need your coomments on my recent post below. I also tagged you on the post. Please, if you have time. 

https://communities.sas.com/t5/SAS-Programming/Calculate-weighted-average-across-rows-with-condition...

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 670 views
  • 1 like
  • 2 in conversation