BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi All,

 

I have a very dirty data for environmental pollutants organized into 5 columns waste1 thru waste5. Demo data 'have' is shown below. The actual data has 865 rows.

 

I tried to append waste1 thru waste5 into one column. However, stuck right there. What I need is the data that lists the text form of chemicals into a single column with no duplicates. Please see data 'want'.  I don't need any number information, i.e., 1,1,1-Trichloroethane is good enough to become 'Trichloroethane' and texts in brackets are not needed, i.e., 1,1,1-Trichloroethane (FOO1 & FOO2) can become as simple as 'Trichloroethane'.  

 

Any hints or suggestions will be appreciated. Thanks for help in advance!

 

DATA HAVE;
LENGTH waste1 $600 waste2 $600 waste3 $600 waste4 $600 waste5 $600;
INFILE DATALINES DLM='#'; 
INPUT id waste1 $ waste2 $ waste3 $ waste4 $ waste5 $;
DATALINES;
1  #  Filter Cake-Celite, Carbon, Plasticizer    # (60% by wt) Toluene, Sulfonic                     # acid catalyst							  # Plasticizers, consisting of 26 TM			  # Trimillitate and DOA Adipate Plasticizers                             
2  #  Chromium sludge                            # Paint sludge                                      # Metals                                     # Chlorinated solvents                          #   
3  #  Trichlorethylene (F001)                    # Tetrachloroethene {(PCE or "perc.") F002}         # 1,1,1-Trichloroethane                      # Cyanide (F007)                                # Arsenic  (D003)
4  #  Chromium sludge                            # Paint sludge                                      # Metals                                     # Chlorinated solvents                          # 
5  #  Plant #1 Ester Sump: Residual organics.    # 2 Ethylhexanol, alcohols, diethylene glycol,      # ethylene glycol, plasticizer, adipic acid, # polyester                                     # Plant #2 polmyer solids
6  #  Cadmium, chromium, dichlorobenzene,        # Tetrachloroethylene. dieldrin.                    # Plating and painting wastes                #                                               # 
7  #  Toluene                                    # Xylene                                            # Ethyl Benzene                              #                                               # 
8  #  1,1,1-Trichloroethane {TCA (F001)}         #                                                   #                                            #                                               # 
9  #  Plating sludges                            # Starchy clay sludge                               # (D006, D007, D008 wastes)                  # Trichloroethylene, PVC sludge                 # Tetrachloroethylene, Vinyl chloride
10 #  Chlorinated Solvents (FOO1) (FOO1)         #                                                   #                                            #                                               # 
11 #  Toluene                                    # 1-2, Dichloroethylene                             # 1,1,1-Trichloroethane (FOO1) (FOO2)        # Trichloroethylene (TCE)                       # 
12 #  Fuel oil, gasoline, solvents               # Chlorinated hydrocarbons (FOO1 & FOO2)            #                                            #                                               # 
13 #  Chlorinated hydrocarbons and solvents      # (FOO1 & FOO2)                                     #                                            #                                               # 
14 #  Heavy Metals, Fe, Cr, Cu, Zn, Ni           # (D006) (D007)                                     #                                            #                                               # 
15 #  Waste solvents, phenols                    # Trichloroethylene (TCE)                           # Xylene, Dichlorobenzene (FOO1) (FOO2)      # Tetrachloroethylene (PCE or "perc.")          # 
16 #  1,1,1-Trichloroethane (FOO1 & FOO2)        # Methylene chloride                                #                                            #                                               # 
17 #  Tetrachloroethylene (PCE or "perc.")       # Trichloroethylene (TCE)                           # 1,1,1-Trichloroethane (TCA)                #                                               # 
18 #  Copper, lead (D008)                        # 1,1,1 Trichloroethane (F002)                      #                                            #                                               # 
19 #  Methylene chloride                         # 1,1,1-Trichloroethane, toluene, xylene,           # Ethyl benzene,  1,2-dichloropropane        # Aluminum, arsenic, copper, cadmium, chromium, # Lead, 1,1- dichloroethene, ethane and zinc
20 #  Inks                                       # Solvents (toluene) (FOO5)                         # Ethylbenzene                               # Ethylacetate                                  # 
21 #  1,1,1-Trichloroethane {(TCA) (F001 waste)} # Tetrachloroethylene  {(PCE or "perc.") (F001)}    #                                            #                                               #                                           
;

 data want(drop=waste:);
  length combined $3000;
  set have;
  combined = catx(':',waste1,waste2,waste3,waste4,waste5);
 run;

data want;
length Chemicals $50;
input Chemicals $;
cards;
Filter Cake-Celite
Carbon
Plasticizer
Chromium sludge
Trichlorethylene
Chromium sludge
Plant Ester Sump 
Residual organics
Ester Sump
Cadmium
Toluene
Trichloroethane 
Plating sludges
Chlorinated Solvents
Toluene
Fuel oil
Gasoline
Solvents
Chlorinated hydrocarbons
Solvents
Heavy Metals
Fe
Cr
Cu
Zn
Ni
Tetrachloroethylene
Copper
Lead
Methylene chrloride
Inks
Trichloroethane
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Step 1 - Transpose using PROC TRANSPOSE
Step 2 - Clean it - using SCAN() as necessary
Step 3 - PROC SORT with NODUPKEY.

IME You have all the components to do this on your own. Is there some step you're unsure of?

View solution in original post

3 REPLIES 3
Reeza
Super User
Step 1 - Transpose using PROC TRANSPOSE
Step 2 - Clean it - using SCAN() as necessary
Step 3 - PROC SORT with NODUPKEY.

IME You have all the components to do this on your own. Is there some step you're unsure of?
Cruise
Ammonite | Level 13

@Reeza

I need your coomments on my recent post below. I also tagged you on the post. Please, if you have time. 

https://communities.sas.com/t5/SAS-Programming/Calculate-weighted-average-across-rows-with-condition...

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 532 views
  • 1 like
  • 2 in conversation