BookmarkSubscribeRSS Feed
tazz_bh
Fluorite | Level 6

 would like to combine data from multiple cells into one cell using SAS Data Integration Studio.

My data is divided into three different tables, as follows:

 

Table 1 (Number of columns is unknown)

 

Col 1 Col 2 Col 3
City ACity BCity C

 

Table 2 (Number of columns is unknown)

 

Col 1 Col 2 Col 3
State AState BState C

 

Table 3

 

Variable 1 Variable 2
xy

 

Desired final table:

That is, I want to create a final table in which I can join the data from table 1 (Number of columns is unknown) in just one field and the data from table 2 (Number of columns is also unknown) in another field, separating the respective values ​​with a comma.

 

Variable 1 Variable 2 States Cities
xyState A, State B, State CCity A, City B, City C

 

Can someone help me with the code so I can count columns and then CATX () without having to do mappings, since I don't know how many columns I will have? In addition, the next time I run the job, some city may have left the list, which reduces my number of columns.

 

15 REPLIES 15
tazz_bh
Fluorite | Level 6

The objective is to join all the observations in Table 1 in a single column and all the observations in Table 2 in a single column, so that each of these two tables has only one column.

 

The columns in Tables 1 and 2 are created automatically if there is data in my source table. That is, if there are 30 cities, col 1 to col 30 are created automatically. If there are only 3 cities, only col 1 to col 3 will be created. And this same logic is used for the state table (table 2).

 

The point is that the number of columns that tables 1 and 2 have can be different each time I run my job. Since cities and states can be inserted or excluded from my source table.

PhilC
Rhodochrosite | Level 12

We need more data, 

 

For instance, in the data, table 1 & 2, when a column does not have a city or state, does it have a consistent value indicating that the data is missing.  Are empty strings used for that, or what? 

tazz_bh
Fluorite | Level 6

The columns in Tables 1 and 2 are created automatically if there is data in my source table. That is, if there are 30 cities, col 1 to col 30 are created automatically. If there are only 3 cities, only col 1 to col 3 will be created. And this same logic is used for the state table (table 2)

Kurt_Bremser
Super User

@tazz_bh wrote:

The columns in Tables 1 and 2 are created automatically if there is data in my source table. That is, if there are 30 cities, col 1 to col 30 are created automatically. If there are only 3 cities, only col 1 to col 3 will be created. And this same logic is used for the state table (table 2)


And do your source tables already have that unfavorable wide layout?

Note that data is much easier to handle with a fixed number of columns and a variable number of observations.

tazz_bh
Fluorite | Level 6

I transposed rows to columns. I can also do catx () for all rows, but it follows the same logic: The column number (cities) will be fixed at 1, but the number of rows will be unknown, so CATX would have to be made for an unknown number of rows. Do you think it's easier that way?

Kurt_Bremser
Super User

Concatenating is easy across rows:

data want;
set have end=done;
length cities $1000;
cities = catx(",",cities,city);
if done;
run;

If you have a grouping variable, use a BY statement and FIRST./LAST. syntax. In which case, you also create the final table by merging with that same BY variable.

 

Reeza
Super User

Assuming this is also you?

https://stackoverflow.com/questions/66766062/combine-data-from-two-or-more-field-into-one-field-in-s...

 

 

Do you have a naming convention for your City/State table? If so this is trivial, if not, then you need some extra steps.

You should show what you've tried, rather than say you've tried something. This assumes your city variables are named city1-cityn and your state variables are named state1-stateN.

 

Assuming you have a naming convention on your city and state table this should work. If you don't you'll have to query the names of the columns from dictionary.table or dictionary.column and save those to a macro variable to use the CATX() function. 

 

data want;
merge table1 table2 table3 table4;
state = catx(of state:);
city = catx(of city:);
run;

 

 

 

 

 

 

tazz_bh
Fluorite | Level 6

 

I have this  diagram: 

tazz_bh_1-1616687372160.png

With this data: 

tazz_bh_2-1616687427427.png

 

And I wall I want is to put NYC and LAX in a single field. In this example I have only two cities, but at any given moment I can have several other cities, so I need a code that is malleable enough to concatenate all cities rows, regardless of the number of cities I have in this column.

 

Should I use a User Written code to do so? 

 

tazz_bh_3-1616687697515.png

 

 

Reeza
Super User
Your reply on SO said you were using code.
tazz_bh
Fluorite | Level 6
I am trying to mix a code since there is no GUI to do what I need.
Reeza
Super User

Full solution to your original problem. 

 

data t1;
informat city1-city3 $8.;
input city1-city3;
cards;
CityA CityB CityC
;;;;;
run;

data t2;
informat state1-state4 $8.;
input state1-state4 ;
cards;
State1 State2 State3 State4
;;;;
run;

data t3;
input v1 $ v2 $;
cards;
x y
;
run;

data want;
merge t3 t1 t2;
list_state = catx(", ", of city:);
list_city = catx(", ", of state:);

drop state: city:;
run;

Results:


Obs	v1	v2	list_state	list_city
1	x	y	CityA, CityB, CityC	State1, State2, State3, State4
PhilC
Rhodochrosite | Level 12

Building on this.  The city dataset is a "tall data set" that requires a transpose to get it to be in the form of the original post.  - But - if each account ( ID, etc) has variable number of cities we will need to handle that too.

 

PROC SQL;
   CREATE TABLE WORK.QUERY_FOR_SHOES AS 
   SELECT DISTINCT t1.Region, 
          /* Subsidiary, ** First three characters** */
            (UPCASE(substr(t1.Subsidiary,1,3))) AS Subsidiary
      FROM SASHELP.SHOES t1;
QUIT;

PROC TRANSPOSE DATA=WORK.QUERY_FOR_SHOES
	OUT=WORK.Transposed(DROP=_:)
	PREFIX=City_
;
	BY Region;
	VAR Subsidiary;
RUN; QUIT;

data want;
  set Transposed;
    drop city_: ;
  array city $ city_:;
  length list_city $200;
    list_city=" ";
  do over city;
    if not missing(city)
      then list_city=catx(', ',list_city,city);
  end;
run;

 

Since we dont "have" example data to work with, I made some data from SASHELP.SHOES.

PhilC
Rhodochrosite | Level 12

I'll just say too, that its odd to me that you are taking one set of city and states per one record at a time.  I think I need to learn more about SAS Data Integration Studio, but if multiple sets of cities and states are being handled, then this is how I would handle it in SAS foundation.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 3728 views
  • 3 likes
  • 5 in conversation