- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
would like to combine data from multiple cells into one cell using SAS Data Integration Studio.
My data is divided into three different tables, as follows:
Table 1 (Number of columns is unknown)
City A | City B | City C |
Table 2 (Number of columns is unknown)
State A | State B | State C |
Table 3
x | y |
Desired final table:
That is, I want to create a final table in which I can join the data from table 1 (Number of columns is unknown) in just one field and the data from table 2 (Number of columns is also unknown) in another field, separating the respective values with a comma.
x | y | State A, State B, State C | City A, City B, City C |
Can someone help me with the code so I can count columns and then CATX () without having to do mappings, since I don't know how many columns I will have? In addition, the next time I run the job, some city may have left the list, which reduces my number of columns.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So all your tables will always have only one observation each?
How were those tables created?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The objective is to join all the observations in Table 1 in a single column and all the observations in Table 2 in a single column, so that each of these two tables has only one column.
The columns in Tables 1 and 2 are created automatically if there is data in my source table. That is, if there are 30 cities, col 1 to col 30 are created automatically. If there are only 3 cities, only col 1 to col 3 will be created. And this same logic is used for the state table (table 2).
The point is that the number of columns that tables 1 and 2 have can be different each time I run my job. Since cities and states can be inserted or excluded from my source table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
We need more data,
For instance, in the data, table 1 & 2, when a column does not have a city or state, does it have a consistent value indicating that the data is missing. Are empty strings used for that, or what?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The columns in Tables 1 and 2 are created automatically if there is data in my source table. That is, if there are 30 cities, col 1 to col 30 are created automatically. If there are only 3 cities, only col 1 to col 3 will be created. And this same logic is used for the state table (table 2)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@tazz_bh wrote:
The columns in Tables 1 and 2 are created automatically if there is data in my source table. That is, if there are 30 cities, col 1 to col 30 are created automatically. If there are only 3 cities, only col 1 to col 3 will be created. And this same logic is used for the state table (table 2)
And do your source tables already have that unfavorable wide layout?
Note that data is much easier to handle with a fixed number of columns and a variable number of observations.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I transposed rows to columns. I can also do catx () for all rows, but it follows the same logic: The column number (cities) will be fixed at 1, but the number of rows will be unknown, so CATX would have to be made for an unknown number of rows. Do you think it's easier that way?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Concatenating is easy across rows:
data want;
set have end=done;
length cities $1000;
cities = catx(",",cities,city);
if done;
run;
If you have a grouping variable, use a BY statement and FIRST./LAST. syntax. In which case, you also create the final table by merging with that same BY variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming this is also you?
Do you have a naming convention for your City/State table? If so this is trivial, if not, then you need some extra steps.
You should show what you've tried, rather than say you've tried something. This assumes your city variables are named city1-cityn and your state variables are named state1-stateN.
Assuming you have a naming convention on your city and state table this should work. If you don't you'll have to query the names of the columns from dictionary.table or dictionary.column and save those to a macro variable to use the CATX() function.
data want;
merge table1 table2 table3 table4;
state = catx(of state:);
city = catx(of city:);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have this diagram:
With this data:
And I wall I want is to put NYC and LAX in a single field. In this example I have only two cities, but at any given moment I can have several other cities, so I need a code that is malleable enough to concatenate all cities rows, regardless of the number of cities I have in this column.
Should I use a User Written code to do so?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Full solution to your original problem.
data t1;
informat city1-city3 $8.;
input city1-city3;
cards;
CityA CityB CityC
;;;;;
run;
data t2;
informat state1-state4 $8.;
input state1-state4 ;
cards;
State1 State2 State3 State4
;;;;
run;
data t3;
input v1 $ v2 $;
cards;
x y
;
run;
data want;
merge t3 t1 t2;
list_state = catx(", ", of city:);
list_city = catx(", ", of state:);
drop state: city:;
run;
Results:
Obs v1 v2 list_state list_city
1 x y CityA, CityB, CityC State1, State2, State3, State4
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Building on this. The city dataset is a "tall data set" that requires a transpose to get it to be in the form of the original post. - But - if each account ( ID, etc) has variable number of cities we will need to handle that too.
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_SHOES AS
SELECT DISTINCT t1.Region,
/* Subsidiary, ** First three characters** */
(UPCASE(substr(t1.Subsidiary,1,3))) AS Subsidiary
FROM SASHELP.SHOES t1;
QUIT;
PROC TRANSPOSE DATA=WORK.QUERY_FOR_SHOES
OUT=WORK.Transposed(DROP=_:)
PREFIX=City_
;
BY Region;
VAR Subsidiary;
RUN; QUIT;
data want;
set Transposed;
drop city_: ;
array city $ city_:;
length list_city $200;
list_city=" ";
do over city;
if not missing(city)
then list_city=catx(', ',list_city,city);
end;
run;
Since we dont "have" example data to work with, I made some data from SASHELP.SHOES.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'll just say too, that its odd to me that you are taking one set of city and states per one record at a time. I think I need to learn more about SAS Data Integration Studio, but if multiple sets of cities and states are being handled, then this is how I would handle it in SAS foundation.