BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rgjpot
Calcite | Level 5

I have a problem with merging my data.

 

If have the following two tables

Table 1      
xyz2019-012019-022020-062021-01
abc12110
abd03111
bad2138
acb1452

 

Table 2       
xyz2019-012019-052020-072021-012021-02
abc001811
abd143220
bad01002
aee00021

 

When I merge them i get

xyz2019-012019-022020-062021-012019-052020-072021-02
abc02180111
abd13124320
acb1452...
aee0..2001
bad0130102

 

However i need to get this

xyz2019-012019-022019-052020-072020-062021-012021-02
abc120111811
abd134311320
bad2110382
acb1400520
aee0000021

 

I found some posts that came close to the awnser i needed but there is a second problem that i don't now the year-month upfront and the differ every time i get a new file. (origanal files contain over 40000 records with up to 80 colums) .

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Easy enough if X,Y,Z are character and the others are numeric.

data both;
  set table1 table2;
  by x y z;
run;

proc summary data=both ;
  by x y z;
  var _numeric_;
  output out=want(drop=_type_ _freq_) sum= ;
run;

You can even make the first data step create a view so that you don't have to make a physical copy of the combined data.

 

If some of the groups and some of the variables only appear on one of the datasets you might get some observations with missing values.  You could use PROC STDIZE to replace the missing with zeros if you need.

View solution in original post

6 REPLIES 6
ballardw
Super User

Please provide data in the form of a working data step for each data set.

Also, provide the code for how you are "merging" the data.

 

Almost certainly to get the desired sums you show a data step merge is not the approach you want. You would use SET to combine the data and then do the sums.

 

But we do have an issue in your data as the names you show are not default legal SAS data names AND you are hiding data in the variable name. Typically that means the data is structured poorly for almost any activity.

Reeza
Super User

You can make the process entirely dynamic by flipping the data to a long format and then appending the results, sorting and merging. 

Or you can generate a dynamic select statement. 

 

Rough idea for solution 1:

 

proc transpose data=table1 out=table1_long;
by x y z;
var _numeric_;
run;

proc transpose data=table2 out=table2_long;
by x y z;
var _numeric_;
run;

data combined;
set table1_long table2_long;
run;

proc sort data=combined;
by x y z _name_;
run;

proc transpose data=combined out=final prefix= D_;
by x y z;
var col1;
id _name_;
run;

And for solution 2:

*get dates in order;
proc sql noprint;
select nliteral(name) into :name_list separated by ", "
from sashelp.vcolumns
where libname='WORK' and upcase(memname) = 'MERGED'
and name not in ('X' , 'Y', 'Z')
order by name;
quit;

*recreate table with variables in order;
proc sql;
create table want as
select x, y, z, &name_list
from merged;
quit;

Untested code as I didn't want to make a data step to generate sample data.


@rgjpot wrote:

I have a problem with merging my data.

 

If have the following two tables

Table 1            
x y z 2019-01 2019-02 2020-06 2021-01
a b c 1 2 1 10
a b d 0 3 1 11
b a d 2 1 3 8
a c b 1 4 5 2

 

Table 2              
x y z 2019-01 2019-05 2020-07 2021-01 2021-02
a b c 0 0 1 8 11
a b d 1 4 3 2 20
b a d 0 1 0 0 2
a e e 0 0 0 2 1

 

When I merge them i get

x y z 2019-01 2019-02 2020-06 2021-01 2019-05 2020-07 2021-02
a b c 0 2 1 8 0 1 11
a b d 1 3 1 2 4 3 20
a c b 1 4 5 2 . . .
a e e 0 . . 2 0 0 1
b a d 0 1 3 0 1 0 2

 

However i need to get this

x y z 2019-01 2019-02 2019-05 2020-07 2020-06 2021-01 2021-02
a b c 1 2 0 1 1 18 11
a b d 1 3 4 3 1 13 20
b a d 2 1 1 0 3 8 2
a c b 1 4 0 0 5 2 0
a e e 0 0 0 0 0 2 1

 

I found some posts that came close to the awnser i needed but there is a second problem that i don't now the year-month upfront and the differ every time i get a new file. (origanal files contain over 40000 records with up to 80 colums) .

 

 

 


 

Tom
Super User Tom
Super User

Easy enough if X,Y,Z are character and the others are numeric.

data both;
  set table1 table2;
  by x y z;
run;

proc summary data=both ;
  by x y z;
  var _numeric_;
  output out=want(drop=_type_ _freq_) sum= ;
run;

You can even make the first data step create a view so that you don't have to make a physical copy of the combined data.

 

If some of the groups and some of the variables only appear on one of the datasets you might get some observations with missing values.  You could use PROC STDIZE to replace the missing with zeros if you need.

rgjpot
Calcite | Level 5
Thank you Tom this works as a charm.
Kurt_Bremser
Super User

Firdt of all, transpose the tables so that the date values become data instead of structure.

Then, stack those tables into one, and finally, build the sum with PROC SUMMARY, CLASS X Y Z DATE.

Kurt_Bremser
Super User

Just to illustrate the usefulness of a long layout:

data table1;
infile datalines dlm='09'x;
input (x y z) ($) '2019-01'n '2019-02'n '2020-06'n '2021-01'n;
datalines;
a	b	c	1	2	1	10
a	b	d	0	3	1	11
b	a	d	2	1	3	8
a	c	b	1	4	5	2
;

data table2; 	 	 	 	 	 	 
infile datalines dlm='09'x;
input (x y z) ($) '2019-01'n '2019-05'n '2020-07'n '2021-01'n '2021-02'n;
datalines;
a	b	c	0	0	1	8	11
a	b	d	1	4	3	2	20
b	a	d	0	1	0	0	2
a	e	e	0	0	0	2	1
;


proc transpose data=table1 out=t1;
by x y z notsorted;
var _numeric_;
run;

data t1_dt;
set t1;
date = input(cats(_name_,"-01"),yymmdd10.);
format date yymmd7.;
drop _name_;
rename col1=value;
run;

proc transpose data=table2 out=t2;
by x y z notsorted;
var _numeric_;
run;

data t2_dt;
set t2;
date = input(cats(_name_,"-01"),yymmdd10.);
format date yymmd7.;
drop _name_;
rename col1=value;
run;

data all;
set
  t1_dt
  t2_dt
;
run;

Once your data is in proper shape, it's simple:

proc summary data=all nway;
class x y z date;
var value;
output out=want (drop=_:) sum()=;
run;

And if you need a report out of the raw data, PROC REPORT does the sum on the fly:

proc report data=all;
column x y z value,date;
define x / group;
define y / group;
define z / group;
define value / "" sum;
define date / "" across;
run;

If you wrap that PROC REPORT into ODS EXCEL, you get an Excel file with the wide structure that even your boss can read. But as long as you need to do analysis, avoid the wide structure. See Maxim 19.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1792 views
  • 0 likes
  • 5 in conversation