SAS Programming

rgjpot · Posted 08-24-2021 11:47 AM

I have a problem with merging my data.

If have the following two tables

Table 1
x	y	z	2019-01	2019-02	2020-06	2021-01
a	b	c	1	2	1	10
a	b	d	0	3	1	11
b	a	d	2	1	3	8
a	c	b	1	4	5	2

Table 2
x	y	z	2019-01	2019-05	2020-07	2021-01	2021-02
a	b	c	0	0	1	8	11
a	b	d	1	4	3	2	20
b	a	d	0	1	0	0	2
a	e	e	0	0	0	2	1

When I merge them i get

x	y	z	2019-01	2019-02	2020-06	2021-01	2019-05	2020-07	2021-02
a	b	c	0	2	1	8	0	1	11
a	b	d	1	3	1	2	4	3	20
a	c	b	1	4	5	2	.	.	.
a	e	e	0	.	.	2	0	0	1
b	a	d	0	1	3	0	1	0	2

However i need to get this

x	y	z	2019-01	2019-02	2019-05	2020-07	2020-06	2021-01	2021-02
a	b	c	1	2	0	1	1	18	11
a	b	d	1	3	4	3	1	13	20
b	a	d	2	1	1	0	3	8	2
a	c	b	1	4	0	0	5	2	0
a	e	e	0	0	0	0	0	2	1

I found some posts that came close to the awnser i needed but there is a second problem that i don't now the year-month upfront and the differ every time i get a new file. (origanal files contain over 40000 records with up to 80 colums) .

Tom · Posted 08-24-2021 12:17 PM

Easy enough if X,Y,Z are character and the others are numeric.

data both;
  set table1 table2;
  by x y z;
run;

proc summary data=both ;
  by x y z;
  var _numeric_;
  output out=want(drop=_type_ _freq_) sum= ;
run;

You can even make the first data step create a view so that you don't have to make a physical copy of the combined data.

If some of the groups and some of the variables only appear on one of the datasets you might get some observations with missing values. You could use PROC STDIZE to replace the missing with zeros if you need.

View solution in original post

ballardw · Posted 08-24-2021 11:57 AM

Please provide data in the form of a working data step for each data set.

Also, provide the code for how you are "merging" the data.

Almost certainly to get the desired sums you show a data step merge is not the approach you want. You would use SET to combine the data and then do the sums.

But we do have an issue in your data as the names you show are not default legal SAS data names AND you are hiding data in the variable name. Typically that means the data is structured poorly for almost any activity.

Reeza · Posted 08-24-2021 12:09 PM

You can make the process entirely dynamic by flipping the data to a long format and then appending the results, sorting and merging.

Or you can generate a dynamic select statement.

Rough idea for solution 1:

proc transpose data=table1 out=table1_long;
by x y z;
var _numeric_;
run;

proc transpose data=table2 out=table2_long;
by x y z;
var _numeric_;
run;

data combined;
set table1_long table2_long;
run;

proc sort data=combined;
by x y z _name_;
run;

proc transpose data=combined out=final prefix= D_;
by x y z;
var col1;
id _name_;
run;

And for solution 2:

*get dates in order;
proc sql noprint;
select nliteral(name) into :name_list separated by ", "
from sashelp.vcolumns
where libname='WORK' and upcase(memname) = 'MERGED'
and name not in ('X' , 'Y', 'Z')
order by name;
quit;

*recreate table with variables in order;
proc sql;
create table want as
select x, y, z, &name_list
from merged;
quit;

Untested code as I didn't want to make a data step to generate sample data.

@rgjpot wrote:

I have a problem with merging my data.

If have the following two tables

Table 1

x y z 2019-01 2019-02 2020-06 2021-01

a b c 1 2 1 10

a b d 0 3 1 11

b a d 2 1 3 8

a c b 1 4 5 2

Table 2

x y z 2019-01 2019-05 2020-07 2021-01 2021-02

a b c 0 0 1 8 11

a b d 1 4 3 2 20

b a d 0 1 0 0 2

a e e 0 0 0 2 1

When I merge them i get

x y z 2019-01 2019-02 2020-06 2021-01 2019-05 2020-07 2021-02

a b c 0 2 1 8 0 1 11

a b d 1 3 1 2 4 3 20

a c b 1 4 5 2 . . .

a e e 0 . . 2 0 0 1

b a d 0 1 3 0 1 0 2

However i need to get this

x y z 2019-01 2019-02 2019-05 2020-07 2020-06 2021-01 2021-02

a b c 1 2 0 1 1 18 11

a b d 1 3 4 3 1 13 20

b a d 2 1 1 0 3 8 2

a c b 1 4 0 0 5 2 0

a e e 0 0 0 0 0 2 1

I found some posts that came close to the awnser i needed but there is a second problem that i don't now the year-month upfront and the differ every time i get a new file. (origanal files contain over 40000 records with up to 80 colums) .

Tom · Posted 08-24-2021 12:17 PM

Easy enough if X,Y,Z are character and the others are numeric.

data both;
  set table1 table2;
  by x y z;
run;

proc summary data=both ;
  by x y z;
  var _numeric_;
  output out=want(drop=_type_ _freq_) sum= ;
run;

You can even make the first data step create a view so that you don't have to make a physical copy of the combined data.

If some of the groups and some of the variables only appear on one of the datasets you might get some observations with missing values. You could use PROC STDIZE to replace the missing with zeros if you need.

rgjpot · Posted 08-26-2021 03:21 AM

Thank you Tom this works as a charm.

Kurt_Bremser · Posted 08-24-2021 12:25 PM

Firdt of all, transpose the tables so that the date values become data instead of structure.

Then, stack those tables into one, and finally, build the sum with PROC SUMMARY, CLASS X Y Z DATE.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 08-26-2021 03:44 AM

Just to illustrate the usefulness of a long layout:

data table1;
infile datalines dlm='09'x;
input (x y z) ($) '2019-01'n '2019-02'n '2020-06'n '2021-01'n;
datalines;
a	b	c	1	2	1	10
a	b	d	0	3	1	11
b	a	d	2	1	3	8
a	c	b	1	4	5	2
;

data table2; 	 	 	 	 	 	 
infile datalines dlm='09'x;
input (x y z) ($) '2019-01'n '2019-05'n '2020-07'n '2021-01'n '2021-02'n;
datalines;
a	b	c	0	0	1	8	11
a	b	d	1	4	3	2	20
b	a	d	0	1	0	0	2
a	e	e	0	0	0	2	1
;


proc transpose data=table1 out=t1;
by x y z notsorted;
var _numeric_;
run;

data t1_dt;
set t1;
date = input(cats(_name_,"-01"),yymmdd10.);
format date yymmd7.;
drop _name_;
rename col1=value;
run;

proc transpose data=table2 out=t2;
by x y z notsorted;
var _numeric_;
run;

data t2_dt;
set t2;
date = input(cats(_name_,"-01"),yymmdd10.);
format date yymmd7.;
drop _name_;
rename col1=value;
run;

data all;
set
  t1_dt
  t2_dt
;
run;

Once your data is in proper shape, it's simple:

proc summary data=all nway;
class x y z date;
var value;
output out=want (drop=_:) sum()=;
run;

And if you need a report out of the raw data, PROC REPORT does the sum on the fly:

proc report data=all;
column x y z value,date;
define x / group;
define y / group;
define z / group;
define value / "" sum;
define date / "" across;
run;

If you wrap that PROC REPORT into ODS EXCEL, you get an Excel file with the wide structure that even your boss can read. But as long as you need to do analysis, avoid the wide structure. See Maxim 19.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Programming

merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Re: merge and sum based on multiple variables and colum names

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...