Hello,
I have 2 data sets :
Data set "Have" with 3 columns: ID, month, Y
Data set Help_tbl with one column: month
I want to add for each ID in data set "Have" rows for months that are not existing there.
For example:
For ID=1 will add 4 rows with months values : 2012, 2106,2107,2108
For ID=2 will add 4 rows with months values : 2012, 2104,2105,2106,2107,2108
For ID=3 will add 4 rows with months values : 2012, 2101,2102,2103,2106,2107,2108
The values of Y available will be null in the rows that I added.
Wanted data set is :
The following code that I wrote doesn't give the desired result(As you can see)
Any help please?
Data Helpmonths;
input month;
cards;
2012
2101
2102
2103
2104
2105
2106
2107
2108
;
run;
Data have;
input id month Y;
cards;
1 2101 10
1 2102 30
1 2103 20
1 2104 40
1 2105 20
2 2101 15
2 2102 20
2 2103 10
3 2104 30
3 2105 40
;
Run;
proc sort data=Helpmonths; by month;
Run;
proc sort data=have; by month;
Run;
Data wanted ;
Retain ID month Y;
merge Helpmonths(in=a)have(in=b);
by month;
if a;
Run;
proc sort data=wanted;by ID month;Run;
Hello
I wrote good solution and I wonder to see also other solutions, thanks
Data have;
input id month Y;
cards;
1 2101 10
1 2102 30
1 2103 20
1 2104 40
1 2105 20
2 2101 15
2 2102 20
2 2103 10
3 2104 30
3 2105 40
;
Run;
Data Helpmonths;
input month;
cards;
2012
2101
2102
2103
2104
2105
2106
2107
2108
;
run;
PROC SQL;
create table Help2_tbl as
select a.month,b.ID
from Helpmonths as a,
(select distinct ID from have) as b
order by b.ID,a.month
;
QUIT;
proc sort data=Help2_tbl; by ID month;
Run;
proc sort data=have; by ID month;
Run;
Data wanted ;
Retain ID month Y;
merge Help2_tbl(in=a)have(in=b);
by ID month;
if a;
Run;
proc sort data=wanted;by ID month;Run;
I agree with @Kurt_Bremser comment that this is a situation in which SQL implementation of cartesian joins is a syntactical advantage.
But of course, cartesian join essentially means comparing every row in the left table against every row in the right table, which can be costly if the tables are big.
If data set HAVE is sorted by ID/MONTH, you can add some complexity to the program to avoid the cartesian comparison and possibly save significant resources.
/*First, build a comma-separated list of valid month values in macrovar MONTHS_LIST*/
proc sql noprint;
select month into :months_list separated by ',' from helpmonths;
%let n_months=&sqlobs;
quit;
%put &=n_months;
%put &=months_list;
/* Now read each ID twice, first time to identify found month, */
/* second time to output records*/
data want (drop=_:);
array mlist {&n_months} _temporary_ (&months_list);
array _found{&n_months} $1 ;
do until (last.id);
retain _sentinel1 .;
set have;
by id;
retain _sentinel2 .;
_found{whichn(month,of mlist{*})}='Y';
end;
_temp_id=id;
do _m=1 to dim(mlist);
if _found{_m}='Y' then set have;
else call missing(of _sentinel1--_sentinel2);
id=_temp_id;
month=mlist{_m};
output;
end;
run;
This is where SQL and its ability to build cartesian joins comes into its own:
proc sql;
create table all as
select distinct have.id, helpmonths.month
from have, helpmonths
;
create table want as
select
all.id,
all.month,
have.y
from all left join have
on all.id = have.id and all.month = have.month
order by id, month
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.