Add rows when month doesn't exist

Ronein · Posted 08-26-2021 07:01 AM

Hello,

I have 2 data sets :

Data set "Have" with 3 columns: ID, month, Y

Data set Help_tbl with one column: month

I want to add for each ID in data set "Have" rows for months that are not existing there.

For example:

For ID=1 will add 4 rows with months values : 2012, 2106,2107,2108

For ID=2 will add 4 rows with months values : 2012, 2104,2105,2106,2107,2108

For ID=3 will add 4 rows with months values : 2012, 2101,2102,2103,2106,2107,2108

The values of Y available will be null in the rows that I added.

Wanted data set is :

The following code that I wrote doesn't give the desired result(As you can see)

Any help please?


Data Helpmonths;
input month;
cards;
2012
2101
2102
2103
2104
2105
2106
2107
2108
;
run;

Data have;
input id month Y;
cards;
1 2101 10
1 2102 30
1 2103 20
1 2104 40
1 2105 20
2 2101 15
2 2102 20
2 2103 10
3 2104 30
3 2105 40
;
Run;

proc sort data=Helpmonths; by  month;
Run;
proc sort data=have; by  month;
Run;
Data wanted  ;
Retain ID month  Y;
merge Helpmonths(in=a)have(in=b);
by month;
if a;
Run;
proc sort data=wanted;by ID month;Run;

Ronein · Posted 08-26-2021 07:31 AM

Hello

I wrote good solution and I wonder to see also other solutions, thanks

Data have;
input id month Y;
cards;
1 2101 10
1 2102 30
1 2103 20
1 2104 40
1 2105 20
2 2101 15
2 2102 20
2 2103 10
3 2104 30
3 2105 40
;
Run; 


Data Helpmonths;
input month;
cards;
2012
2101
2102
2103
2104
2105
2106
2107
2108
;
run;
 

PROC SQL;
	create table Help2_tbl  as
	select a.month,b.ID 	   
	from  Helpmonths as a,
	(select distinct ID from have) as b
	order by b.ID,a.month
;
QUIT;



proc sort data=Help2_tbl; by  ID month;
Run;
proc sort data=have; by  ID month;
Run;
Data wanted  ;
Retain ID month  Y;
merge Help2_tbl(in=a)have(in=b);
by ID month;
if a;
Run;
proc sort data=wanted;by ID month;Run;

mkeintz · Posted 08-26-2021 10:09 AM

I agree with @Kurt_Bremser comment that this is a situation in which SQL implementation of cartesian joins is a syntactical advantage.

But of course, cartesian join essentially means comparing every row in the left table against every row in the right table, which can be costly if the tables are big.

If data set HAVE is sorted by ID/MONTH, you can add some complexity to the program to avoid the cartesian comparison and possibly save significant resources.

/*First, build a comma-separated list of valid month values in macrovar MONTHS_LIST*/
proc sql noprint;
  select month into :months_list separated by ',' from helpmonths;
  %let n_months=&sqlobs;
quit;
%put &=n_months;
%put &=months_list;


/* Now read each ID twice, first time to identify found month, */
/* second time to output records*/
data want (drop=_:);

  array mlist {&n_months} _temporary_ (&months_list);
  array _found{&n_months} $1 ;

  do until (last.id);
    retain _sentinel1 .;
    set have;
    by id;
    retain _sentinel2 .;
 
    _found{whichn(month,of mlist{*})}='Y'; 
  end;

  _temp_id=id;
  do _m=1 to dim(mlist);
    if _found{_m}='Y' then set have;
    else call  missing(of _sentinel1--_sentinel2);
    id=_temp_id;
    month=mlist{_m};
    output;    
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 08-26-2021 07:40 AM

This is where SQL and its ability to build cartesian joins comes into its own:

proc sql;
create table all as
  select distinct have.id, helpmonths.month
  from have, helpmonths
;
create table want as
  select
    all.id,
    all.month,
    have.y
  from all left join have
  on all.id = have.id and all.month = have.month
  order by id, month
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Add rows when month doesn't exist

Re: Add rows when month doesn't exist

Re: Add rows when month doesn't exist

Re: Add rows when month doesn't exist

Add rows when month doesn't exist

Re: Add rows when month doesn't exist

Re: Add rows when month doesn't exist

Re: Add rows when month doesn't exist

Registration is open

SAS Training: Just a Click Away