BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello,

I have 2 data sets :

Data set "Have" with 3 columns: ID, month, Y

Data set Help_tbl with one column: month

I want to add for each ID in data set "Have" rows for months that are not existing there.

For example:

For ID=1 will add 4 rows with months  values : 2012, 2106,2107,2108

For ID=2 will add 4 rows with months  values : 2012, 2104,2105,2106,2107,2108

For ID=3 will add 4 rows with months  values : 2012, 2101,2102,2103,2106,2107,2108

The values of Y available will be null in the rows that I added.

 

Wanted data set is :

Ronein_0-1629975603293.png

 

The following code that I wrote doesn't give the desired result(As you can see)

Any help please?

 

 

 


Data Helpmonths;
input month;
cards;
2012
2101
2102
2103
2104
2105
2106
2107
2108
;
run;

Data have;
input id month Y;
cards;
1 2101 10
1 2102 30
1 2103 20
1 2104 40
1 2105 20
2 2101 15
2 2102 20
2 2103 10
3 2104 30
3 2105 40
;
Run;

proc sort data=Helpmonths; by  month;
Run;
proc sort data=have; by  month;
Run;
Data wanted  ;
Retain ID month  Y;
merge Helpmonths(in=a)have(in=b);
by month;
if a;
Run;
proc sort data=wanted;by ID month;Run;

 

 

 

 

 

3 REPLIES 3
Ronein
Meteorite | Level 14

Hello

I wrote good solution and I wonder to see also other solutions, thanks

 

Data have;
input id month Y;
cards;
1 2101 10
1 2102 30
1 2103 20
1 2104 40
1 2105 20
2 2101 15
2 2102 20
2 2103 10
3 2104 30
3 2105 40
;
Run; 


Data Helpmonths;
input month;
cards;
2012
2101
2102
2103
2104
2105
2106
2107
2108
;
run;
 

PROC SQL;
	create table Help2_tbl  as
	select a.month,b.ID 	   
	from  Helpmonths as a,
	(select distinct ID from have) as b
	order by b.ID,a.month
;
QUIT;



proc sort data=Help2_tbl; by  ID month;
Run;
proc sort data=have; by  ID month;
Run;
Data wanted  ;
Retain ID month  Y;
merge Help2_tbl(in=a)have(in=b);
by ID month;
if a;
Run;
proc sort data=wanted;by ID month;Run;
mkeintz
PROC Star

I agree with @Kurt_Bremser comment that this is a situation in which SQL implementation of cartesian joins is a syntactical advantage.

 

But of course, cartesian join essentially means comparing every row in the left table against every row in the right table, which can be costly if the tables are big.

 

If data set HAVE is sorted by ID/MONTH, you can add some complexity to the program to avoid the cartesian comparison and possibly save significant resources.

 

 

/*First, build a comma-separated list of valid month values in macrovar MONTHS_LIST*/
proc sql noprint;
  select month into :months_list separated by ',' from helpmonths;
  %let n_months=&sqlobs;
quit;
%put &=n_months;
%put &=months_list;


/* Now read each ID twice, first time to identify found month, */
/* second time to output records*/
data want (drop=_:);

  array mlist {&n_months} _temporary_ (&months_list);
  array _found{&n_months} $1 ;

  do until (last.id);
    retain _sentinel1 .;
    set have;
    by id;
    retain _sentinel2 .;
 
    _found{whichn(month,of mlist{*})}='Y'; 
  end;

  _temp_id=id;
  do _m=1 to dim(mlist);
    if _found{_m}='Y' then set have;
    else call  missing(of _sentinel1--_sentinel2);
    id=_temp_id;
    month=mlist{_m};
    output;    
  end;
run;

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

This is where SQL and its ability to build cartesian joins comes into its own:

proc sql;
create table all as
  select distinct have.id, helpmonths.month
  from have, helpmonths
;
create table want as
  select
    all.id,
    all.month,
    have.y
  from all left join have
  on all.id = have.id and all.month = have.month
  order by id, month
;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 996 views
  • 2 likes
  • 3 in conversation