Solved: Re: Incrementing values of variable

adi121 · Posted 07-02-2019 11:26 AM

I have a dataset spend with numeric values -

data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;

i want output to look like

1 1 4365
1 2 457
1 3 124
1 4 35547

1 5 .

1 6 .

1 7 .

2 1 .

2 2 .

2 3 .

2 4.
2 5 4678
2 6 21253

2 7 .

3 1 23423

3 2 23567

3 3 .

3 4 .

3 5 .

3 6
3 7 32534

So basically i want to increment the values of week to 7 and those which don't have any values in spend should be missing.

data a;
set spend;
do j = 1 to 7;
week = j;
output;
end;
drop j;
run;

in above code cust id 1 and 3 are getting repeated 2 times and spend values are also not correct.

Please help.

Tom · Posted 07-03-2019 10:22 AM

The right solution depends on how to determine the wanted list of weeks.

If you know the range of weeks is always 1 to 7 then use that to generate the missing records.

data skeleton;
  set spend(keep=custid week);
  by custid;
  if first.custid then do week=1 to 7;
    output;
  end;
run;
data want;
  merge skeleton spend;
  by custid week;
run;

If the list is more flexible then you will need to do more work to generate the skeleton dataset. For example you might want to find the maximum values of week. Or perhaps the min and the max. Or just the list of weeks that appear for any custid.

View solution in original post

novinosrin · Posted 07-02-2019 11:37 AM

data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
%let num_of_weeks=7;
data want ;
if _n_=1 then do;
	if 0 then set spend;
   dcl hash H (dataset: "spend") ;
   h.definekey  ("custid","week") ;
   h.definedata ("spends") ;
   h.definedone () ;
   end;
set spend(keep=custid);
by custid;
if first.custid;
do week=1 to &num_of_weeks;
if h.find() ne 0 then call missing(spends);
output;
end;
run;

ErikLund_Jensen · Posted 07-02-2019 02:25 PM

Hi @adi121 @novinosrin

A very elegant solution - i am always impressed by people who has hash objects as their first choice in the tool box. This is an interesting problem, because it can be solved in so many ways, and I had great fun experimenting with different techniques. Here are two more traditional solutions:

data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;

/* first method - requires input sorted by custid */
data want1 (drop= _:); 
	set spend; by custid;
	array w _wk1-_wk7;
	array s _sp1-_sp7;
	retain _wk1-_wk7 _sp1-_sp7;

	* initiate arrays for weeks and spends with missing values for given customer;
	if first.custid then do;
		call missing(of _wk1-_wk7);
		call missing(of _sp1-_sp7);
	end;

	* place actual week/spends in array variables;
	w{week} = week;
	s{week} = spends;

	* output all weeks after all observations for given customer are processed;
	if last.custid then do _i = 1 to 7;
		week = w{_i};
		spends = s{_i};
		output;
	end;
run;

/* second method - requires input sorted by custid and week */
proc sort data=spend; by custid week;
run;

data want2 (drop= last week spends rename=(nweek=week nspends=spends)); 
	set spend; by custid;
	retain last;
	last = lag(week);
	if first.custid then last = 0;

	* add missing weeks before/between weeks in input;
	do i = last + 1 to week - 1;
		nweek = i;
		nspends = .;
		output;
	end;

	* actual observation; 
	nweek = week;
	nspends = spends;
	output;

	* add missing weeks after last week in input;
	if last.custid and week < 7 then do i = week + 1 to 7;
		nweek = i;
		nspends = .;
		output;
	end;
run;

novinosrin · Posted 07-02-2019 03:21 PM

Thank you Sir @ErikLund_Jensen

here's another one


data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;

proc freq data=spend(keep=custid week) noprint;
tables custid*week/sparse out=temp(keep=custid week);
run;

proc sort data=spend out=_spend;
by custid week;
run;

data want;
merge temp _spend;
by custid week;
run;

This assumes OP wants the maximum week i.e 7 in one of the by group

noling · Posted 07-02-2019 03:01 PM

Here's a less elegant but simple set of steps:

data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
proc sort data=spend; by custid week; run; 
proc sql noprint;
	select distinct custid into :custids separated by ","
	from spend;
quit;
/* create empty table*/
data empty (drop=i);
	length custid week 8.;
	do i = 1 to countc("&custids.",",") +1;
		do week=1 to 7;
			custid = scan("&custids.",i,",");
			output;
		end;
	end;
run;
/*populate values into formerly empty table*/
data want;
	merge empty spend;
	by custid week;
run;

Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

Ksharp · Posted 07-03-2019 07:59 AM

data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
proc sql;
create table want as
 select a.*,spends
  from (select * from
(select distinct custid from spend),(select distinct week from spend)
) as a left join spend as b
on a.custid=b.custid and a.week=b.week
order by 1,2;
quit;

Tom · Posted 07-03-2019 10:22 AM

The right solution depends on how to determine the wanted list of weeks.

If you know the range of weeks is always 1 to 7 then use that to generate the missing records.

data skeleton;
  set spend(keep=custid week);
  by custid;
  if first.custid then do week=1 to 7;
    output;
  end;
run;
data want;
  merge skeleton spend;
  by custid week;
run;

If the list is more flexible then you will need to do more work to generate the skeleton dataset. For example you might want to find the maximum values of week. Or perhaps the min and the max. Or just the list of weeks that appear for any custid.

adi121 · Posted 07-20-2019 02:16 PM

Thanks everyone for quick response. all solutions are working properly.

But if i had to chose one , i would chose the solution given by Tom which was very easy to understand .

Thanks to noling ,Eriklund ,novinsorin and Ksharp for taking out your precious time.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away