I have a dataset spend with numeric values -
data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
i want output to look like
1 1 4365
1 2 457
1 3 124
1 4 35547
1 5 .
1 6 .
1 7 .
2 1 .
2 2 .
2 3 .
2 4.
2 5 4678
2 6 21253
2 7 .
3 1 23423
3 2 23567
3 3 .
3 4 .
3 5 .
3 6
3 7 32534
So basically i want to increment the values of week to 7 and those which don't have any values in spend should be missing.
data a;
set spend;
do j = 1 to 7;
week = j;
output;
end;
drop j;
run;
in above code cust id 1 and 3 are getting repeated 2 times and spend values are also not correct.
Please help.
The right solution depends on how to determine the wanted list of weeks.
If you know the range of weeks is always 1 to 7 then use that to generate the missing records.
data skeleton;
set spend(keep=custid week);
by custid;
if first.custid then do week=1 to 7;
output;
end;
run;
data want;
merge skeleton spend;
by custid week;
run;
If the list is more flexible then you will need to do more work to generate the skeleton dataset. For example you might want to find the maximum values of week. Or perhaps the min and the max. Or just the list of weeks that appear for any custid.
data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
%let num_of_weeks=7;
data want ;
if _n_=1 then do;
if 0 then set spend;
dcl hash H (dataset: "spend") ;
h.definekey ("custid","week") ;
h.definedata ("spends") ;
h.definedone () ;
end;
set spend(keep=custid);
by custid;
if first.custid;
do week=1 to &num_of_weeks;
if h.find() ne 0 then call missing(spends);
output;
end;
run;
A very elegant solution - i am always impressed by people who has hash objects as their first choice in the tool box. This is an interesting problem, because it can be solved in so many ways, and I had great fun experimenting with different techniques. Here are two more traditional solutions:
data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
/* first method - requires input sorted by custid */
data want1 (drop= _:);
set spend; by custid;
array w _wk1-_wk7;
array s _sp1-_sp7;
retain _wk1-_wk7 _sp1-_sp7;
* initiate arrays for weeks and spends with missing values for given customer;
if first.custid then do;
call missing(of _wk1-_wk7);
call missing(of _sp1-_sp7);
end;
* place actual week/spends in array variables;
w{week} = week;
s{week} = spends;
* output all weeks after all observations for given customer are processed;
if last.custid then do _i = 1 to 7;
week = w{_i};
spends = s{_i};
output;
end;
run;
/* second method - requires input sorted by custid and week */
proc sort data=spend; by custid week;
run;
data want2 (drop= last week spends rename=(nweek=week nspends=spends));
set spend; by custid;
retain last;
last = lag(week);
if first.custid then last = 0;
* add missing weeks before/between weeks in input;
do i = last + 1 to week - 1;
nweek = i;
nspends = .;
output;
end;
* actual observation;
nweek = week;
nspends = spends;
output;
* add missing weeks after last week in input;
if last.custid and week < 7 then do i = week + 1 to 7;
nweek = i;
nspends = .;
output;
end;
run;
Thank you Sir @ErikLund_Jensen
here's another one
data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
proc freq data=spend(keep=custid week) noprint;
tables custid*week/sparse out=temp(keep=custid week);
run;
proc sort data=spend out=_spend;
by custid week;
run;
data want;
merge temp _spend;
by custid week;
run;
This assumes OP wants the maximum week i.e 7 in one of the by group
Here's a less elegant but simple set of steps:
data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
run;
proc sort data=spend; by custid week; run;
proc sql noprint;
select distinct custid into :custids separated by ","
from spend;
quit;
/* create empty table*/
data empty (drop=i);
length custid week 8.;
do i = 1 to countc("&custids.",",") +1;
do week=1 to 7;
custid = scan("&custids.",i,",");
output;
end;
end;
run;
/*populate values into formerly empty table*/
data want;
merge empty spend;
by custid week;
run;
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
data spend;
input custid week spends;
cards;
1 1 4365
1 2 457
1 3 124
1 4 35547
2 5 4678
2 6 21253
3 7 32534
3 1 23423
3 2 23567
;
proc sql;
create table want as
select a.*,spends
from (select * from
(select distinct custid from spend),(select distinct week from spend)
) as a left join spend as b
on a.custid=b.custid and a.week=b.week
order by 1,2;
quit;
The right solution depends on how to determine the wanted list of weeks.
If you know the range of weeks is always 1 to 7 then use that to generate the missing records.
data skeleton;
set spend(keep=custid week);
by custid;
if first.custid then do week=1 to 7;
output;
end;
run;
data want;
merge skeleton spend;
by custid week;
run;
If the list is more flexible then you will need to do more work to generate the skeleton dataset. For example you might want to find the maximum values of week. Or perhaps the min and the max. Or just the list of weeks that appear for any custid.
Thanks everyone for quick response. all solutions are working properly.
But if i had to chose one , i would chose the solution given by Tom which was very easy to understand .
Thanks to noling ,Eriklund ,novinsorin and Ksharp for taking out your precious time.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.