Hello
For each customer there are multiple rows (Each row has information per specific month).
Data set t1 contain information of wealth per customer ID + month .
Data set t2 contain list of months .
Task: I want to add for each customer rows for months that have no data.
For example:
ID=1 have no information of months:2205,2206 so need to add 2 rows.
What is the way to create wanted data set?
Please see the way I did it and giving the requested results,
I want to ask if there is a shorter and more efficient way?
Please note that in real life t1 data set is very big (50 million rows and 20 columns)
Data t1;
input ID month wealth;
cards;
1 2201 10
1 2202 20
1 2203 30
1 2204 40
2 2201 50
2 2202 60
2 2203 70
2 2204 80
2 2205 90
2 2206 100
3 2201 110
3 2202 120
3 2204 130
3 2205 140
3 2206 150
;
run;
Data t2;
input month_;
cards;
2201
2202
2203
2204
2205
2206
;
Run;
proc sql;
create table t3 as
select a.*,b.month_
from t1 as a,t2 as b
order by ID,month,month_
;
quit;
Data t4;
set t3;
IF month ne month_ then wealth=.;
Run;
data part1(drop=month_);
set t4(where=(month=month_));
Run;
data part2;
set t4(where=(month ne month_));
Run;
proc sql;
create table part2b as
select distinct ID,month_
from part2
;
quit;
proc sql;
create table part2c(rename=(month_=month)) as
select a.*
from part2b as a
left join part1 as b
on a.month_=b.month and a.ID=b.ID
where b.ID is null
;
quit;
proc sort data=part1;by ID;Run;
proc sort data=part2c;by ID;Run;
Data want;
SET part1 part2c;
by ID;
Run;
proc sort data=want;by ID month;Run;
Do you have SAS/ETS?
ETS is the license that gives you access to Proc Timeseries and Proc Expand and so on.
Next code is tested and will do the work, supposed that t2 is not too big.
Data t2;
input month_;
cards;
2201
2202
2203
2204
2205
2206
;
Run;
proc sql noprint;
select month_ into : months separated by ','
from t2;
run;
data _null_;
mm = "&months";
nm = countw(mm);
call symput('nm',compress(put(nm,2.)));
run;
%put nm=&nm months = &months;
Data t1;
input ID month wealth;
cards;
1 2201 10
1 2202 20
1 2203 30
1 2204 40
2 2201 50
2 2202 60
2 2203 70
2 2204 80
2 2205 90
2 2206 100
3 2201 110
3 2202 120
3 2204 130
3 2205 140
3 2206 150
;
run;
data want;
set t1;
by ID;
retain m1-m&nm mmx1-mmx&nm;
array mmx {&nm} (&months) ;
array mm {&nm} m1-m&nm;
drop i m1-m&nm mmx1-mmx&nm;
if first.ID then do i=1 to &nm; mm(i)=.; end;
if month in mmx then do;
i=1;
do while (i le &nm);
if mmx(i) = month then do;
put _N_= ID= i= month= ;
mm(i)=1;
leave;
end;
i+1;
end;
output;
end;
if last.ID then do;
do i=1 to &nm;
if mm(i) =. then do;
wealth=.;
month = mmx(i);
output;
end;
end;
end;
run;
You may want to add a sort of the result file by ID month;
This is as simple as I could make it:
data want;
set t1;
by id;
if first.id then _P_=0;
do _P_=_P_+1 to _months until(month>=month_);
set t2 point=_P_ nobs=_months;
output;
end;
if last.id;
do _P_=_P_+1 to _months;
set t2 point=_P_ nobs=_months;
output;
end;
keep ID month_ wealth;
rename month_=month;
run;
You can try this:
proc sql;
create table t_1 as
select *
from t2
cross join (Select distinct Id from t1)
;
quit;
proc sql;
create table t_2 as
select a.Id, a.month_ , b.wealth
from t_1 as a
left join t1 as b
on a.month_=b.month and a.id=b.Id
order by a.ID, a.month_
;
quit;
Output T_2
ID | month_ | wealth |
1 | 2201 | 10 |
1 | 2202 | 20 |
1 | 2203 | 30 |
1 | 2204 | 40 |
1 | 2205 | . |
1 | 2206 | . |
2 | 2201 | 50 |
2 | 2202 | 60 |
2 | 2203 | 70 |
2 | 2204 | 80 |
2 | 2205 | 90 |
2 | 2206 | 100 |
3 | 2201 | 110 |
3 | 2202 | 120 |
3 | 2203 | . |
3 | 2204 | 130 |
3 | 2205 | 140 |
3 | 2206 | 150 |
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.