Dear All,
I am having a data set like below :
id | scn | year | value |
89 | 1 | 168 | 100000 |
89 | 1 | 169 | 0 |
89 | 1 | 169 | 111111 |
89 | 1 | 171 | 0 |
89 | 1 | 171 | 0 |
89 | 1 | 172 | 100003 |
89 | 1 | 179 | 100004 |
89 | 1 | 180 | 0 |
89 | 2 | 168 | 100006 |
89 | 2 | 169 | 0 |
89 | 2 | 172 | 100008 |
The columns id,scn and year is sorted in ascending order, i mean all are group by.
year column should be consecutive numbers, but here some year are missing, I need to add consecutive numbers in between with value as 0.
i need to do this in data step becuase it is very huge data, multiple steps will cause performance issue.
The output should lokks like :
id | scn | year | value |
89 | 1 | 168 | 100000 |
89 | 1 | 169 | 0 |
89 | 1 | 169 | 111111 |
89 | 1 | 170 | 0 |
89 | 1 | 171 | 0 |
89 | 1 | 171 | 0 |
89 | 1 | 172 | 100003 |
89 | 1 | 173 | 0 |
89 | 1 | 174 | 0 |
89 | 1 | 175 | 0 |
89 | 1 | 176 | 0 |
89 | 1 | 177 | 0 |
89 | 1 | 178 | 0 |
89 | 1 | 179 | 100004 |
89 | 1 | 180 | 0 |
89 | 2 | 168 | 100006 |
89 | 2 | 169 | 0 |
89 | 2 | 170 | 0 |
89 | 2 | 171 | 0 |
89 | 2 | 172 | 100008 |
Thanks in Advance,
Chithra
In that case, you're better of using @ErikLund_Jensen 's method of lookahead, as it already uses by-processing:
%let start=167;
data want (drop=_year _value nextyear);
set have;
by id scn;
nextrec = _N_ + 1;
if first.scn then do;
_year = year;
_value = value;
do year = &start to year - 1;
value = 0;
output;
end;
value = _value;
year = _year;
end;
output;
if not last.scn then do;
set have (keep=year rename=(year=nextyear)) point=nextrec;
do year = year + 1 to nextyear - 1;
value = 0;
output;
end;
end;
run;
Create a "look-ahead":
data have;
infile datalines dlm='09'x dsd;
input id $ scn year value;
datalines;
89 1 168 100000
89 1 169 0
89 1 169 111111
89 1 171 0
89 1 171 0
89 1 172 100003
89 1 179 100004
89 1 180 0
89 2 168 100006
89 2 169 0
89 2 172 100008
;
run;
data want;
merge
have
have (
firstobs=2
keep=id scn year
rename=(
id=n_id
scn=n_scn
year=n_year
)
)
;
output;
if n_id = id and n_scn = scn
then do year = year + 1 to n_year - 1;
value = 0;
output;
end;
drop n_:;
run;
proc print data=want noobs;
run;
Result:
id scn year value 89 1 168 100000 89 1 169 0 89 1 169 111111 89 1 170 0 89 1 171 0 89 1 171 0 89 1 172 100003 89 1 173 0 89 1 174 0 89 1 175 0 89 1 176 0 89 1 177 0 89 1 178 0 89 1 179 100004 89 1 180 0 89 2 168 100006 89 2 169 0 89 2 170 0 89 2 171 0 89 2 172 100008
Note how I presented your example data in a data step with datalines, so it is easy for others to recreate the dataset. Please do so in the future; help us to help you.
Many thanks.
This is working
Hi @chithra
Here is a slightly different way of doing just the same as Kurtbremser's code. It uses the point= option instead of firstobs=2. It is in no way better, I just post it because I want to promote use of the point= option, because it is so useful for look-ahead and -back, also in more complicated cases.
data want (drop= lastyear nextyear i);
set have; by id scn;
nextrec = _N_ + 1;
lastyear = year;
output;
if not last.scn then do;
set have (keep=year rename=(year=nextyear)) point=nextrec;
do i = lastyear + 1 to nextyear - 1;
year = i;
value = 0;
output;
end;
end;
run;
just one modification to this.
The firt observation can be changed, it is not constant. that is if 168 is not the first observation, the first observation is missed, that was actually 167.
that number can be found out and stored in a macro or in some other way, that is possible.(may be that is the min of the whole table etc..)
THen how can we modify this?
In that case, you're better of using @ErikLund_Jensen 's method of lookahead, as it already uses by-processing:
%let start=167;
data want (drop=_year _value nextyear);
set have;
by id scn;
nextrec = _N_ + 1;
if first.scn then do;
_year = year;
_value = value;
do year = &start to year - 1;
value = 0;
output;
end;
value = _value;
year = _year;
end;
output;
if not last.scn then do;
set have (keep=year rename=(year=nextyear)) point=nextrec;
do year = year + 1 to nextyear - 1;
value = 0;
output;
end;
end;
run;
Thank you
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.