Hi i am having data like this
data ;
input no year sal;
cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002 100
23 2003 112
run;
the this i want is if the no is there and if missing in the years 2001,2002,2003 then we should create new entry of the saame
no with then next value
output
no year sal new_entry
10 2001 100
10 2002 102
10 2003 102 y
11 2001 90
11 2002 12
11 2003 13
23 2001 100 y
23 2002 100
23 2003 112
in this the new _entry ="Y' means some of the year data is missing so we should give the next data for no 10 the 2003 is missing the data is picked for 2002
and for no 23 2001 is missing the data is picked from 2002.
Good point about my look ahead that may not be far enough. If I wasn't too worried about performance I would just sort descending and use LOCF to get NOCB.
Hi!
Here is a piece of code that gets what you want but not very elegantly:smileyplain:
/*read in your data*/
data in ;
input no year sal;
cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002 100
23 2003 112
run;
/*I want to change data from 'long' to 'wide*/
/*first need to sort*/
proc sort data = in;by no;run;
data temp;
set in;
by no;
/*create a couple of arrays, so now we have a column for each year and each sal*/
keep no yr2001-yr2003 t_sal2001-t_sal2003;
retain yr2001-yr2003 t_sal2001-t_sal2003;
array yr(2001:2003) yr2001-yr2003;
array t_sal(2001:2003) t_sal2001-t_sal2003;
if first.no then do;
do i = 2001 to 2003;
yr(i) = .;
t_sal(i) = .;
end;
end;
yr(year) = year;
t_sal(year) = sal;
if last.no then output;
run;
/*using a couple of IF-THEN we fill in the missing year*/
data temp_1;
set temp;
if yr2003 = . then do;
if yr2002 ne . then yr2003 = yr2002;
if t_sal2002 ne . then t_sal2003 = t_sal2002;
n_entry2003 = "y";
end;
if yr2001 = . then do;
if yr2002 ne . then yr2001 = yr2002;
if t_sal2002 ne . then t_sal2001 = t_sal2002;
n_entry2001 = "y";
end;
run;
/*re-organize the data to long*/
data want;
set temp_1;
array yr(2001:2003) yr2001-yr2003;
array t_sal(2001:2003) t_sal2001-t_sal2003;
array n_entry(2001:2003) n_entry2001-n_entry2003;
do i = 2001 to 2003;
year = yr(i);
sal = t_sal(i);
new_entry = n_entry(i);
output;
end;
drop yr: t_sal: n_entry:;
run;
There are probably more compact, efficient way to accomplish this...but this should get you started.
Good luck!
Anca.
Hi ,
try the following code...
proc sort data = test;
by no year;
run;
/* Identify the missing year */
proc sql;
create table test1 as
select no,year,count(sal) as sal
from test
group by 1
having sal < 3;
quit;
/* Insert the row for missing years */
proc sql;
insert into test
set no = 10,
year = 2003,
sal = 102
set no = 23,
year = 2001,
sal = 100;
quit;
/* And finally add the New Entry column for missing years */
data test;
set test;
if no = 10 and year = 2003 then new_entry = 'Y;
if no = 23 and year = 2001 then new_entry = 'Y;
run;
proc sort data = test;
by no year;
run;
you can also take the missing year values in macro variables from the test1 dataset and automate the process...
Thanks
If you don't mind creating a tmp table to hold all of the possible records:
data have;
input no year sal;
cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002 100
23 2003 112
run;
data years;
input year;
cards;
2001
2002
2003
;
run;
proc sql;
create table work.tmp as
select
tmp.no
,tmp.year
,have.sal
from
(select distinct
t1.no
,t2.year
from have t1, years t2) tmp
left outer join have
on tmp.no=have.no and tmp.year=have.year
order by tmp.no, tmp.year;
quit;
data want(keep=no year sal);
set tmp nobs=_Nobs;
by no year;
sal_prev = lag(sal);
sal_next=.;
RecPointer=_N_+1;
if sal=. then do;
if first.no then do;
/* get next sal*/
if RecPointer < _Nobs then do;
set tmp (keep=sal rename=(sal=sal_next)) point=RecPointer;
sal=sal_next;
end;
end;
else sal = sal_prev;
end;
run;
First you need to create the missing years. There are many ways to do this and SAS has some features to help make it easy. I use PROC SUMMARY COMPLETETYPES to add the missing years. Then the problems is a simple LOCF and Next obs carried back (NOCB?)
Nice answer, I was working on something very similar (using PROC SUMMARY with COMPLETETYPES). The bit I was struggling with to code relatively simply was missing imputation and coping with both lag and lead values (the lag part is easy).
The only problem I can see with your solution is if there is only a value for 2003 in the example data (i.e. 2001 and 2002 are missing). Using firstobs=2 will still return a missing value here. Ideally you would want to read sequential records until a value is encountered, then return that. The nature of the data means that it is not possible to go to the next BY group before a non-missing value occurs, so that isn't a worry.
Good point about my look ahead that may not be far enough. If I wasn't too worried about performance I would just sort descending and use LOCF to get NOCB.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.