BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas_Forum
Calcite | Level 5

Hi i am having data like this

data ;
input no year sal;
cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002  100
23 2003 112
run;

the this i want is if the no is there and if missing in the years 2001,2002,2003 then we should create new entry of the saame
no with then next value

output

no year sal  new_entry
10 2001 100
10 2002 102
10 2003 102   y
11 2001 90
11 2002 12
11 2003 13
23 2001  100   y
23 2002  100
23 2003 112


in  this the new _entry ="Y' means some of the year data is missing so we should give the next data for no 10 the 2003 is missing the data is picked for 2002
and for no 23 2001 is missing the data is picked from 2002.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Good point about my look ahead that may not be far enough.  If I wasn't too worried about performance I would just sort descending and use LOCF to get NOCB.

data sal;
   input no year sal;
   cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002 100
23 2003 112
   run;
proc summary data=sal nway completetypes;
  
class no year;
   output out=sal2(drop=_type_) idgroup(out(sal)=);
   run;
data sal3; *LOCF;
  
update sal2(obs=0) sal2 end=eof;
   by no;
   output;
  
run;
proc sort data=sal3;
   key no;
  
key year / descending;
  
run;
data sal4; *NOCB;
  
update sal3(obs=0) sal3;
   by no;
   output;
  
run;

View solution in original post

6 REPLIES 6
AncaTilea
Pyrite | Level 9

Hi!

Here is a piece of code that gets what you want but not very elegantly:smileyplain:

/*read in your data*/

data in ;

input no year sal;

cards;

10 2001 100

10 2002 102

11 2001 90

11 2002 12

11 2003 13

23 2002  100

23 2003 112

run;

/*I want to change data from 'long' to 'wide*/

/*first need to sort*/

proc sort data = in;by no;run;

data temp;

    set in;

    by no;

     /*create a couple of arrays, so now we have a column for each year and each sal*/

    keep no yr2001-yr2003 t_sal2001-t_sal2003;

    retain yr2001-yr2003 t_sal2001-t_sal2003;

    array yr(2001:2003) yr2001-yr2003;

    array t_sal(2001:2003) t_sal2001-t_sal2003;

    if first.no then do;

        do i = 2001 to 2003;

            yr(i) = .;

            t_sal(i) = .;

        end;

    end;

    yr(year) = year;

    t_sal(year) = sal;

if last.no then output;

run;

/*using a couple of IF-THEN we fill in the missing year*/

data temp_1;

    set temp;

    if yr2003 = . then do;

        if yr2002 ne . then yr2003 = yr2002;

        if t_sal2002 ne . then t_sal2003 = t_sal2002;

        n_entry2003 = "y";

    end;

    if yr2001 = . then do;

        if yr2002 ne . then yr2001 = yr2002;

        if t_sal2002 ne . then t_sal2001 = t_sal2002;

        n_entry2001 = "y";

    end;

run;

/*re-organize the data to long*/

data want;

    set temp_1;

    array yr(2001:2003) yr2001-yr2003;

    array t_sal(2001:2003) t_sal2001-t_sal2003;

    array n_entry(2001:2003) n_entry2001-n_entry2003;

    do i = 2001 to 2003;

        year = yr(i);

        sal = t_sal(i);

        new_entry = n_entry(i);

        output;

    end;

    drop yr: t_sal: n_entry:;

run;

There are probably more compact, efficient way to accomplish this...but this should get you started.

Good luck!

Anca.

UrvishShah
Fluorite | Level 6

Hi ,

try the following code...

proc sort data = test;

      by no year;

run;

/* Identify the missing year */

proc sql;

      create table test1 as

      select no,year,count(sal) as sal

      from test

      group by 1

      having sal < 3;

quit;

/* Insert the row for missing years */

proc sql;

      insert into test

      set no = 10,

            year = 2003,

            sal = 102

      set no = 23,

            year = 2001,

           sal = 100;

quit;

/* And finally add the New Entry column for missing years */

data test;

       set test;

       if no = 10 and year = 2003 then new_entry = 'Y;

       if no = 23 and year = 2001 then new_entry = 'Y;

run;

proc sort data = test;

      by no year;

run;

you can also take the missing year values in macro variables from the test1 dataset and automate the process...

Thanks

DBailey
Lapis Lazuli | Level 10

If you don't mind creating a tmp table to hold all of the possible records:

data have;

input no year sal;

cards;

10 2001 100

10 2002 102

11 2001 90

11 2002 12

11 2003 13

23 2002  100

23 2003 112

run;

data years;

input year;

cards;

2001

2002

2003

;

run;

proc sql;

create table work.tmp as

select

    tmp.no

    ,tmp.year

    ,have.sal

from

    (select distinct

        t1.no

        ,t2.year

    from have t1, years t2) tmp

    left outer join have

        on tmp.no=have.no and tmp.year=have.year

order by tmp.no, tmp.year;

quit;

data want(keep=no year sal);

set tmp nobs=_Nobs;

by no year;

sal_prev = lag(sal);

sal_next=.;

RecPointer=_N_+1;

if sal=. then do;

    if first.no then do;

        /* get next sal*/

        if RecPointer < _Nobs then do;

            set tmp (keep=sal rename=(sal=sal_next)) point=RecPointer;

            sal=sal_next;

            end;

        end;

    else sal = sal_prev;

end;

run;

data_null__
Jade | Level 19

First you need to create the missing years.  There are many ways to do this and SAS has some features to help make it easy.  I use PROC SUMMARY COMPLETETYPES to add the missing years.  Then the problems is a simple LOCF and Next obs carried back (NOCB?)

data sal;
   input no year sal;
   cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002 100
23 2003 112
   run;
proc summary data=sal nway completetypes;
  
class no year;
   output out=sal2(drop=_type_) idgroup(out(sal)=);
   run;
data sal3;
   update sal2(obs=0) sal2 end=eof;
   by no;
   if not eof then set sal2(firstobs=2 keep=sal rename=(sal=nextsal));
   if first.no and not last.no and missing(sal) then sal=nextsal;
  
output;
  
drop nextsal;
   run;
You could drop _FREQ_ if you like.  When it is 0 it is an indication the observation has been LOCF or NOCB.
Keith
Obsidian | Level 7

Nice answer, I was working on something very similar (using PROC SUMMARY with COMPLETETYPES).  The bit I was struggling with to code relatively simply was missing imputation and coping with both lag and lead values (the lag part is easy). 

The only problem I can see with your solution is if there is only a value for 2003 in the example data (i.e. 2001 and 2002 are missing).  Using firstobs=2 will still return a missing value here.  Ideally you would want to read sequential records until a value is encountered, then return that.  The nature of the data means that it is not possible to go to the next BY group before a non-missing value occurs, so that isn't a worry.

data_null__
Jade | Level 19

Good point about my look ahead that may not be far enough.  If I wasn't too worried about performance I would just sort descending and use LOCF to get NOCB.

data sal;
   input no year sal;
   cards;
10 2001 100
10 2002 102
11 2001 90
11 2002 12
11 2003 13
23 2002 100
23 2003 112
   run;
proc summary data=sal nway completetypes;
  
class no year;
   output out=sal2(drop=_type_) idgroup(out(sal)=);
   run;
data sal3; *LOCF;
  
update sal2(obs=0) sal2 end=eof;
   by no;
   output;
  
run;
proc sort data=sal3;
   key no;
  
key year / descending;
  
run;
data sal4; *NOCB;
  
update sal3(obs=0) sal3;
   by no;
   output;
  
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1676 views
  • 7 likes
  • 6 in conversation