DATA Step, Macro, Functions and more

Replacing the value of duplicates in panel data

Reply
Occasional Contributor
Posts: 10

Replacing the value of duplicates in panel data

 

 

Now, I got following data structure.

 

 

 

   table1.jpg

 

 

 

What I want to do is to replace missing values of item2999 to item2999 of same id and same year but freq==A

 

so that it would be..

 

table2.png

 

 

 

Anyone know how to do this without lag operator? 

 

Many Thanks.

 

 

 

Below is the SAS code for making dataset.

 

data test;
input code year_ freq $ item2999;
cards;
1 2000 A 1934
1 2001 A 1111
1 2002 A 2425
1 2002 B .
1 2003 A 2134
1 2003 B .
1 2004 A 111
2 2000 A 2166
2 2001 A 124125
2 2002 A 34643634
2 2003 A 12321451
2 2003 B 241242
2 2004 B .
2 2004 A 484848
;
run;

PROC Star
Posts: 253

Re: Replacing the value of duplicates in panel data

something like this

proc sort data =test;
by code year_;
run;

data have;
set test;
by code year_;
retain item2999_new;
if item2999 ne . then item2999_new=item2999;
   else item2999= item2999_new;
drop item2999_new;
run;
Super User
Posts: 5,083

Re: Replacing the value of duplicates in panel data

Here's a small detail that makes a big difference in the effectiveness of various solutions.

 

Sometimes the A record appears before the B record, sometimes the other way around.  Is it OK to sort the data set by all three variables:  CODE YEAR_ FREQ ?

Occasional Contributor
Posts: 10

Re: Replacing the value of duplicates in panel data

Yes it's okey to do so.
Super User
Posts: 5,083

Re: Replacing the value of duplicates in panel data

OK, similar to what was already suggested:

 

proc sort data=have;

by code year_ freq;

run;

 

data want;

set have;

by code year_;

retain replacement_value;

if first.year_ then replacement_value = item2999;

else if item_2999=. then item_2999 = replacement_value;

drop replacement_value;

run;

 

There are safeguards that could complicate the program (not included here).  For example, what if a year contains two B records, but no A records.  Should the first B record supply a value to replace a missing second B record?  I skipped that sort of complication.

PROC Star
Posts: 172

Re: Replacing the value of duplicates in panel data

And if you want to avoid sorting:-

 

data test;

input code year_ freq $ item2999;

cards;

1 2000 A 1934

1 2001 A 1111

1 2002 A 2425

1 2002 B .

1 2003 A 2134

1 2003 B .

1 2004 A 111

2 2000 A 2166

2 2001 A 124125

2 2002 A 34643634

2 2003 A 12321451

2 2003 B 241242

2 2004 B .

2 2004 A 484848

;

run;

 

data want;

   if _N_ = 1 then do;

   if 0 then set test;

     declare hash h(dataset: "test(where=(not missing(item2999))keep=code year_ item2999");

       h.defineKey('code','year_');

      h.defineData('item2999');

      h.defineDone();

   end;

set test;

if missing(item2999) then h.find();

run;

 

Regards,

Naveen Srinivasan

 

PS if you are familiar with Hashes and know how to maintain and modify, you'll be fine otherwise just ignore. Cheers

Ask a Question
Discussion stats
  • 5 replies
  • 160 views
  • 0 likes
  • 4 in conversation