Solved: Re: Filling in observations to make a balanced panel

TrueTears · Posted 10-21-2020 07:30 AM

I have the following dataset:


DATA have; 
  INPUT person_id	item_id	year	amount	person_name $	item_name $	document_id ; 
CARDS; 
1001	2839	1999	3500	john	steel	1305203434434
1001	3242	1999	400	john	wood	1305203955327
1001	98223	1999	4	john	fire	1305203820220
1001	3242	2000	5	john	wood	1305204040897
1001	54233	2000	500	john	bronze	1305204063236
1001	98223	2000	540	john	fire	1305203399872
1001	7293	2001	230	john	silver	1305203821970
1001	2839	2003	30	john	steel	1305203457561
1001	3242	2003	23	john	wood	1305207077153
1001	7293	2003	24	john	silver	1305203683368
1001	54233	2003	55	john	bronze	1305203236260
1001	98223	2003	90	john	fire	1305203866126
1002	923	1987	44	sam	platinum	1305204074470
1002	3232	1987	45	sam	zinc	1305203358611
1002	923	2002	76	sam	platinum	1305203358698
1002	3232	2002	245	sam	zinc	1305203276427
1003	2839	2010	34	tom	steel	1305203276403
1003	3232	2014	22	tom	zinc	1305203463195
1003	923	2015	96	tom	platinum	1305203714656
1003	923	2015	450	tom	platinum	1301150262107

; 
RUN;

I want to achieve this dataset:


DATA want; 
  INPUT person_id	item_id	year	amount	person_name $	item_name $	document_id ; 
CARDS; 
1001	2839	1999	3500	john	steel	1305203434434
1001	3242	1999	400	john	wood	1305203955327
1001	7293	1999	0	john	silver	.
1001	54233	1999	0	john	bronze	.
1001	98223	1999	4	john	fire	1305203820220
1001	2839	2000	0	john	steel	.
1001	3242	2000	5	john	wood	1305204040897
1001	7293	2000	0	john	silver	.
1001	54233	2000	500	john	bronze	1305204063236
1001	98223	2000	540	john	fire	1305203399872
1001	2839	2001	0	john	steel	.
1001	3242	2001	0	john	wood	.
1001	7293	2001	230	john	silver	1305203821970
1001	54233	2001	0	john	bronze	.
1001	98223	2001	0	john	fire	.
1001	2839	2003	30	john	steel	1305203457561
1001	3242	2003	23	john	wood	1305207077153
1001	7293	2003	24	john	silver	1305203683368
1001	54233	2003	55	john	bronze	1305203236260
1001	98223	2003	90	john	fire	1305203866126
1002	923	1987	44	sam	platinum	1305204074470
1002	3232	1987	45	sam	zinc	1305203358611
1002	923	2002	76	sam	platinum	1305203358698
1002	3232	2002	245	sam	zinc	1305203276427
1003	923	2010	0	tom	platinum	.
1003	2839	2010	34	tom	steel	1305203276403
1003	3232	2010	0	tom	zinc	.
1003	923	2014	0	tom	platinum	.
1003	2839	2014	0	tom	steel	.
1003	3232	2014	22	tom	zinc	1305203463195
1003	923	2015	96	tom	platinum	1305203714656
1003	923	2015	450	tom	platinum	1301150262107
1003	2839	2015	0	tom	steel	.
1003	3232	2015	0	tom	zinc	.
; 
RUN;

The newly inserted observations are given in red as in the following picture:

The rule is quite simple: Fix a person_id. In a given year, if an item_id appears in some other year but does not appear in the given year, then fill in an observation for that given year with amount set to 0 while copying all other variables (person_name and item_name) but leave document_id empty.

For instance, fix person_id = 1001. If we look at year 1999, there are three item_id: 2839, 3242, and 98223. However, we can see that item_id 7293 and 54233 appears in other years but do not appear in year 1999. So we make two observations for these two item_id's in year 1999 but set amount = 0. Likewise, in year 2001, there is only one item_id = 7293, but we can see that four other item_ids (2839, 3242, 54233, 98223) appear in other years, so we make four observations for these four item_ids in year 2001 but with amount = 0. Then apply this rule across all person_id's to obtain the desired output.

I initially thought to create a list of all unique item_id's for a given person_id across all years, but then I am unsure of how to use that list to fill in observations with amount = 0. Any help would be appreciated!

Kurt_Bremser · Posted 10-21-2020 08:36 AM

Try this:

proc sql;
create table template as
  select distinct
    a.person_id,
    a.item_id,
    a.item_name,
    a.person_name,
    b.year
  from have a, have b
  where a.person_id = b.person_id
;
create table want as
  select
    a.person_id,
    a.item_id,
    a.year,
    case
      when b.amount = . then 0
      else b.amount
    end as amount,
    a.person_name,
    a.item_name,
    b.document_id
  from template a left join have b
  on a.person_id = b.person_id and a.item_id = b.item_id and a.year = b.year
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 10-21-2020 08:36 AM

Try this:

proc sql;
create table template as
  select distinct
    a.person_id,
    a.item_id,
    a.item_name,
    a.person_name,
    b.year
  from have a, have b
  where a.person_id = b.person_id
;
create table want as
  select
    a.person_id,
    a.item_id,
    a.year,
    case
      when b.amount = . then 0
      else b.amount
    end as amount,
    a.person_name,
    a.item_name,
    b.document_id
  from template a left join have b
  on a.person_id = b.person_id and a.item_id = b.item_id and a.year = b.year
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Filling in observations to make a balanced panel

Re: Filling in observations to make a balanced panel

Re: Filling in observations to make a balanced panel

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away