I have the following dataset:
DATA have;
INPUT person_id item_id year amount person_name $ item_name $ document_id ;
CARDS;
1001 2839 1999 3500 john steel 1305203434434
1001 3242 1999 400 john wood 1305203955327
1001 98223 1999 4 john fire 1305203820220
1001 3242 2000 5 john wood 1305204040897
1001 54233 2000 500 john bronze 1305204063236
1001 98223 2000 540 john fire 1305203399872
1001 7293 2001 230 john silver 1305203821970
1001 2839 2003 30 john steel 1305203457561
1001 3242 2003 23 john wood 1305207077153
1001 7293 2003 24 john silver 1305203683368
1001 54233 2003 55 john bronze 1305203236260
1001 98223 2003 90 john fire 1305203866126
1002 923 1987 44 sam platinum 1305204074470
1002 3232 1987 45 sam zinc 1305203358611
1002 923 2002 76 sam platinum 1305203358698
1002 3232 2002 245 sam zinc 1305203276427
1003 2839 2010 34 tom steel 1305203276403
1003 3232 2014 22 tom zinc 1305203463195
1003 923 2015 96 tom platinum 1305203714656
1003 923 2015 450 tom platinum 1301150262107
;
RUN;
I want to achieve this dataset:
DATA want;
INPUT person_id item_id year amount person_name $ item_name $ document_id ;
CARDS;
1001 2839 1999 3500 john steel 1305203434434
1001 3242 1999 400 john wood 1305203955327
1001 7293 1999 0 john silver .
1001 54233 1999 0 john bronze .
1001 98223 1999 4 john fire 1305203820220
1001 2839 2000 0 john steel .
1001 3242 2000 5 john wood 1305204040897
1001 7293 2000 0 john silver .
1001 54233 2000 500 john bronze 1305204063236
1001 98223 2000 540 john fire 1305203399872
1001 2839 2001 0 john steel .
1001 3242 2001 0 john wood .
1001 7293 2001 230 john silver 1305203821970
1001 54233 2001 0 john bronze .
1001 98223 2001 0 john fire .
1001 2839 2003 30 john steel 1305203457561
1001 3242 2003 23 john wood 1305207077153
1001 7293 2003 24 john silver 1305203683368
1001 54233 2003 55 john bronze 1305203236260
1001 98223 2003 90 john fire 1305203866126
1002 923 1987 44 sam platinum 1305204074470
1002 3232 1987 45 sam zinc 1305203358611
1002 923 2002 76 sam platinum 1305203358698
1002 3232 2002 245 sam zinc 1305203276427
1003 923 2010 0 tom platinum .
1003 2839 2010 34 tom steel 1305203276403
1003 3232 2010 0 tom zinc .
1003 923 2014 0 tom platinum .
1003 2839 2014 0 tom steel .
1003 3232 2014 22 tom zinc 1305203463195
1003 923 2015 96 tom platinum 1305203714656
1003 923 2015 450 tom platinum 1301150262107
1003 2839 2015 0 tom steel .
1003 3232 2015 0 tom zinc .
;
RUN;
The newly inserted observations are given in red as in the following picture:
The rule is quite simple: Fix a person_id. In a given year, if an item_id appears in some other year but does not appear in the given year, then fill in an observation for that given year with amount set to 0 while copying all other variables (person_name and item_name) but leave document_id empty.
For instance, fix person_id = 1001. If we look at year 1999, there are three item_id: 2839, 3242, and 98223. However, we can see that item_id 7293 and 54233 appears in other years but do not appear in year 1999. So we make two observations for these two item_id's in year 1999 but set amount = 0. Likewise, in year 2001, there is only one item_id = 7293, but we can see that four other item_ids (2839, 3242, 54233, 98223) appear in other years, so we make four observations for these four item_ids in year 2001 but with amount = 0. Then apply this rule across all person_id's to obtain the desired output.
I initially thought to create a list of all unique item_id's for a given person_id across all years, but then I am unsure of how to use that list to fill in observations with amount = 0. Any help would be appreciated!
Try this:
proc sql;
create table template as
select distinct
a.person_id,
a.item_id,
a.item_name,
a.person_name,
b.year
from have a, have b
where a.person_id = b.person_id
;
create table want as
select
a.person_id,
a.item_id,
a.year,
case
when b.amount = . then 0
else b.amount
end as amount,
a.person_name,
a.item_name,
b.document_id
from template a left join have b
on a.person_id = b.person_id and a.item_id = b.item_id and a.year = b.year
;
quit;
Try this:
proc sql;
create table template as
select distinct
a.person_id,
a.item_id,
a.item_name,
a.person_name,
b.year
from have a, have b
where a.person_id = b.person_id
;
create table want as
select
a.person_id,
a.item_id,
a.year,
case
when b.amount = . then 0
else b.amount
end as amount,
a.person_name,
a.item_name,
b.document_id
from template a left join have b
on a.person_id = b.person_id and a.item_id = b.item_id and a.year = b.year
;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.