Solved: Creating Rows w/ null values based on multi-year relationships

xezus · Posted 01-06-2017 11:11 AM

Hello,

I have a dataset of about 50,000 datapoints that can be summarized as

Year ID ITEM Value

2015 223 cars 5

2015 223 hats 10

2015 224 cars 12

2015 224 bikes 2

2014 223 cars 4

2014 223 bikes 1

2014 224 cars 5

2014 224 sneakers 3

I would like to do comparisons of 2015 values to 2014 values for each ID and Item. So for ID #223 compare the number of cars in 2015 and 2014, easy enough because they both exist. But to compare the number of hats in 2015 to hats in 2014 (when there were none) is giving me problems.

I need to create a 2014 row for ID#223 for hats and place a 0 value with it. Essentially I need a set of code that will go through all of the IDs for 2014 and 2015 and create rows with 0 values for all the items they do not share between years.

It is not feasible (in terms of time it would take) to create an entire dataset with null values + all the item names and then merge them without duplicates. The time it would take to create that null value dataset would pretty much be as long as it would take me to do this whole thing by hand one at a time.

Thank you for your help in advance.

data_null__ · Posted 01-06-2017 11:31 AM

This keeps the item expansion within each ID.

data have;
   input Year ID $ ITEM $ Value;
   cards;
2015    223     cars      5
2015    223     hats      10
2015    224     cars      12
2015    224     bikes      2
2014    223     cars       4
2014    223     bikes      1
2014    224     cars       5
2014    224     sneakers    3
;;;;
   run;
proc sort data=have;
   by id;
   run;
proc print;
   run;
proc summary data=have nway completetypes;
   by id;
   class year item;
   freq value;
   output out=expanded;
   run;
proc print;
   run;

View solution in original post

data_null__ · Posted 01-06-2017 11:31 AM

This keeps the item expansion within each ID.

data have;
   input Year ID $ ITEM $ Value;
   cards;
2015    223     cars      5
2015    223     hats      10
2015    224     cars      12
2015    224     bikes      2
2014    223     cars       4
2014    223     bikes      1
2014    224     cars       5
2014    224     sneakers    3
;;;;
   run;
proc sort data=have;
   by id;
   run;
proc print;
   run;
proc summary data=have nway completetypes;
   by id;
   class year item;
   freq value;
   output out=expanded;
   run;
proc print;
   run;

xezus · Posted 01-06-2017 01:15 PM

Thank you so much for your help.

After running into some problems with zero values and negatives in the original dataset (which i ended up removing), it has all come together perfectly!

Thanks again!

Astounding · Posted 01-06-2017 11:34 AM

50,000 observations is small enough that you can process it a couple of times. For example:

proc sort data=have out=just_2_years;

by id item year;

where year in (2014, 2015);

run;

data want;

set have;

by id item;

output;

if first.item and year=2015 then do;

value=0;

year=2014;

output;

end;

else if last.item and year=2014 then do;

value=0

year=2015;

output;

end;

run;

The observations might need to be sorted again ... it depends on how you will conduct the analysis from that point.

Reeza · Posted 01-06-2017 11:37 AM

If you only have two years why bother.

You don't show how you want the output.

data have;
input Year         ID    $       ITEM   $       Value;
cards;
2015        223          cars            5
2015        223          hats            10
2015        224          cars            12
2015        224          bikes            2
2014        223          cars             4
2014        223          bikes            1
2014        224          cars              5
2014        224         sneakers       3
;
run;

proc sort data=have; by id item year; run;

data want;
set have;
by id item;
diff=dif(value);
if first.item or (first.item and last.item) then diff=.;
run;

Or flip the data to a wide format, usually not recommended, but in this case it makes sense to me.

proc transpose data=have out=flipped prefix=Y;
by id item;
var value;
id year;
run;

data want2;
set flipped;
diff= Y2015-Y2014;
run;

Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

Re: Creating Rows w/ null values based on multi-year relationships

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away