Hello,
I have a dataset of about 50,000 datapoints that can be summarized as
Year ID ITEM Value
2015 223 cars 5
2015 223 hats 10
2015 224 cars 12
2015 224 bikes 2
2014 223 cars 4
2014 223 bikes 1
2014 224 cars 5
2014 224 sneakers 3
I would like to do comparisons of 2015 values to 2014 values for each ID and Item. So for ID #223 compare the number of cars in 2015 and 2014, easy enough because they both exist. But to compare the number of hats in 2015 to hats in 2014 (when there were none) is giving me problems.
I need to create a 2014 row for ID#223 for hats and place a 0 value with it. Essentially I need a set of code that will go through all of the IDs for 2014 and 2015 and create rows with 0 values for all the items they do not share between years.
It is not feasible (in terms of time it would take) to create an entire dataset with null values + all the item names and then merge them without duplicates. The time it would take to create that null value dataset would pretty much be as long as it would take me to do this whole thing by hand one at a time.
Thank you for your help in advance.
This keeps the item expansion within each ID.
data have;
input Year ID $ ITEM $ Value;
cards;
2015 223 cars 5
2015 223 hats 10
2015 224 cars 12
2015 224 bikes 2
2014 223 cars 4
2014 223 bikes 1
2014 224 cars 5
2014 224 sneakers 3
;;;;
run;
proc sort data=have;
by id;
run;
proc print;
run;
proc summary data=have nway completetypes;
by id;
class year item;
freq value;
output out=expanded;
run;
proc print;
run;
This keeps the item expansion within each ID.
data have;
input Year ID $ ITEM $ Value;
cards;
2015 223 cars 5
2015 223 hats 10
2015 224 cars 12
2015 224 bikes 2
2014 223 cars 4
2014 223 bikes 1
2014 224 cars 5
2014 224 sneakers 3
;;;;
run;
proc sort data=have;
by id;
run;
proc print;
run;
proc summary data=have nway completetypes;
by id;
class year item;
freq value;
output out=expanded;
run;
proc print;
run;
Thank you so much for your help.
After running into some problems with zero values and negatives in the original dataset (which i ended up removing), it has all come together perfectly!
Thanks again!
50,000 observations is small enough that you can process it a couple of times. For example:
proc sort data=have out=just_2_years;
by id item year;
where year in (2014, 2015);
run;
data want;
set have;
by id item;
output;
if first.item and year=2015 then do;
value=0;
year=2014;
output;
end;
else if last.item and year=2014 then do;
value=0
year=2015;
output;
end;
run;
The observations might need to be sorted again ... it depends on how you will conduct the analysis from that point.
If you only have two years why bother.
You don't show how you want the output.
data have;
input Year ID $ ITEM $ Value;
cards;
2015 223 cars 5
2015 223 hats 10
2015 224 cars 12
2015 224 bikes 2
2014 223 cars 4
2014 223 bikes 1
2014 224 cars 5
2014 224 sneakers 3
;
run;
proc sort data=have; by id item year; run;
data want;
set have;
by id item;
diff=dif(value);
if first.item or (first.item and last.item) then diff=.;
run;
Or flip the data to a wide format, usually not recommended, but in this case it makes sense to me.
proc transpose data=have out=flipped prefix=Y;
by id item;
var value;
id year;
run;
data want2;
set flipped;
diff= Y2015-Y2014;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.