BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xezus
Calcite | Level 5

Hello,

 

I have a dataset of about 50,000 datapoints that can be summarized as 

 

Year         ID           ITEM         Value

2015        223          cars            5

2015        223          hats            10

2015        224          cars            12

2015        224          bikes            2

2014        223          cars             4

2014        223          bikes            1

2014        224          cars              5

2014        224         sneakers       3

 

 

I would like to do comparisons of 2015 values to 2014 values for each ID and Item. So for ID #223 compare the number of cars in 2015 and 2014, easy enough because they both exist. But to compare the number of hats in 2015 to hats in 2014 (when there were none) is giving me problems.

 

I need to create a 2014 row for ID#223 for hats and place a 0 value with it. Essentially I need a set of code that will go through all of the IDs for 2014 and 2015 and create rows with 0 values for all the items they do not share between years.

 

It is not feasible (in terms of time it would take) to create an entire dataset with null values + all the item names and then merge them without duplicates. The time it would take to create that null value dataset would pretty much be as long as it would take me to do this whole thing by hand one at a time.

 

Thank you for your help in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

This keeps the item expansion within each ID.

 

data have;
   input Year ID $ ITEM $ Value;
   cards;
2015    223     cars      5
2015    223     hats      10
2015    224     cars      12
2015    224     bikes      2
2014    223     cars       4
2014    223     bikes      1
2014    224     cars       5
2014    224     sneakers    3
;;;;
   run;
proc sort data=have;
   by id;
   run;
proc print;
   run;
proc summary data=have nway completetypes;
   by id;
   class year item;
   freq value;
   output out=expanded;
   run;
proc print;
   run;

Capture.PNG 

View solution in original post

4 REPLIES 4
data_null__
Jade | Level 19

This keeps the item expansion within each ID.

 

data have;
   input Year ID $ ITEM $ Value;
   cards;
2015    223     cars      5
2015    223     hats      10
2015    224     cars      12
2015    224     bikes      2
2014    223     cars       4
2014    223     bikes      1
2014    224     cars       5
2014    224     sneakers    3
;;;;
   run;
proc sort data=have;
   by id;
   run;
proc print;
   run;
proc summary data=have nway completetypes;
   by id;
   class year item;
   freq value;
   output out=expanded;
   run;
proc print;
   run;

Capture.PNG 

xezus
Calcite | Level 5

Thank you so much for your help.

 

After running into some problems with zero values and negatives in the original dataset (which i ended up removing), it has all come together perfectly!

 

Thanks again!

 

 

Astounding
PROC Star

50,000 observations is small enough that you can process it a couple of times.  For example:

 

proc sort data=have out=just_2_years;

   by id item year;

  where year in (2014, 2015);

run;

 

data want;

set have;

by id item;

output;

if first.item and year=2015 then do;

   value=0;

   year=2014;

   output;

end;

else if last.item and year=2014 then do;

   value=0

   year=2015;

   output;

end;

run;

 

The observations might need to be sorted again ... it depends on how you will conduct the analysis from that point.

Reeza
Super User

If you only have two years why bother. 

 

You don't show how you want the output.

data have;
input Year         ID    $       ITEM   $       Value;
cards;
2015        223          cars            5
2015        223          hats            10
2015        224          cars            12
2015        224          bikes            2
2014        223          cars             4
2014        223          bikes            1
2014        224          cars              5
2014        224         sneakers       3
;
run;

proc sort data=have; by id item year; run;

data want;
set have;
by id item;
diff=dif(value);
if first.item or (first.item and last.item) then diff=.;
run;

Or flip the data to a wide format, usually not recommended, but in this case it makes sense to me.

 

proc transpose data=have out=flipped prefix=Y;
by id item;
var value;
id year;
run;

data want2;
set flipped;
diff= Y2015-Y2014;
run;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 901 views
  • 1 like
  • 4 in conversation