Hello,
I'm trying to find a price value associated with upgrading arcade machines.
My datasets contain store inventory of arcade machines and their associated price. I'm trying to find each years increase in cost associated with UPGRADING the machines.
sample dataset, I have a dataset YEARLYSTATEMENT_yyyy from 2015-2020, there are around about 400-600 rows each year.
YEARLYSTATEMENT_2015
STORE_NUM MACHINE_INV PRICE005 P006 301
001 P001 212
004 P016 351
012 P091 462
007 P092 504
005 P093 520
005 P090 480
012 P014 322
003 P015 412
etc.. ... ...
Each store keeps an inventory of each machine that they currently have and it reflects that here under machine_inv
Upgrading the machine is not simply buying a new one but buying a new model of the same kind of machine, I also have a small list of upgrades by the manufacturer that looks like this:
P014->P015->P016
P090->P091->P092->P093
P055->P056->P057
P080->P081->P082
Every year an arcade will buy new machines and upgrade old ones, i'm trying to find the cost associated with only upgrading to new machines year to year. For example in 2015 if 100,000 was our inventory worth on machines for all stores, and in 2016 130,000 was the new value, I want to see how much of that increase of 30,000 was solely because we upgraded the machines and not purchased a separate model.
Another factor that complicated this is that an upgrade can be from P014 -> P015 or it can also be P014 -> P016, any increase in the upgrade "chain" will count as an upgrade. When an arcade upgrades a machine, they will no longer carry any of the previous version.
An idea i had to tackle this was to single out each machine that has an upgrade path and then add the cost up and subtract by the number the previous year received. But by doing this I cant account for if the store had the previous model or not or if they just purchased one of these machines outright.
Is there a way in sas that I can easily code to look if a previous year carried a machine in previous part of the upgrade chain and calculate the cost associated with upgrading that year?
Sorry if this is kind of a complex problem, I'm just not sure how to tackle this at all.
I would really appreciate any help, insights, or ideas.
Please run this code and see if the resulting Price_dif is what you expect.
proc sort data=arcade; by store_num MACHINE_INV year; run; proc format; value $macseries 'P014','P015','P016' ='P014' 'P090','P091','P092','P093' ='P090' 'P055','P056','P057' ='P055' 'P080','P081','P082' ='P080' ; run; data temp; set arcade; macseries = put(machine_inv,$macseries.); run; data want; set temp; by store_num macseries; dp= dif(price); if first.macseries then price_dif=0; else price_dif=dp; drop dp; run;
There is an intermediate step that creates a variable indicating which series of upgrades, what I called “'basically the same thing' identifier”. That variable could be added with if/then/else statements but since these upgrade paths may not be ended it might be easier to simply add a value to the format and not have to change any other code.
If you haven't seen it before the DIF function returns the difference of the current value of variable from that on the previous record (actually you can use DIFx, x=1 to more than you should use generally to get differences from 1 to x values). Using BY statement on the group variable created allows us to test if this the first time that upgrade path or specific machine is encountered. We wouldn't want a difference from a different machine so is set to 0 (possibly missing will make more sense later, depends on use) for the price_dif variable and that actual difference from the last year otherwise.
Show what your SAS data sets look like as data step code.
If you do not have a YEAR for each record of store, machine and price then that will be part of the solution.
Then you will want the data sets combined.
Likely the "upgrade" information would be used to create some sort of 'basically the same thing' identifier for cross year purposes UNLESS those upgrade paths cross each other somewhere.
"Easily" would depend on your experience.
Thanks for the reply!
So this is a more accurate example of my data in a data step.
I've combined the datasets and added the year variable.
DATA arcade;
INPUT STORE_NUM $ MACHINE_INV $ PRICE YEAR;
CARDS;
005 P014 322 2015
001 P001 212 2015
004 P016 351 2015
012 P045 462 2015
007 P055 634 2015
005 P145 504 2015
005 P090 480 2015
012 P014 322 2015
003 P512 521 2015
005 P015 343 2016
001 P001 212 2016
004 P016 351 2016
005 P091 492 2016
007 P055 634 2016
005 P077 520 2016
005 P122 480 2016
012 P014 322 2016
003 P015 343 2016
005 P016 351 2017
001 P001 212 2017
004 P016 351 2017
012 P091 492 2017
007 P057 688 2017
005 P092 520 2017
005 P047 480 2017
012 P016 351 2017
003 P016 351 2017
RUN;
In this data I have some examples of what I'm looking at. In store 005 upgrading from P014->P015->P016 each year, and store 007 went from p055 in 2015 and 2016, to p057 in 2017. I'm just trying to find the price variable difference by upgrading these machines year over year. One assumption is that upgrading a machine will always increase the price variable, so there should always be an increase when looking at upgrades every year.
I’m not sure what you mean by “'basically the same thing' identifier”, and my SAS experience is not too high so I'm really not sure how to continue from here.
Please run this code and see if the resulting Price_dif is what you expect.
proc sort data=arcade; by store_num MACHINE_INV year; run; proc format; value $macseries 'P014','P015','P016' ='P014' 'P090','P091','P092','P093' ='P090' 'P055','P056','P057' ='P055' 'P080','P081','P082' ='P080' ; run; data temp; set arcade; macseries = put(machine_inv,$macseries.); run; data want; set temp; by store_num macseries; dp= dif(price); if first.macseries then price_dif=0; else price_dif=dp; drop dp; run;
There is an intermediate step that creates a variable indicating which series of upgrades, what I called “'basically the same thing' identifier”. That variable could be added with if/then/else statements but since these upgrade paths may not be ended it might be easier to simply add a value to the format and not have to change any other code.
If you haven't seen it before the DIF function returns the difference of the current value of variable from that on the previous record (actually you can use DIFx, x=1 to more than you should use generally to get differences from 1 to x values). Using BY statement on the group variable created allows us to test if this the first time that upgrade path or specific machine is encountered. We wouldn't want a difference from a different machine so is set to 0 (possibly missing will make more sense later, depends on use) for the price_dif variable and that actual difference from the last year otherwise.
Thank you so much for the help!
The solution you provided somewhat works, I encountered a few problems trying to implement it.
There is no distinction between stores using this code and it will just takes the previous rows price and uses that for the difference, for example the output looks like this:
STORE_NUM MACHINE_INV PRICE YEAR macseries price_dif
05 P012 322 2015 P014 0
05 P145 504 2015 P145 182
12 P095 390 2016 P095 -114
Here store 12 is calculating the difference between two stores instead of machines within the same store. It's easy to notice since the output is sorted by store and macseries, and each transition to the next store or macseries in the data there is a difference calculation going.
It does this throughout the output whenever the macseries value changes, pretty much just taking the difference from the previous row and ignoring things like store number and machine type.
Edit: looks like this issue was caused by some bad additions to the code, I've resolved this now and it works perfectly, sorry for the confusion.
Show the actual code you used. Starting with the Proc Sort step. The example data and the code provided do treat stores as part of the group.
The code you provided performs as it should, my error was poorly implementing some restrictions in the code. I do have a few questions if you don't mind answering.
1. Some machine upgrades go in descending order ( ie 200->199->198), for this I created a descending order on a different set during the proc sort phase, but when I try to run the code it tells me that the BY variables are not properly sorted on the dataset. I'm not sure what the problem is here, where the only change is adding descending before the variable machine_inv in proc sort.
2. The calculation for machine upgrades only calculates it for one machine, for example:
STORE_NUM MACHINE_INV PRICE YEAR macseries price_dif
005 P014 322 2015 P014 0
005 P014 322 2015 P014 0
005 P015 343 2016 P014 21
005 P015 343 2016 P014 0
005 P015 343 2016 P014 0
For my situation I want to populate all the rows of that year where a difference was calculated, so the last 3 rows in this example should all show 21. I'm thinking of just using an if then in a datastep to do this, but I'm not sure if there is a simpler solution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.