Updated (7-27-20)
Hello all,
I could use your help figuring out some code. I have different 'buckets' that take on different values depending on the time, bucketcode, and state, and I am trying to compute the change in values (count) across time and state. Please see an illustration of my data below.
A couple new notes:
(1) I need it to accommodate multiple bucketcodes per state per time period.
(2) I need to account for the appearance / disappearance of bucketcodes within a state. For example, a new bucketcode might enter OH at time period 103 that wasnt previously there in the prior time period (102). If a new bucketcode appears in a time period (eg., 102) (that isn't the first) then that should be reflected in the changescore differential for that bucketcode/state/timeperiod triplet with whatever the addition in count was, eg., 1. If a bucket that was there in a time period disappears in a subsequent period (was there in 102, but not in a state-time-bucketcode triplet in the next period, eg., 103) then the state-time-bucketcode triplet should reflect whatever the drop in value is/was for that subsequent period. For example, VA-104-333 has a count of 2, but if VA-105-333 is not there then VA-105-333 would be -2.
(3) Dropping the first time period is fine, but after that if a bucketcode enters a state and time period, that wasn't previously there, it should be treated as entering the dataset.
data have; input time Bucketcode State $ Count; datalines; 101 222 OH 3 102 222 OH 4 103 222 OH 4 103 333 OH 1 104 333 OH 1 101 111 TX 2 102 111 TX 1 103 111 TX 1 104 111 TX 2 101 222 VA 1 102 222 VA 2 103 222 VA 2 104 222 VA 4 104 444 VA 2 ; data want; input Time Bucketcode State $ Countdiff; datalines; 102 222 OH 1 103 222 OH 0 103 333 OH 1 104 222 OH -4 104 333 OH 0 102 111 TX -1 103 111 TX 0 104 111 TX 1 102 222 VA 1 103 222 VA 0 104 222 VA 2 104 444 VA 2 ;
Thanks!
This appears to do what is requested except for output order.
data have; input Time Bucketcode State $ Count; datalines; 101 111 TX 2 101 222 VA 1 101 222 OH 3 102 111 TX 1 102 222 VA 2 102 222 OH 4 103 111 TX 1 103 222 VA 2 103 222 OH 4 104 111 TX 2 104 222 VA 4 104 222 OH 1 ; proc sort data=have; by state bucketcode time; run; data want; set have; by state bucketcode ; Countdif =dif(count); if not first.bucketcode; run; proc sort data=want; by time bucketcode state; run;
I have used a different variable to hold the count difference for demonstration.
However, since you don't show any example data where a state has multiple bucketcodes no promises about more complex data.
The DIF function used is the difference between the current row's value of a variable and the previous row's value. Normally I would set the value for the first record of a group such as state
to missing but since you apparently are dropping the first record that isn't needed here.
If the final order is important you need to describe that as a requirement.
So do you want the first state buckcode dropped or not I followed your original example which did so because that appeared to be the requirement. But if that is not a requirement and likely shouldn't then say so.
Please in some detail describe exactly what you wantto have happen with a bucketcode that disappears per "Conversely, a bucketcode that was there in OH during time period 103 might exit that state and not be there in time period 104. In both cases, the appearance / disappearances need to be marked in the change variable (e.g., +1, -4, etc.; see example)."
And I am not going to spend a lot time trying to make that verbiage match your data because the order you are displaying things gives me a headache. If you want to have things by "groups", then display them by groups. Then we don't have to try to hop around a bad order to tell when something may need to be inserted that wasn't there previously (which may be what I think you are requesting).
And Please provide the data in the form of a data step. I have shown you how to do that. And post the code in a code box opened with the </> icon.
data have; input time Bucketcode State $ Count; datalines; 101 222 OH 3 102 222 OH 4 103 222 OH 4 103 333 OH 1 104 333 OH 1 101 111 TX 2 102 111 TX 1 103 111 TX 1 104 111 TX 2 101 222 VA 1 102 222 VA 2 103 222 VA 2 104 222 VA 4 104 444 VA 2 ; data want; input Time Bucketcode State $ Countdiff; datalines; 102 222 OH 1 103 222 OH 0 103 333 OH 1 104 222 OH -4 104 333 OH 0 102 111 TX -1 103 111 TX 0 104 111 TX 1 102 222 VA 1 103 222 VA 0 104 222 VA 2 104 444 VA 2 ;
Hello Ballard, yes, it is fine that the first state bucketcode is dropped as long as it appears in that first time period (i.e., 101). Appearances/disappearances of bucketcodes in subsequent time periods should be treated differently. If a new bucketcode appears in a time period (eg., 102) (that isn't the first) then that should be reflected in the changescore differential for that bucketcode/state/timeperiod triplet with whatever the addition in count was, eg., 1. If a bucket that was there in a time period disappears in a subsequent period (was there in 102, but not in a state-time-bucketcode triplet in the next period, eg., 103) then the state-time-bucketcode triplet should reflect whatever the drop in value is/was for that subsequent period. For example, VA-104-333 has a count of 2, but VA-105-333 is not there then VA-105-333 would be -2. Thank you again for your help.
First, neither of those data step will run as posted. Datalines ONLY has the values. If you copy either of those to your editor and run the code you should get some errors. Pleases fix them. Datalines blocks end with a ;
And you should not have blank rows inside a datalines block either.
And did you understand at all what I said about grouping order? I can't follow your logic with the out of state order results of your example.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.