i am trying to subtract rows based on variables
id        Month         TYPE       ENC          count      
11      OCT2019     DO           Pan            249
11      OCT2019     DO           Unpan        12
11      OCT2019     TA             Pan            63
11      OCT2019     TA             Unpan         5
11 OCT2019 VA Pan 3
11      NOV2019     DO           Pan            202
11      NOV2019     DO           Unpan        23
11      NOV2019     TA             Pan            55
11      NOV2019     TA             Unpan        12    
what i want
id        Month         TYPE            diff 
11      OCT2019     DO               -237
11      OCT2019     TA                -58
11 OCT2019 VA -3
11      NOV2019     DO               -178
11      NOV2019     TA                 -43
If there are always exactly two values of ENC for each Type and they occur so that you want the last difference then this may work:
data want; set have; by id month type notsorted; diff = dif(count); if last.type; drop enc count; run;
DIF is a special function that does the difference between the current value and the previous value. You can look at a lot of previous by using DIF2 DIF3 .. DIF99 and such where the number indicates how far back you want to look in the data set but it is ONLY those two specific values. The default is DIF1 if no number is supplied.
If there are always exactly two values of ENC for each Type and they occur so that you want the last difference then this may work:
data want; set have; by id month type notsorted; diff = dif(count); if last.type; drop enc count; run;
DIF is a special function that does the difference between the current value and the previous value. You can look at a lot of previous by using DIF2 DIF3 .. DIF99 and such where the number indicates how far back you want to look in the data set but it is ONLY those two specific values. The default is DIF1 if no number is supplied.
thank you that works however i just noticed that i have another ENC 'VA' which sometimes has a pan and some times has UNpan and i am noticing that its not giving the correct difference for that ENC
so that the dat looks like this
The data i have
id        Month         TYPE       ENC          count      
11      OCT2019     DO           Pan            249
11      OCT2019     DO           Unpan        12
11      OCT2019     TA             Pan            63
11      OCT2019     TA             Unpan         5
11 OCT2019 VA Pan 3
11      NOV2019     DO           Pan            202
11      NOV2019     DO           Unpan        23
11      NOV2019     TA             Pan            55
11      NOV2019     TA             Unpan        12    
what i want
id        Month         TYPE            diff 
11      OCT2019     DO               -237
11      OCT2019     TA                -58
11 OCT2019 VA -3
11      NOV2019     DO               -178
11      NOV2019     TA                 -43
@hk2013 wrote:
thank you that works however i just noticed that i have another ENC 'VA' which sometimes has a pan and some times has UNpan and i am noticing that its not giving the correct difference for that ENC
so that the dat looks like this
The data i have
id Month TYPE ENC count
11 OCT2019 DO Pan 249
11 OCT2019 DO Unpan 12
11 OCT2019 TA Pan 63
11 OCT2019 TA Unpan 511 OCT2019 VA Pan 3
11 NOV2019 DO Pan 202
11 NOV2019 DO Unpan 23
11 NOV2019 TA Pan 55
11 NOV2019 TA Unpan 12
what i want
id Month TYPE diff
11 OCT2019 DO -237
11 OCT2019 TA -5811 OCT2019 VA -3
11 NOV2019 DO -178
11 NOV2019 TA -43
Please note that my response says : If there are always exactly two values of ENC for each Type and they occur so that you want the last difference then this may work.
So you have different data then you provided as an example. What should the value be when there is a type with only one ENC value. Please provide an explicit rule as a single example doesn't really show all the possibilities.
This may also be a good time to look if you have more combinations of Type and Enc. What if you have Type with a single Enc of Unpan? Would that have a different rule than a single Enc of Pan? Are there any other values of Enc then Pan and Unpan? Would they have different rules?
Sorry i know your reply had said that it will only work for two values. i just noticed the 3rd one just now. The issue is with ENC 'VA' it is sometime with type pan and sometimes with unpan but i have gone through all my data and there are only three ENC "TA" 'DO' and "VA' and type 'pan' and 'unpan'
id        Month         TYPE       ENC          count      
11      OCT2019     DO           Pan            249
11      OCT2019     DO           Unpan        12
11      OCT2019     TA             Pan            63
11      OCT2019     TA             Unpan         5
11 OCT2019 VA Pan 3
11      NOV2019     DO           Pan            202
11      NOV2019     DO           Unpan        23
11      NOV2019     TA             Pan            55
11      NOV2019     TA             Unpan        12  
11 NOV2019 VA Unpan 9
what i want
id        Month         TYPE            diff 
11      OCT2019     DO               -237
11      OCT2019     TA                -58
11 OCT2019 VA -3
11      NOV2019     DO               -178
11      NOV2019     TA                 -43
11 NOV2019 VA 9
@hk2013 wrote:
Sorry i know your reply had said that it will only work for two values. i just noticed the 3rd one just now. The issue is with ENC 'VA' it is sometime with type pan and sometimes with unpan but i have gone through all my data and there are only three ENC "TA" 'DO' and "VA' and type 'pan' and 'unpan'
id Month TYPE ENC count
11 OCT2019 DO Pan 249
11 OCT2019 DO Unpan 12
11 OCT2019 TA Pan 63
11 OCT2019 TA Unpan 511 OCT2019 VA Pan 3
11 NOV2019 DO Pan 202
11 NOV2019 DO Unpan 23
11 NOV2019 TA Pan 55
11 NOV2019 TA Unpan 1211 NOV2019 VA Unpan 9
what i want
id Month TYPE diff
11 OCT2019 DO -237
11 OCT2019 TA -5811 OCT2019 VA -3
11 NOV2019 DO -178
11 NOV2019 TA -4311 NOV2019 VA 9
I suspect I guess the rule for the single TYPE cases but I really would like you to state it explicitly.
When you use a BY statement in the data step as in this case there are a number of automatic variables created for each one. You reference these with the First.variablename or Last.varaiblename syntax. These automatic variables have a value of 1 when true First.variablename =1 means this is first record of that group, Last.variablename=1 means it is the last record of a group, otherwise the result is 0. You can test if a group only has one record by:
If First.variablename and Last.variablename
So an outline would be something like:
data want;
   set have;
   by id month type notsorted;
   diff = dif(count);
   if first.type and last.type then do;
     <something apparently depending on the value of ENC>
   end;
   else if last.type then output;
   drop enc count;
run;
where you would replace the "<something apparently depending on the value of ENC>" with the rules of how to assign the Diff value when only one Type exists. And would likely include an output as the last bit inside the Do/End.
The original solultion looks like this:
data want;
   set have;
   by id month type notsorted;
   diff = dif(count);
   if last.type;
   drop enc count;
run;It seems a small change ... adding one line ... will satisfy the more complex requirements:
data want;
   set have;
   by id month type notsorted;
   diff = dif(count);
   if first.type then diff = -count;
   if last.type;
   drop enc count;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
