Meteorite | Level 14

Calculate difference between columns

Hello

Let's say that I have a data set and I want to do the following:

From column 4  I want to calculate the difference between the following columns:

Col 4 and Col 6

Col 5 and Col 7

Col 6 and Col 8

Col 7 and Col 9

and so on...

What is the way to do it please?

``````
DATA have;
INPUT   X
Z
W
v20230101
P20230101
v20230102
P20230102
v20230103
P20230103
;
cards;
1 22 999 70 0.2 80 0.15 90 0.15
1 22 999 10 0.4 14 0.1 12 0.2
1 33 999 8 0.7 9 0.8 14 0.3
1 33 999 4 0.1 11 0.3 8 0.05
1 44 999 0 0 2 0.05 4 0.1
1 44 999 0 0 0 0 0 0
1 55 999 15 0.5 20 0.8 25 0.4
1 55 999 12 0.7 14 0.9 22 0.15
1 66 999 14 1.5 15 1.4 18 0.3
1 66 999 3 0.8 4 0.8 19 0.1
;
run;
``````

Wanted data set

1 ACCEPTED SOLUTION

Accepted Solutions
Diamond | Level 26

Re: Calculate difference between columns

You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.

If your data had a column named DATE which contains the date values, and a column named P which contains the values of your  Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.

``````data fake;
input id date :yymmdd8. p v;
format date yymmddn8.;
datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;``````

Now, look how little programming is required to obtain the differences you want:

``````data want;
set fake;
by id;
p_diff=dif(p);
v_diff=dif(v);
if first.id then call missing(p_diff,v_diff);
run;``````

Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.

--
Paige Miller
3 REPLIES 3
Diamond | Level 26

Re: Calculate difference between columns

You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.

If your data had a column named DATE which contains the date values, and a column named P which contains the values of your  Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.

``````data fake;
input id date :yymmdd8. p v;
format date yymmddn8.;
datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;``````

Now, look how little programming is required to obtain the differences you want:

``````data want;
set fake;
by id;
p_diff=dif(p);
v_diff=dif(v);
if first.id then call missing(p_diff,v_diff);
run;``````

Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.

--
Paige Miller
Meteorite | Level 14

Re: Calculate difference between columns

Thanks a lot, i understand that the correct and better structure is Long structure and not wide structure.
I understand that in long structure it is easy to calculate difference between rows using diff function.
Here is some information about dif function that i found.
The DIF functions, DIF1, DIF2, ..., DIF100, return the first differences between the argument and its nth lag. DIF1 can also be written as DIF. DIF n is defined as DIF n( x)= x-LAG n( x).

My question:
What do you think is better way to dispaly the final report? Wide or long structure. I understand that the data set structure is better to be long but it might happen that the user who want to see the report will prefer wide way.
What is the way then to create the report that represents the information in wide way?
Do you think that better report is wide or long structure (in this example)?
Diamond | Level 26

Re: Calculate difference between columns

From this long data set, you can display the results any way you want. You can display this long data set as long, or you can display this long data set as wide (without having to modify the structure of the data set). There is no way for me to be more definitive as I don't understand your problem or the needs of the people who will be viewing this report.

To display these differences wide, you would use PROC REPORT, where DATE is an across variable. This gives you columns named 20230101 and 20230102 and so on. This is done internally by SAS for across variables, so you don't have to create and code these column names yourself. Then the program works for any set of date values in the data set, so if tomorrow another date value 20230105 is added to the data, the whole thing runs properly without changing the code.

--
Paige Miller
Discussion stats
• 3 replies
• 1524 views
• 1 like
• 2 in conversation