Hello
Let's say that I have a data set and I want to do the following:
From column 4 I want to calculate the difference between the following columns:
Col 4 and Col 6
Col 5 and Col 7
Col 6 and Col 8
Col 7 and Col 9
and so on...
What is the way to do it please?
DATA have;
INPUT X
Z
W
v20230101
P20230101
v20230102
P20230102
v20230103
P20230103
;
cards;
1 22 999 70 0.2 80 0.15 90 0.15
1 22 999 10 0.4 14 0.1 12 0.2
1 33 999 8 0.7 9 0.8 14 0.3
1 33 999 4 0.1 11 0.3 8 0.05
1 44 999 0 0 2 0.05 4 0.1
1 44 999 0 0 0 0 0 0
1 55 999 15 0.5 20 0.8 25 0.4
1 55 999 12 0.7 14 0.9 22 0.15
1 66 999 14 1.5 15 1.4 18 0.3
1 66 999 3 0.8 4 0.8 19 0.1
;
run;
Wanted data set
You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.
If your data had a column named DATE which contains the date values, and a column named P which contains the values of your Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.
So, let's start with data in a better arrangement.
data fake;
input id date :yymmdd8. p v;
format date yymmddn8.;
datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;
Now, look how little programming is required to obtain the differences you want:
data want;
set fake;
by id;
p_diff=dif(p);
v_diff=dif(v);
if first.id then call missing(p_diff,v_diff);
run;
Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.
You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.
If your data had a column named DATE which contains the date values, and a column named P which contains the values of your Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.
So, let's start with data in a better arrangement.
data fake;
input id date :yymmdd8. p v;
format date yymmddn8.;
datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;
Now, look how little programming is required to obtain the differences you want:
data want;
set fake;
by id;
p_diff=dif(p);
v_diff=dif(v);
if first.id then call missing(p_diff,v_diff);
run;
Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.
From this long data set, you can display the results any way you want. You can display this long data set as long, or you can display this long data set as wide (without having to modify the structure of the data set). There is no way for me to be more definitive as I don't understand your problem or the needs of the people who will be viewing this report.
To display these differences wide, you would use PROC REPORT, where DATE is an across variable. This gives you columns named 20230101 and 20230102 and so on. This is done internally by SAS for across variables, so you don't have to create and code these column names yourself. Then the program works for any set of date values in the data set, so if tomorrow another date value 20230105 is added to the data, the whole thing runs properly without changing the code.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.