Solved: Calculate difference between columns

Ronein · Posted 01-23-2023 08:07 AM

Hello

Let's say that I have a data set and I want to do the following:

From column 4 I want to calculate the difference between the following columns:

Col 4 and Col 6

Col 5 and Col 7

Col 6 and Col 8

Col 7 and Col 9

and so on...

What is the way to do it please?


DATA have;
INPUT   X 
        Z             
        W               
        v20230101        
        P20230101        
        v20230102        
        P20230102       
        v20230103       
        P20230103            
;
cards;
1 22 999 70 0.2 80 0.15 90 0.15
1 22 999 10 0.4 14 0.1 12 0.2
1 33 999 8 0.7 9 0.8 14 0.3
1 33 999 4 0.1 11 0.3 8 0.05
1 44 999 0 0 2 0.05 4 0.1
1 44 999 0 0 0 0 0 0
1 55 999 15 0.5 20 0.8 25 0.4
1 55 999 12 0.7 14 0.9 22 0.15
1 66 999 14 1.5 15 1.4 18 0.3
1 66 999 3 0.8 4 0.8 19 0.1
;
run;

Wanted data set

PaigeMiller · Posted 01-23-2023 08:31 AM

You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.

If your data had a column named DATE which contains the date values, and a column named P which contains the values of your Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.

So, let's start with data in a better arrangement.

data fake;
    input id date :yymmdd8. p v;
    format date yymmddn8.;
    datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;

Now, look how little programming is required to obtain the differences you want:

data want;
    set fake;
    by id;
    p_diff=dif(p);
    v_diff=dif(v);
    if first.id then call missing(p_diff,v_diff);
run;

Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 01-23-2023 08:31 AM

You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.

If your data had a column named DATE which contains the date values, and a column named P which contains the values of your Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.

So, let's start with data in a better arrangement.

data fake;
    input id date :yymmdd8. p v;
    format date yymmddn8.;
    datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;

Now, look how little programming is required to obtain the differences you want:

data want;
    set fake;
    by id;
    p_diff=dif(p);
    v_diff=dif(v);
    if first.id then call missing(p_diff,v_diff);
run;

Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.

--
Paige Miller

Ronein · Posted 01-23-2023 10:05 AM

Thanks a lot, i understand that the correct and better structure is Long structure and not wide structure.
I understand that in long structure it is easy to calculate difference between rows using diff function.
Here is some information about dif function that i found.
The DIF functions, DIF1, DIF2, ..., DIF100, return the first differences between the argument and its nth lag. DIF1 can also be written as DIF. DIF n is defined as DIF n( x)= x-LAG n( x).

My question:
What do you think is better way to dispaly the final report? Wide or long structure. I understand that the data set structure is better to be long but it might happen that the user who want to see the report will prefer wide way.
What is the way then to create the report that represents the information in wide way?
Do you think that better report is wide or long structure (in this example)?

PaigeMiller · Posted 01-23-2023 10:28 AM

From this long data set, you can display the results any way you want. You can display this long data set as long, or you can display this long data set as wide (without having to modify the structure of the data set). There is no way for me to be more definitive as I don't understand your problem or the needs of the people who will be viewing this report.

To display these differences wide, you would use PROC REPORT, where DATE is an across variable. This gives you columns named 20230101 and 20230102 and so on. This is done internally by SAS for across variables, so you don't have to create and code these column names yourself. Then the program works for any set of date values in the data set, so if tomorrow another date value 20230105 is added to the data, the whole thing runs properly without changing the code.

--
Paige Miller

Calculate difference between columns

Re: Calculate difference between columns

Re: Calculate difference between columns

Re: Calculate difference between columns

Re: Calculate difference between columns

Calculate difference between columns

Re: Calculate difference between columns

Re: Calculate difference between columns

Re: Calculate difference between columns

Re: Calculate difference between columns

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!