BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

Let's say that I have a data set and I want to do the following:

From column 4  I want to calculate the difference between the following columns:

Col 4 and Col 6

Col 5 and Col 7

Col 6 and Col 8

Col 7 and Col 9

and so on...

What is the way to do it please?


DATA have;
INPUT   X 
        Z             
        W               
        v20230101        
        P20230101        
        v20230102        
        P20230102       
        v20230103       
        P20230103            
;
cards;
1 22 999 70 0.2 80 0.15 90 0.15
1 22 999 10 0.4 14 0.1 12 0.2
1 33 999 8 0.7 9 0.8 14 0.3
1 33 999 4 0.1 11 0.3 8 0.05
1 44 999 0 0 2 0.05 4 0.1
1 44 999 0 0 0 0 0 0
1 55 999 15 0.5 20 0.8 25 0.4
1 55 999 12 0.7 14 0.9 22 0.15
1 66 999 14 1.5 15 1.4 18 0.3
1 66 999 3 0.8 4 0.8 19 0.1
;
run;
 

Wanted data set

Ronein_0-1674479208656.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.

 

If your data had a column named DATE which contains the date values, and a column named P which contains the values of your  Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.

 

So, let's start with data in a better arrangement.

 

data fake;
    input id date :yymmdd8. p v;
    format date yymmddn8.;
    datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;

 

Now, look how little programming is required to obtain the differences you want:

 

data want;
    set fake;
    by id;
    p_diff=dif(p);
    v_diff=dif(v);
    if first.id then call missing(p_diff,v_diff);
run;

 

Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.

--
Paige Miller

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

You continue to make the mistake of putting calendar information into variable names, and thus you miss much easier ways to handle this.

 

If your data had a column named DATE which contains the date values, and a column named P which contains the values of your  Pyyyymmdd variables, and a column named V which contains the values of your Vyyyymmdd variables, then the programming is incredibly simple ... you use the DIF function, and you get these differences without having to write code with easy to mis-type variable names that will necessarily change over time. If your data layout is as I described, you can have hundreds of dates, which change over time, and the programming handles them all without modification and without you having to specify the dates in the variable names.

 

So, let's start with data in a better arrangement.

 

data fake;
    input id date :yymmdd8. p v;
    format date yymmddn8.;
    datalines;
1 20230101 70 0.4
1 20230102 80 0.9
1 20230103 72 0.22
1 20230104 109 0
2 20230101 65 0.33
2 20230102 64 0.67
2 20230103 70.1 0.9
2 20230104 109 0.88
;

 

Now, look how little programming is required to obtain the differences you want:

 

data want;
    set fake;
    by id;
    p_diff=dif(p);
    v_diff=dif(v);
    if first.id then call missing(p_diff,v_diff);
run;

 

Notice how if there are more dates added to the data set, no changes to the programming are required. The differences are still properly computed.

--
Paige Miller
Ronein
Meteorite | Level 14
Thanks a lot, i understand that the correct and better structure is Long structure and not wide structure.
I understand that in long structure it is easy to calculate difference between rows using diff function.
Here is some information about dif function that i found.
The DIF functions, DIF1, DIF2, ..., DIF100, return the first differences between the argument and its nth lag. DIF1 can also be written as DIF. DIF n is defined as DIF n( x)= x-LAG n( x).

My question:
What do you think is better way to dispaly the final report? Wide or long structure. I understand that the data set structure is better to be long but it might happen that the user who want to see the report will prefer wide way.
What is the way then to create the report that represents the information in wide way?
Do you think that better report is wide or long structure (in this example)?
PaigeMiller
Diamond | Level 26

From this long data set, you can display the results any way you want. You can display this long data set as long, or you can display this long data set as wide (without having to modify the structure of the data set). There is no way for me to be more definitive as I don't understand your problem or the needs of the people who will be viewing this report.


To display these differences wide, you would use PROC REPORT, where DATE is an across variable. This gives you columns named 20230101 and 20230102 and so on. This is done internally by SAS for across variables, so you don't have to create and code these column names yourself. Then the program works for any set of date values in the data set, so if tomorrow another date value 20230105 is added to the data, the whole thing runs properly without changing the code.

--
Paige Miller

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1733 views
  • 1 like
  • 2 in conversation