Hi,
I have data like this and with almost 500 columns numerical value.
ID | A | B | C | D |
1 | 1 | 2 | 3 | 4 |
1 | 2 | 3 | 4 | 5 |
2 | 1 | 3 | 4 | 5 |
2 | 2 | 6 | 3 | 8 |
3 | 1 | 3 | 4 | 7 |
3 | 3 | 4 | 3 | 6 |
I need an output like this, which use the last. ID _numeric_ MINUS first.ID _numeric_, Since there are almost 500 numerical features, I cannot subtract one by one,
ID | A | B | C | D |
1 | 1 | 1 | 1 | 1 |
2 | 1 | 3 | -1 | 3 |
3 | 2 | 1 | -1 | -1 |
Hey @Li2024! PROC EXPAND can do this very quickly. It's a great swiss-army knife of multi-row operations.
proc expand data=a out=want(where=(time > 0));
by id;
convert _NUMERIC_ / method=none transform=(dif 1);
run;
ID TIME A B C D
1 1 1 1 1 1
2 1 1 3 -1 3
3 1 2 1 -1 -1
You can ignore the TIME variable and drop it later if needed.
With a DATA Step, it looks like this:
data want;
set a;
by id;
array numvar[*] _NUMERIC_;
array difvar[5] _TEMPORARY_;
do i = 2 to dim(difvar);
difvar[i] = dif(numvar[i]);
if(NOT first.id) then numvar[i] = difvar[i];
end;
if(NOT first.id) then output;
drop i;
run;
ID A B C D
1 1 1 1 1
2 1 3 -1 3
3 2 1 -1 -1
The Data step code works perfect. My SAS doesn't support the proc expand.
Thank you so much!
You don't need the extra array.
And if you make ID character then it is easy to apply to all of the numeric variables.
data have;
input ID $ A B C D;
cards;
1 1 2 3 4
1 2 3 4 5
2 1 3 4 5
2 2 6 3 8
3 1 3 4 7
3 3 4 3 6
;
data want;
set have;
by id;
array _num_ _numeric_;
do over _num_;
_num_=dif(_num_);
end;
if not first.id;
run;
@Stu_SAS wrote:
Great idea for making the ID a character to make the solution very elegant. Nicely done!
If the ID is numeric you can just add a second SET statement with KEEP= dataset option to reset the ID back to the original value.
data want;
set have;
by id;
array _num_ _numeric_;
do over _num_;
_num_=dif(_num_);
end;
set have(keep=id);
if not first.id;
run;
Does every ID have exactly 2 records? Then the solution offered by @Tom is the way to go.
If you have only TWO obs for each ID, I think the Tom's code is a best try. If there were not, you could try this IML code,but you also have license of SAS/IML to run IML code.
P.S. My IML code also could be used to TWO obs for each ID.
data have;
input ID $ A B C D;
cards;
1 1 2 3 4
1 2 3 4 5
2 1 3 4 5
2 2 6 3 8
3 1 3 4 7
3 3 4 3 6
;
data first last;
set have;
by id;
if first.id then output first;
if last.id then output last;
run;
proc iml;
use first;
read all var {ID};
close;
use first(drop=id);
read all var _num_ into first[c=vname];
close;
use last(drop=id);
read all var _num_ into last;
want=last-first;
create want from ID want[c=('id'||vname)];
append from ID want;
close;
quit;
I would say that if you can have more than two obs for a given ID, just implement a minor modification to @Tom 's code:
data have;
input ID $ A B C D;
cards;
1 1 2 3 4
1 2 3 4 5
2 1 3 4 5
2 2 6 3 8
3 1 3 4 7
3 9 9 9 9 /*extra obs for ID 3 */
3 3 4 3 6
;
data want;
set have;
by id;
if first.id or last.id; /*No further processing of "middle" obs*/
array _n _numeric_;
do over _n;
_n=dif(_n);
end;
if not (first.id); /*Avoid instances of 1 obs per id*/
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.