Hello!
I'll try to explain what I mean in the thread title as clear as possible.
I have a database in SAS 9.4 with +1500 variables and around 200,000 observations. I want to group those observations by ID (around 60,000 unique IDs), ordered by a date and get the last data available for each variable and ID. For example, if my data is:
ID VAR1 VAR2 DATE (already ordered)
1 2 3 Day1
1 1 4 Day2
1 . 2 Day3
2 2 3 Day1
2 3 . Day2
And I want to get:
ID VAR1 VAR2 DATE
1 1 2 Day3
2 3 3 Day2
I've been using the sentence BY and last.ID/last.Date since most of the data is available in the last observation, but I don't know if it is possible to get into such detail as (I made up the following code in order to try to explain what I need):
from VAR1 to VAR1500
i = 0
while (last-i).ID (VAR1) = . do
i+1
last.ID (VAR1) = (last-i).ID (VAR1)
I believe it is impossible especially bearing in mind that there will be 60.000 groups of IDs and 1.500 variables making the loop work 90 million times, but I'd appreciate any kind of help!
Thank you!
SAS actually has tools to make this incredibly simple. Assuming you have already completed this step:
proc sort data=have;
by id date;
run;
All you need to do from that point is:
data want;
update have (obs=0) have;
by id;
run;
You don't even need to know all the variable names.
By group processing, its well documented in the manual. For your issue, sort, then by group:
proc sort data=have; by id date; run; data want; set have; by id; if last.id; run;
You could also cheat somewhat by using the way proc sort works:
proc sort data=have out=want nodupkey; by id descending date; run;
Sort keeps the first record in any sequence when nodupkey is on, so descending means last is first record.
Hi RW9, thank you for your answer.
Nevertheless that will only get the whole last observation, not minding whether a value is missing or not.
The basic code for what you want to achieve with one variable is this:
data want;
set have (rename=(var1=_var1));
by id date;
retain var1;
if _var1 ne . then var1 = _var1;
if last.id then output;
run;
With macro processing, we can expand this code to deal with an arbitrary number of similarly named variables:
%macro make_want(numvars);
data want;
set have (rename=(
%do i = 1 %to &numvars;
var&i.=_var&i.
%end;
));
by id date;
retain
%do i = 1 %to &numvars;
var&i.
%end;
;
%do i = 1 %to &numvars;
if _var&i. ne . then var&i. = _var&i.;
%end;
if last.id then output;
run;
%mend;
%make_want(1500)
Ah, sorry, I didn't see the value carry forward. Maybe:
data have; input id var1 var2 date $; datalines; 1 2 3 Day1 1 1 4 Day2 1 . 2 Day3 2 2 3 Day1 2 3 . Day2 ; run; data want; set have; by id; retain lst_var1 lst_var2; if first.id then call missing(lst_var1,lst_var2); if var1 ne . then lst_var1=var1; if var2 ne . then lst_var2=var2; if var1=. then var1=lst_var1; if var2=. then var2=lst_var2; if last.id; run;
Could simplfy it a bit.
Assuming that all var* are numbers:
data want;
set have;
by id;
length _var1-_var2 i 8;
retain _var:;
drop _var: i;
array v var1-var2;
array r _var1-_var2;
do i = 1 to dim(v);
if not missing(v[i]) then r[i] = v[i];
end;
if last.id then do;
do i = 1 to dim(v);
v[i] = r[i];
end;
output;
end;
run;
SAS actually has tools to make this incredibly simple. Assuming you have already completed this step:
proc sort data=have;
by id date;
run;
All you need to do from that point is:
data want;
update have (obs=0) have;
by id;
run;
You don't even need to know all the variable names.
@Astounding: NOOOOO! This can't be that simple ... ok, have to find something else to increase job security 😉
Thank you all! Alas, each of the 1500 variables have a different name so most of the solutions would take a lot of space to develop, although they'd work. The one from Astounding makes it so simple.
This can do the job.
proc sort data=konsort out=konsort;by gruppe;
data last; set konsort;by gruppe;
if last.gruppe then output;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.