BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DaniLagetsson
Calcite | Level 5

Hello!

I'll try to explain what I mean in the thread title as clear as possible. 

 

I have a database in SAS 9.4 with +1500 variables and around 200,000 observations. I want to group those observations by ID (around 60,000 unique IDs), ordered by a date and get the last data available for each variable and ID. For example, if my data is:

 

ID VAR1 VAR2 DATE (already ordered)

1  2           3          Day1

1           4          Day2

1  .            2          Day3

2  2           3          Day1

3           .           Day2

 

And I want to get:

ID VAR1 VAR2 DATE

1  1           2          Day3

2  3           3          Day2

 

I've been using the sentence BY and last.ID/last.Date since most of the data is available in the last observation, but I don't know if it is possible to get into such detail as (I made up the following code in order to try to explain what I need):

 

from VAR1 to VAR1500
i = 0
while (last-i).ID (VAR1) = . do
i+1
last.ID (VAR1) = (last-i).ID (VAR1)

I believe it is impossible especially bearing in mind that there will be 60.000 groups of IDs and 1.500 variables making the loop work 90 million times, but I'd appreciate any kind of help!

 

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

SAS actually has tools to make this incredibly simple.  Assuming you have already completed this step:

 

proc sort data=have;

by id date;

run;

 

All you need to do from that point is:

 

data want;

update have (obs=0) have;

by id;

run;

 

You don't even need to know all the variable names.

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

By group processing, its well documented in the manual.  For your issue, sort, then by group:

proc sort data=have;
  by id date;
run;
data want;
  set have;
  by id;
  if last.id;
run;

You could also cheat somewhat by using the way proc sort works:

proc sort data=have out=want nodupkey;
  by id descending date;
run;

Sort keeps the first record in any sequence when nodupkey is on, so descending means last is first record.

DaniLagetsson
Calcite | Level 5

Hi RW9, thank you for your answer.

Nevertheless that will only get the whole last observation, not minding whether a value is missing or not. 

Kurt_Bremser
Super User

The basic code for what you want to achieve with one variable is this:

data want;
set have (rename=(var1=_var1));
by id date;
retain var1;
if _var1 ne . then var1 = _var1;
if last.id then output;
run;

With macro processing, we can expand this code to deal with an arbitrary number of similarly named variables:

%macro make_want(numvars);
data want;
set have (rename=(
%do i = 1 %to &numvars;
  var&i.=_var&i.
%end;
));
by id date;
retain
%do i = 1 %to &numvars;
  var&i.
%end;
;
%do i = 1 %to &numvars;
if _var&i. ne . then var&i. = _var&i.;
%end;
if last.id then output;
run;
%mend;
%make_want(1500)
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, sorry, I didn't see the value carry forward.  Maybe:

data have;
  input id var1 var2 date $;
datalines;
1 2 3 Day1
1 1 4 Day2
1 . 2 Day3
2 2 3 Day1
2 3 . Day2
;
run;
data want;
  set have;
  by id;
  retain lst_var1 lst_var2;
  if first.id then call missing(lst_var1,lst_var2);
  if var1 ne . then lst_var1=var1;
  if var2 ne . then lst_var2=var2;
  if var1=. then var1=lst_var1;
  if var2=. then var2=lst_var2;
  if last.id;
run;

Could simplfy it a bit.

error_prone
Barite | Level 11

Assuming that all var* are  numbers:

 

data want;
   set have;
   by id;

   length _var1-_var2 i 8;
   retain _var:;
   drop _var: i;

   array v var1-var2;
   array r _var1-_var2;

   do i = 1 to dim(v);
      if not missing(v[i]) then r[i] = v[i];
   end;

   if last.id then do;
      do i = 1 to dim(v);
         v[i] = r[i];
      end;
      output;
   end;
run;
Astounding
PROC Star

SAS actually has tools to make this incredibly simple.  Assuming you have already completed this step:

 

proc sort data=have;

by id date;

run;

 

All you need to do from that point is:

 

data want;

update have (obs=0) have;

by id;

run;

 

You don't even need to know all the variable names.

error_prone
Barite | Level 11

@Astounding: NOOOOO! This can't be  that simple ... ok, have to find something else to increase job security 😉

DaniLagetsson
Calcite | Level 5

Thank you all! Alas, each of the 1500 variables have a different name so most of the solutions would take a lot of space to develop, although they'd work. The one from Astounding makes it so simple. 

EjnerBørsting
Calcite | Level 5

This can do the job.

 


proc sort data=konsort out=konsort;by gruppe;

 

data last; set konsort;by gruppe;
if last.gruppe then output;
run;

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2233 views
  • 8 likes
  • 6 in conversation