DATA Step, Macro, Functions and more

Merge

Reply
Occasional Contributor
Posts: 8

Merge

Dear All:
These are my two data sets

ID    Time          Value
A     9:30:01       16
A     9:30:02       17
A     9:30:04       18
A     9:30:06       18
B     9:30:01        5
B     9:30:03        6
B     9:30:06        7


I have a time table

Time
9:30:01
9:30:02
9:30:03
9:30:04
9:30:05
9:30:06

and so forth

What I want is

ID    Time          Value
A     9:30:01       16
A     9:30:02       17
A     9:30:03       17
A     9:30:04       18
A     9:30:05       18
A     9:30:06       18
B     9:30:01        5
B     9:30:02        5
B     9:30:03        6
B     9:30:04        6
B     9:30:05        6
B     9:30:06        7

Can you please help me. 

 

 

Contributor
Posts: 44

Re: Merge

Just been curious. In real world what are you trying to achieve with this ?
Valued Guide
Posts: 797

Re: Merge

I assume

  1. Your time table is intended to represent every exact second between 09:30:01 and 09:30:06. It has no holes
  2. You want last observation carried forwad for missing time points
  3. Every ID starts with a non-missing record at time=09:30:01

 

data one;
  input ID :$1.   Time :time8.0         Value;
  format time time8.0;
datalines;
A     9:30:01       16
A     9:30:02       17
A     9:30:04       18
A     9:30:06       18
B     9:30:01        5
B     9:30:03        6
B     9:30:06        7
run;
data time_table;
  input time time8.0;
  format time time8.0;
datalines;
9:30:01
9:30:02
9:30:03
9:30:04
9:30:05
9:30:06
run;

data want (keep=id time value);
  if _n_=1 then set time_table (rename=(time=tim_end)) nobs=ntimes point=ntimes;

  set one;
  by id;
  merge one one (firstobs=2 keep=time rename=(time=nxt_tim));

  if last.id then nxt_tim=tim_end+1;
  do time=time to nxt_tim-1; output; end;
run;
Occasional Contributor
Posts: 8

Re: Merge

Thank you so much. 

Super User
Posts: 9,671

Re: Merge

data one;
  input ID :$1.   Time :time8.0         Value;
  format time time8.0;
datalines;
A     9:30:01       16
A     9:30:02       17
A     9:30:04       18
A     9:30:06       18
B     9:30:01        5
B     9:30:03        6
B     9:30:06        7
run;
data time_table;
  input time time8.0;
  format time time8.0;
datalines;
9:30:01
9:30:02
9:30:03
9:30:04
9:30:05
9:30:06
run;

proc sql;
create table temp as
select a.*,b.value
from (
select * from
(select distinct id from one),(select distinct time from time_table) 
) as a
left join one as b 
on a.id=b.id and a.time=b.time;
quit;

data want;
 set temp;
 by id;
 retain v;
 if first.id then v=.;
 if not missing(value) then v=value;
 drop value;
run;
Regular Contributor
Posts: 194

Re: Merge

Note that if you want to fill each second between consecutive times, you don't need

a time table. Using @mkeintz input dataset "one", this should give the wanted result :

 

/* We sort by descending time because there is no "lead" operator in SAS */
proc sort data=one out=eno; by ID descending time; run; data eno_filled; set eno(rename=(time=oldTime)); by ID; format time time8.0; drop oldTime; time=lag(oldTime)-1; if first.ID then do; time=oldTime; output; end; else do; do while (time ge oldTime); output; time=time-1; end; end; run; proc sort data=eno_filled out=want; by ID time; run;
Valued Guide
Posts: 797

Re: Merge

[ Edited ]

True, SAS provides no lead function.  After all it can't be a simple "lookahead" like in Excel.  But then  lag is not a "lookback", it is a queue manager, which is a big difference when dealing with by groups, or interleaved series.

 

But the absence of a lead function is usually not enough of a reason to sort by descending time, use the lag function, and then re-sort to ascending order - a big penaly for large data sets.  It's usually far more efficient to take advantage of the "firstobs=" parameter for data set names as objects of the SET or MERGE statements.  That's why I submitted the code with merge.

 

Implementing a lead function could not be analogous to the lag function.  It' would not be too burdomsome for "normal" leads (constant lead length and no filtering of upcoming observations), but how would you be able to deal with the following?

 

  data want;

    set sashelp.class;

    if sex='F' then last_age=lag(age);

    else if sex='M' then last_age=lag(age);

  run;

 

Because lag is a queue manager, there are two queues being maintained above, one for each sex.  Note that there is no extra disk input processing required.  Now imagine there were such a thing as a lead function, i.e.

     next_age=lead(age)

You still would not have a way to maintain two separate series of leads, because the lead function would provide sas with no guidance on how to filter the upcoming observations. 

 

<promotion>

If you are coming to SGF next month, these are some of the issues I'll be presenting (Leads and Lags: Static and Dynamic Queues in the SAS® DATA STEP) Tuesday, April 4 5pm-6pm.

</promotion>

 

 

 

 

 

Regular Contributor
Posts: 194

Re: Merge

Thanks @mkeintz for your interesting comments.

My goal was to show that the time table was not necessary because it merely increments a counter.

Using a merge as you suggest, the program becomes :

 

data want;
	set one;
	by ID;
	merge one one (firstobs=2 keep=time rename=(time=nextTime));
	drop nextTime;

	if last.ID then do;
		output;
	end;
	else do;
		do while (time lt nextTime);			
		    output;
		    time=time+1;
		end;
	end;
run;
Ask a Question
Discussion stats
  • 7 replies
  • 299 views
  • 3 likes
  • 5 in conversation