BookmarkSubscribeRSS Feed
PhilC
Rhodochrosite | Level 12

Is there a way to use merge, modify, update or set to do the following

data Left (sortedby=ID date);
  infile datalines;
  input ID $4.  RDATE  MMDDYY12. date MMDDYY10.;
    format RDATE date mmddyy10.;
datalines;
A998 06/01/2018 06/01/2018  
B027 05/21/2018 05/21/2018  
C001 03/31/2018 03/31/2018  
D221 11/01/2017 11/01/2017  
E379 10/07/2017 10/07/2017  
F693 06/19/2018 06/19/2018  
;
run;

data Right (sortedby=ID date);
  infile datalines;
  input ID $4.	INTVDT MMDDYY12.	STA  $1.	date MMDDYY11.;
    format INTVDT date mmddyy10.;
datalines;
A998 09/30/2012 Z 09/30/2012  
A998 03/11/2013 Y 03/11/2013  
A998 07/31/2017 X 07/31/2017  
A998 02/21/2018 W 02/21/2018  
B027 05/17/2011 V 05/17/2011  
E379 10/04/2017 U 10/04/2017  
G032 03/01/2017 T 03/01/2017   
;
run;

data whole;
  set left Right;
  by ID date;
run;

This is the desired dataset.  Where the bolded data are lagged, retained or it is as if the SET statement forgot to clear the data vector each record.

ID   RDATE    date     INTVDT   STA
A998 . 09/30/12 09/30/12 Z A998 . 03/11/13 03/11/13 Y A998 . 07/31/17 07/31/17 X A998 . 02/21/18 02/21/18 W A998 06/01/18 06/01/18 02/21/18 W
B027 06/01/18 05/17/11 05/17/11 V B027 05/21/18 05/21/18 05/17/11 V C001 03/31/18 03/31/18 05/17/11 V D221 11/01/17 11/01/17 05/17/11 V E379 11/01/17 10/04/17 10/04/17 U E379 10/07/17 10/07/17 10/04/17 U
F693 06/19/18 06/19/18 10/04/17 U
G032 06/19/18 03/01/17 03/01/17 T

 Instead of this, as coded:

ID   RDATE    date     INTVDT   STA
A998 . 09/30/12 09/30/12 Z A998 . 03/11/13 03/11/13 Y A998 . 07/31/17 07/31/17 X A998 . 02/21/18 02/21/18 W A998 06/01/18 06/01/18 . B027 . 05/17/11 05/17/11 V B027 05/21/18 05/21/18 . C001 03/31/18 03/31/18 . D221 11/01/17 11/01/17 . E379 . 10/04/17 10/04/17 U E379 10/07/17 10/07/17 . F693 06/19/18 06/19/18 . G032 . 03/01/17 03/01/17 T
11 REPLIES 11
Astounding
PROC Star

This looks about right:

 

data whole;
   set left right;
   by id date;
   prior_rdate = lag(rdate);
   prior_intvdt = lag(intvdt);
   prior_sta = lag(sta);
   if rdate = . then rdate = prior_rdate;
   if intvdt = . then intvdt = prior_intvdt;
   if sta = ' ' then sta = prior_sta;
   drop prior_: ;
run;

Give it a test and see if it does what you need.

PhilC
Rhodochrosite | Level 12

Pro: it works.

Kuddos: drop prior_:

 

Con: To scale this up, we'd need a lag() function, if statement and prior_ variable for each column not in the by statement.

 

I am looking for something else.

Astounding
PROC Star

There are several ways to go.  You could use macro language to generate as many "prior" variables as needed.  But there are issues that require knowing something about your data.  These would be helpful questions:

 

  1. Do you know how many numeric variables you have?
  2. Do you know how many character variables you have?
  3. Will the BY variables always be the same?
  4. For character variables, do you know the maximum length across all character variables?
  5. What is the maximum length for variable names (so that we know there would be room to prefix them with "prior_")?

The answers can guide the proper approach.  Approximate answers are probably OK.

 

PhilC
Rhodochrosite | Level 12

This is a common thing that I want to do.  Each time I encounter it, I feel like I have to re-engineer the solution.  Last time I asked this question I was educated about set statement interleaving, this blew my mind since I never encountered this even after seven years of SAS experience.   I'm wondering if there is something else to learn...  but if not, then maybe this is a simple enough thing to ask for a new option in the set statement.

 

Edit: ok, that was you that showed me the set statement interleave. ha ha

PhilC
Rhodochrosite | Level 12

I always do some of my best thinking after I've asked smart people for help.  I came up with this after asking.  Adding indexes to the two input files one can do this.

 

data whole2;
  set Left  (keep=ID date)
      Right (keep=ID date);
  by ID date;
  set Left  key=IDdate;
  set Right key=IDdate;
run;

Con: extra SET KEY statement for each file.

Con: possible file errors

Annoyances: Keep statements and indexes

 

Can someone one-up me here?

Astounding
PROC Star

Does this solution actually work when you cross the boundary from one ID to the next?

PhilC
Rhodochrosite | Level 12

A Do Whitlock loop and a call missing function would handle that.

Astounding
PROC Star

Here's something to consider.  It assumes there are no more than 999 numeric variables, no more than 999 character variables, and a maximum length for any character variable of $1000.

 

data want;
if 5=4 then do;
   set left (drop=id year) right (drop=id year);
   array nums {*} _numeric_;
   array chars {*} _character_;
   array priorn {999} _temporary_;
   array prilorc {999} $1000 _temporary_;
end;
set left right;
by id date;
do _n_=1 to dim(nums);
   priorn{_n_} = lag(nums{_n_});
   if nums{_n_} = . then nums{_n_} = priorn{_n_};
end;
do _n_=1 to dim(chars);
   priorc{_n_} = lag(chars{_n_});
   if chars{_n_} = " " then chars{_n_} = priorc{_n_};
end;
run;

It's long, it's somewhat clumsy, and it's untested.  Perhaps it's useful nevertheless.

PhilC
Rhodochrosite | Level 12

That's so great, to me, -- my coworkers, OTOH, would hate me if I coded like that.Smiley LOL

 

Honestly,  I want this:

data whole;
  set left Right /noPDVreset;
  by ID date;
run;

Thanks for your contribution,  also thanks for teaching me about SET statement, way back, too.

FreelanceReinh
Jade | Level 19

Hi @PhilC,

 

Maybe the "UPDATE trick" (which I've learned in this forum) can help a bit:

data tmp / view=tmp;
set left Right;
by ID date;
_i=.;
run;

data whole(drop=_i);
update tmp(obs=0) tmp;
by _i;
output;
run;
PhilC
Rhodochrosite | Level 12

Pro: it works

Pro: This works if you want to respect the ID boundaries.

data tmp0 tmp1 ;
  do until (last.ID); 
    set left Right; by ID date; output tmp1;
  end;
  output tmp0;
data whole3;
  update tmp0 tmp1; by ID; output; 
run;

Annoyances: opening a file twice, a little too many lines...

 

Is this just the "Update trick" it needs another name.  Thanks,

 

edit: The "KMW-update trick"?  See SAS-Programming/How-can-I-combine-multiple-rows-to-a-single-row@kmw 

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 842 views
  • 3 likes
  • 3 in conversation