BookmarkSubscribeRSS Feed
tan-wongv
Obsidian | Level 7

I tried to replace missing character values (alcohol_user) with the  closest non-missing values (either before or after) within the same DMRN. 

 

This is how my data set looks like:

DMRN          DX_DATE          ALCOHOL_USER

16                    28FEB2017 

16                    13JUN2017                 No

16                    26SEP2017                 No

24                   12MAY2014                No

24                   28MAR2016

24                   25AUG2017

25                   02SEP2021

25                   29NOV2021               Yes

 

The outcome that I would like it to be:

DMRN          DX_DATE          ALCOHOL_USER

16                    28FEB2017                No

16                    13JUN2017                 No

16                    26SEP2017                 No

24                   12MAY2014                No

24                   28MAR2016               No

24                   25AUG2017                No

25                   02SEP2021                Yes

25                   29NOV2021               Yes

 

I tried this code but it didn't work out. Any help would be very appreciated. Thank you!

Screenshot 2024-03-13 001420.png

4 REPLIES 4
mkeintz
PROC Star

In the absence of sample data in the form of a working DATA step, this should work.

 

data vneed / view=vneed;
  set have (rename=(alcohol_user=_alc_user));
  by dmrn;
  where _alc_user^=' ';
  dx_date=ifn(first.dmrn=1,'01jan1900'd,mean(dx_date,lag(dx_date)),'01jan1900'd);
run;

data want (drop=_:);
  set have (in=inh)  vneed (in=inv keep=dmrn dx_date);
  by dmrn dx_date;
  if inv then set vneed (keep=_alc_user);
  if inh;
  if alcohol_user=' ' then alcohol_user=_alc_user;
run;

How does this work?   Data set view VNEED has only non-missing alcohol_user data (renamed to _alc_user).  But its dx_dates are changed.   dx_date is reset to halfway between the current valid obs and the prior valid obs ("valid" means non-missing _alc_user for the same DMRN).   And in the case of the first non-missing _ALC_USER for a given DMRN, the DX_DATE is set back to 01jan1900.

 

As a result, when VNEED is interleaved with HAVE (the SET HAVE ... VNEED; BY DMRN DX_DATE statements), one can use the IF INV condition to read in (and retain) the non-missing _ALC_USER variable, until the next IF INV is satisfied.  And since the DX_DATEs in vneed are at halfway points between observed DX_DATEs, missing ALCOHOL_USER will always get the closest DX_DATE with non-missing values.

 

Important note:  the first SET statement in DATA WANT, must NOT read in _ALC_USER.  Therefore _ALC_USER is not automatically replaced, until the IF INV condition is satisfied.

 

Edited note: This assume data are sorted by DMRN/DX_DATE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

@mkeintz Appears to be a "duplicate" question with an accepted solution here

mkeintz
PROC Star

@Patrick wrote:

@mkeintz Appears to be a "duplicate" question with an accepted solution here


True.  

 

The code I offered above takes advantage of the fact that the data are sorted, which AFAICT, SQL does not.  There is likely to be a significant performance benefit.

 

It uses the same technique as the "carry the future back" section in a presentation I gave at PharmaSUG 2019 (History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies), except it mixes a subset of a series with the whole series.

 

One of the reasons I have come to promote this technique is that it is fairly simple to expand mixing of two series to three or more series, which I think can become rather messy in SQL.  Applying it to finding the "closest" value only explicitly occurred to me when contemplating this OP's post.

 

Plug:  I'll be presenting content from the PharmaSUG paper at next month's Boston Area SAS User's group.  And I hope to include an example of "finding the closest" as well.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

Load your data into a date-indexed array:

%let start = %sysfunc(inputn(2000-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2024-12-31,yymmdd10.));

data want;
array vals {&start.:&end.} $3 _temporary_;
do i = &start. to &end.;
  vals{i} = "";
end;
do until (last.dmrn);
  set have;
  by dmrn;
  vals{dx_date} = alcohol_user;
end;
do until (last.dmrn);
  set have;
  by dmrn;
  if alcohol_user = ""
  then do;
  dist = &end. - & start.;
  do i = &start. to &end.;
    if vals{i} ne "" and abs(i - dx_date) < dist
    then do;
      dist = abs(i - dx_date);
      alcohol_user = vals{i};
    end;
  end;
  output;
end;
drop dist i;
run;

Untested, posted from my tablet.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 414 views
  • 0 likes
  • 4 in conversation