BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
websitelogins94
Calcite | Level 5


Hello All, 

 

Hope all is well. This is my first post since I've reviewed several others but unfortunately was not able to find the right solution to fit my circumstances. I have a data set that requires me to provide: (1) one row per patient id, (2) collapsed (corrected) values from subsequent rows into the earliest patient record. Please see below.

 

EXAMPLE

Datum                   id      DayFromTxBase  CorrectionBas Var1 Var2 Var3 Var4 Var5

1/27/2016 22:06   Z029  -2                                                 ABC  1      2      3       UVW

6/23/2017 15:36   Z029  -2                          1                     DEF   .      2      5       XYZ

9/27/2019 10:37   Z029  -2                          1                     GHI   3     .       4        XYZ 

 

 

DESIRED RESULT

Datum                 id      DayFromTxBase CorrectionBas Var1 Var2 Var3 Var4 Var5 

1/27/2016 22:06 Z029  -2                                                 GHI  3      2      4      XYZ

 

 

I've inherited some code from a colleague that runs an import macro which (1) checks the data set to see if there are corrections, and (2) if so, then make an original and a corrections data set, and (3) update the original values with corrected values. Please see below.

 

%macro import2b(site=want);
proc import out=want
datafile="\\.."
dbms=xlsx replace ;
sheet= " ";
getnames=yes;
run;

 

%symdel correct;

 

/* checks the dataset if there are any corrections*/


proc sql;
create table want as
select max(CORRECTIONBAS) as max_c,CORRECTIONBAS,*
from want ;
QUIT;

 

data want ;
%let correct=0;
set want;
call symput('correct',max_c);
run;


/* if there are corrections then make a original dataset and a corrections dataset*/

 

%if &correct=1 %then
%do;

DATA ORIGINAL CORRECTION;
SET want;

if CORRECTIONBAS=. then
output ORIGINAL;

if CORRECTIONBAS=1 then
output CORRECTION;
RUN;

proc sort data=original;
by ID DayFromTxBase;

run;


proc sort data=correction;
by ID DayFromTxBase;
run;


/* Update the original values (in original data)with corrected values (in corrected dataset)*/


DATA want;
UPDATE ORIGINAL CORRECTION;
BY ID DayFromTxBase;


RUN;

proc datasets;
delete ORIGINAL CORRECTION;
run;

%end;


%mend;

 

import2b (site =want );

 

The long story short here is that this current program works for some patient ids and does not work (returns more than record) for other patient ids. Also, we've also advised to remove the DayFromTxBase column altogether as this data has been deemed to be unreliable, and thus we will likely devise a solution that sorts the tables by Datum. We've been wracking our brains on trying to figure this out, so hopefully there's someone out there who can help resolve this issue. Thank you.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

For each ID, you apparent want the earliest DATE, TIME, but the latest non-missing values for all other variables.

 

This is relatively easy, using the UPDATE statement (which gets the latest non-missing values for all variable), and then a SET with POINT= option, as in:

data have;
  input DATE mmddyy10. time time5.0   id :$4.    
                DayFromTxBase
                CorrectionBas  Var1 :$3. Var2 Var3 Var4 Var5 :$3.;
  format date date9. time hhmm5.0;
datalines;
1/27/2016 22:06   Z029  -2                          .                     ABC  1      2      3       UVW
6/23/2017 15:36   Z029  -2                          1                     DEF   .      2      5       XYZ
9/27/2019 10:37   Z029  -2                          1                     GHI   3     .       4        XYZ 
run;

data want;
  update have (obs=0) have;
  by ID;
  retain ptr;
  if first.id then ptr=_n_;
  if last.id then set have (Keep=date time) point=ptr;
run;

 

This program assumes the data are sorted by ID/DATE/TIME.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

For each ID, you apparent want the earliest DATE, TIME, but the latest non-missing values for all other variables.

 

This is relatively easy, using the UPDATE statement (which gets the latest non-missing values for all variable), and then a SET with POINT= option, as in:

data have;
  input DATE mmddyy10. time time5.0   id :$4.    
                DayFromTxBase
                CorrectionBas  Var1 :$3. Var2 Var3 Var4 Var5 :$3.;
  format date date9. time hhmm5.0;
datalines;
1/27/2016 22:06   Z029  -2                          .                     ABC  1      2      3       UVW
6/23/2017 15:36   Z029  -2                          1                     DEF   .      2      5       XYZ
9/27/2019 10:37   Z029  -2                          1                     GHI   3     .       4        XYZ 
run;

data want;
  update have (obs=0) have;
  by ID;
  retain ptr;
  if first.id then ptr=_n_;
  if last.id then set have (Keep=date time) point=ptr;
run;

 

This program assumes the data are sorted by ID/DATE/TIME.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
websitelogins94
Calcite | Level 5

We tested out this solution, and it works! Many thanks for the quick response!

ballardw
Super User

Without actual example data of records that create "duplicate" id output the most likely cause is the step that creates the ORIGINAL data set. If your source data file, the one imported apparently, has more than one record for any given Id that is not marked as correction then you will get one for each such record in the output.

 

Since these are "patient id" then likely you have case of lapses or starts of difference sequences of services.

Depending on what your actual goal need is you might try something like this at the top of that macro:

 

proc sql;
   create table work.temp as
   select max(CORRECTIONBAS) as max_c,CORRECTIONBAS,*
   from want 
   order by id,datum;
QUIT;

DATA ORIGINAL CORRECTION;
   SET work.temp (drop = dayfromtxbase);
   by id;
   if first.id  then output ORIGINAL;
   if CORRECTIONBAS=1 then
   output CORRECTION;
RUN;

This assumes that the actual desired single "original" value is 1) the earliest dated and 2) that one is itself not marked as correction.

 

Other wise you may need to provide a more extensive example of some records with duplicates, the actual desired output for those and any rules regarding how to pick the "correct" original.

 

You can drop the unwanted variable almost anywhere. I would likely do it in the data split to Original and Correction as above.

 

CAUTION: I am very leery of any step that starts with Proc Import, especially if the source is a spreadsheet, as things like variable types and lengths frequently change when reading different files.

websitelogins94
Calcite | Level 5

You are correct in your overall observation, and your cautionary statement rings true given beginning a data step with a proc import has been particularly troublesome. I have not yet had the opportunity to test your code, however, given that this a large project that warrants further exploration into alternative solutions, I will do so and provide further feedback. Thanks. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 2010 views
  • 1 like
  • 3 in conversation