Solved: Creating a separated dataset for time changing?

ybz12003 · Posted 08-27-2024 10:25 AM

I would like to create two separate datasets based on whether the time was updated. As shown in the sample dataset below, the caseids with the same state were unchanged, but the sample IDs will change if the date is updated. I want to identify which casids have been changed based on the date. The result datasets are shown below.

data test;
Length State $2 CaseID $9 SampleID 7;
input State $ CaseID $ SampleID date: mmddyy10. ;
format date mmddyy10.;
infile datalines delimiter='#';
datalines;
GA#EC1R00002#7913247#01/04/2024#
GA#EC1R00002#7913289#01/06/2024#
GA#EC1R00007#8627568#01/04/2024#
GA#EC1R00007#8627568#01/04/2024#
IN#EC1N00002#7913259#03/04/2024#
IN#EC1N00002#7913389#03/09/2024#
IN#EC1N00009#8827568#01/04/2023#
IN#EC1N00009#8827588#01/24/2023#
KS#EC1K00010#7945647#03/04/2022#
KS#EC1K00010#7945647#03/04/2022#
KS#EC1K00027#8827568#01/04/2023#
KS#EC1K00027#8827588#01/24/2023#
;
run;

data Updated;
Length State $2 CaseID $9 SampleID 7 change $1;
input State $ CaseID $ SampleID date: mmddyy10. Change $;
format date mmddyy10.;
infile datalines delimiter='#';
datalines;
GA#EC1R00002#7913247#01/04/2024#Y#
GA#EC1R00002#7913289#01/06/2024#Y#
IN#EC1N00002#7913259#03/04/2024#Y#
IN#EC1N00002#7913389#03/09/2024#Y#
IN#EC1N00009#8827568#01/04/2023#Y#
IN#EC1N00009#8827588#01/24/2023#Y#
KS#EC1K00027#8827568#01/04/2023#Y#
KS#EC1K00027#8827588#01/24/2023#Y#
;
run;

data No_Updates;
Length State $2 CaseID $9 SampleID 7 change $1;
input State $ CaseID $ SampleID date: mmddyy10. change $1;
format date mmddyy10.;
infile datalines delimiter='#';
datalines;
GA#EC1R00007#8627568#01/04/2024#N#
GA#EC1R00007#8627568#01/04/2024#N#
KS#EC1K00010#7945647#03/04/2022#N#
KS#EC1K00010#7945647#03/04/2022#N#
;
run;

Tom · Posted 08-27-2024 12:03 PM

Not sure why you would want two datasets instead of one, but that is easy to do once you have derived the CHANGE variable.

First let's just add your expected value for the new CHANGE variable to your example input so it will be easier to see if we have derived CHANGE properly.

data test;
  length State $2 CaseID $9 SampleID Date 8 expect $1;
  format date yymmdd10.;
  informat date mmddyy.;
  input State--expect ;
datalines;
GA EC1R00002 7913247 01/04/2024 Y
GA EC1R00002 7913289 01/06/2024 Y
GA EC1R00007 8627568 01/04/2024 N
GA EC1R00007 8627568 01/04/2024 N
IN EC1N00002 7913259 03/04/2024 Y
IN EC1N00002 7913389 03/09/2024 Y
IN EC1N00009 8827568 01/04/2023 Y
IN EC1N00009 8827588 01/24/2023 Y
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00027 8827568 01/04/2023 Y
KS EC1K00027 8827588 01/24/2023 Y
;

One easy way to tell if a group of observations has different values for a variable is the COUNT(DISTINCT ...) aggregate function in SQL.

proc sql;
create table want as
  select * 
       , case when (count(distinct sampleid)>1) then 'Y' else 'N' end as Change
  from test
  group by state,caseid 
;
quit;

Results

                              Sample
Obs    State     CaseID         ID            Date    expect    Change

  1     GA      EC1R00002    7913247    2024-01-04      Y         Y
  2     GA      EC1R00002    7913289    2024-01-06      Y         Y
  3     GA      EC1R00007    8627568    2024-01-04      N         N
  4     GA      EC1R00007    8627568    2024-01-04      N         N
  5     IN      EC1N00002    7913389    2024-03-09      Y         Y
  6     IN      EC1N00002    7913259    2024-03-04      Y         Y
  7     IN      EC1N00009    8827568    2023-01-04      Y         Y
  8     IN      EC1N00009    8827588    2023-01-24      Y         Y
  9     KS      EC1K00010    7945647    2022-03-04      N         N
 10     KS      EC1K00010    7945647    2022-03-04      N         N
 11     KS      EC1K00027    8827588    2023-01-24      Y         Y
 12     KS      EC1K00027    8827568    2023-01-04      Y         Y

Looks like that logic worked for your example.

If you want to create two separate datasets that is now simple:

data Updated No_Updates;
  set want;
  if change='Y' then output Updated;
  else output No_Updates;
run;

View solution in original post

ballardw · Posted 08-27-2024 10:38 AM

I am guessing that something in data set TEST is supposed to tell us that something was updated. What is it?

If I have to guess at a rule I am pretty sure I am going to miss something. Plus you may have cases in the actual data that have other rules that might not be guessed from a small example.

ybz12003 · Posted 08-27-2024 11:56 AM

If there is any patient's demography information is updated, the date will be updated. However, too many variables are involved in the demography information. To simplify things, I want to check whether the Sample IDs or the time were changed. If changing, I will do further investigation. If not, I will ignor.

Tom · Posted 08-27-2024 12:03 PM

Not sure why you would want two datasets instead of one, but that is easy to do once you have derived the CHANGE variable.

First let's just add your expected value for the new CHANGE variable to your example input so it will be easier to see if we have derived CHANGE properly.

data test;
  length State $2 CaseID $9 SampleID Date 8 expect $1;
  format date yymmdd10.;
  informat date mmddyy.;
  input State--expect ;
datalines;
GA EC1R00002 7913247 01/04/2024 Y
GA EC1R00002 7913289 01/06/2024 Y
GA EC1R00007 8627568 01/04/2024 N
GA EC1R00007 8627568 01/04/2024 N
IN EC1N00002 7913259 03/04/2024 Y
IN EC1N00002 7913389 03/09/2024 Y
IN EC1N00009 8827568 01/04/2023 Y
IN EC1N00009 8827588 01/24/2023 Y
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00027 8827568 01/04/2023 Y
KS EC1K00027 8827588 01/24/2023 Y
;

One easy way to tell if a group of observations has different values for a variable is the COUNT(DISTINCT ...) aggregate function in SQL.

proc sql;
create table want as
  select * 
       , case when (count(distinct sampleid)>1) then 'Y' else 'N' end as Change
  from test
  group by state,caseid 
;
quit;

Results

                              Sample
Obs    State     CaseID         ID            Date    expect    Change

  1     GA      EC1R00002    7913247    2024-01-04      Y         Y
  2     GA      EC1R00002    7913289    2024-01-06      Y         Y
  3     GA      EC1R00007    8627568    2024-01-04      N         N
  4     GA      EC1R00007    8627568    2024-01-04      N         N
  5     IN      EC1N00002    7913389    2024-03-09      Y         Y
  6     IN      EC1N00002    7913259    2024-03-04      Y         Y
  7     IN      EC1N00009    8827568    2023-01-04      Y         Y
  8     IN      EC1N00009    8827588    2023-01-24      Y         Y
  9     KS      EC1K00010    7945647    2022-03-04      N         N
 10     KS      EC1K00010    7945647    2022-03-04      N         N
 11     KS      EC1K00027    8827588    2023-01-24      Y         Y
 12     KS      EC1K00027    8827568    2023-01-04      Y         Y

Looks like that logic worked for your example.

If you want to create two separate datasets that is now simple:

data Updated No_Updates;
  set want;
  if change='Y' then output Updated;
  else output No_Updates;
run;

Tom · Posted 08-27-2024 12:25 PM

For your example you could also check for changes in DATE. But perhaps your general case is to look for changes in either.

proc sql;
create table want as
  select * 
       , case when (count(distinct date)>1) then 'Y' 
              when (count(distinct sampleid)>1) then 'Y' 
              else 'N'
         end as Change
  from test
  group by state,caseid 
;
quit;

mkeintz · Posted 08-27-2024 12:06 PM

Read all obs for each id twice. First to determine max and min dates. Second to output to one of two datasets, based on whether min and max dates are the same.

data updated (drop=_:) notupdated (drop=_:) ;
  do until (last.id);
    set test ;
    by id notsorted;
    _mindate=min(date,_mindate);
    _maxdate=max(date,_maxdate);
  end;
  do until (last.id);
    set test ;
    by id notsorted;
    if _mindate^=_maxdate then output updated;
    else output notupdated;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ybz12003 · Posted 08-27-2024 12:23 PM

Thanks for all the expert suggestions. I will try to modify the code to run my actual dataset.

Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

Re: Creating a separated dataset for time changing?

SAS Innovate 2025: Call for Content

Classroom Training Available!