BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

I would like to create two separate datasets based on whether the time was updated. As shown in the sample dataset below, the caseids with the same state were unchanged, but the sample IDs will change if the date is updated. I want to identify which casids have been changed based on the date. The result datasets are shown below.

 

data test;
Length State $2 CaseID $9 SampleID 7;
input State $ CaseID $ SampleID date: mmddyy10. ;
format date mmddyy10.;
infile datalines delimiter='#';
datalines;
GA#EC1R00002#7913247#01/04/2024#
GA#EC1R00002#7913289#01/06/2024#
GA#EC1R00007#8627568#01/04/2024#
GA#EC1R00007#8627568#01/04/2024#
IN#EC1N00002#7913259#03/04/2024#
IN#EC1N00002#7913389#03/09/2024#
IN#EC1N00009#8827568#01/04/2023#
IN#EC1N00009#8827588#01/24/2023#
KS#EC1K00010#7945647#03/04/2022#
KS#EC1K00010#7945647#03/04/2022#
KS#EC1K00027#8827568#01/04/2023#
KS#EC1K00027#8827588#01/24/2023#
;
run;

data Updated;
Length State $2 CaseID $9 SampleID 7 change $1;
input State $ CaseID $ SampleID date: mmddyy10. Change $;
format date mmddyy10.;
infile datalines delimiter='#';
datalines;
GA#EC1R00002#7913247#01/04/2024#Y#
GA#EC1R00002#7913289#01/06/2024#Y#
IN#EC1N00002#7913259#03/04/2024#Y#
IN#EC1N00002#7913389#03/09/2024#Y#
IN#EC1N00009#8827568#01/04/2023#Y#
IN#EC1N00009#8827588#01/24/2023#Y#
KS#EC1K00027#8827568#01/04/2023#Y#
KS#EC1K00027#8827588#01/24/2023#Y#
;
run;

data No_Updates;
Length State $2 CaseID $9 SampleID 7 change $1;
input State $ CaseID $ SampleID date: mmddyy10. change $1;
format date mmddyy10.;
infile datalines delimiter='#';
datalines;
GA#EC1R00007#8627568#01/04/2024#N#
GA#EC1R00007#8627568#01/04/2024#N#
KS#EC1K00010#7945647#03/04/2022#N#
KS#EC1K00010#7945647#03/04/2022#N#
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Not sure why you would want two datasets instead of one, but that is easy to do once you have derived the CHANGE variable. 

 

First let's just add your expected value for the new CHANGE variable to your example input so it will be easier to see if we have derived CHANGE properly.

data test;
  length State $2 CaseID $9 SampleID Date 8 expect $1;
  format date yymmdd10.;
  informat date mmddyy.;
  input State--expect ;
datalines;
GA EC1R00002 7913247 01/04/2024 Y
GA EC1R00002 7913289 01/06/2024 Y
GA EC1R00007 8627568 01/04/2024 N
GA EC1R00007 8627568 01/04/2024 N
IN EC1N00002 7913259 03/04/2024 Y
IN EC1N00002 7913389 03/09/2024 Y
IN EC1N00009 8827568 01/04/2023 Y
IN EC1N00009 8827588 01/24/2023 Y
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00027 8827568 01/04/2023 Y
KS EC1K00027 8827588 01/24/2023 Y
;

One easy way to tell if a group of observations has different values for a variable is the COUNT(DISTINCT ...) aggregate function in SQL.

proc sql;
create table want as
  select * 
       , case when (count(distinct sampleid)>1) then 'Y' else 'N' end as Change
  from test
  group by state,caseid 
;
quit;

Results

                              Sample
Obs    State     CaseID         ID            Date    expect    Change

  1     GA      EC1R00002    7913247    2024-01-04      Y         Y
  2     GA      EC1R00002    7913289    2024-01-06      Y         Y
  3     GA      EC1R00007    8627568    2024-01-04      N         N
  4     GA      EC1R00007    8627568    2024-01-04      N         N
  5     IN      EC1N00002    7913389    2024-03-09      Y         Y
  6     IN      EC1N00002    7913259    2024-03-04      Y         Y
  7     IN      EC1N00009    8827568    2023-01-04      Y         Y
  8     IN      EC1N00009    8827588    2023-01-24      Y         Y
  9     KS      EC1K00010    7945647    2022-03-04      N         N
 10     KS      EC1K00010    7945647    2022-03-04      N         N
 11     KS      EC1K00027    8827588    2023-01-24      Y         Y
 12     KS      EC1K00027    8827568    2023-01-04      Y         Y

Looks like that logic worked for your example.

If you want to create two separate datasets that is now simple:

data Updated No_Updates;
  set want;
  if change='Y' then output Updated;
  else output No_Updates;
run;

View solution in original post

6 REPLIES 6
ballardw
Super User

I am guessing that something in data set TEST is supposed to tell us that something was updated. What is it?

If I have to guess at a rule I am pretty sure I am going to miss something. Plus you may have cases in the actual data that have other rules that might not be guessed from a small example.

ybz12003
Rhodochrosite | Level 12
If there is any patient's demography information is updated, the date will be updated. However, too many variables are involved in the demography information. To simplify things, I want to check whether the Sample IDs or the time were changed. If changing, I will do further investigation. If not, I will ignor.
Tom
Super User Tom
Super User

Not sure why you would want two datasets instead of one, but that is easy to do once you have derived the CHANGE variable. 

 

First let's just add your expected value for the new CHANGE variable to your example input so it will be easier to see if we have derived CHANGE properly.

data test;
  length State $2 CaseID $9 SampleID Date 8 expect $1;
  format date yymmdd10.;
  informat date mmddyy.;
  input State--expect ;
datalines;
GA EC1R00002 7913247 01/04/2024 Y
GA EC1R00002 7913289 01/06/2024 Y
GA EC1R00007 8627568 01/04/2024 N
GA EC1R00007 8627568 01/04/2024 N
IN EC1N00002 7913259 03/04/2024 Y
IN EC1N00002 7913389 03/09/2024 Y
IN EC1N00009 8827568 01/04/2023 Y
IN EC1N00009 8827588 01/24/2023 Y
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00010 7945647 03/04/2022 N
KS EC1K00027 8827568 01/04/2023 Y
KS EC1K00027 8827588 01/24/2023 Y
;

One easy way to tell if a group of observations has different values for a variable is the COUNT(DISTINCT ...) aggregate function in SQL.

proc sql;
create table want as
  select * 
       , case when (count(distinct sampleid)>1) then 'Y' else 'N' end as Change
  from test
  group by state,caseid 
;
quit;

Results

                              Sample
Obs    State     CaseID         ID            Date    expect    Change

  1     GA      EC1R00002    7913247    2024-01-04      Y         Y
  2     GA      EC1R00002    7913289    2024-01-06      Y         Y
  3     GA      EC1R00007    8627568    2024-01-04      N         N
  4     GA      EC1R00007    8627568    2024-01-04      N         N
  5     IN      EC1N00002    7913389    2024-03-09      Y         Y
  6     IN      EC1N00002    7913259    2024-03-04      Y         Y
  7     IN      EC1N00009    8827568    2023-01-04      Y         Y
  8     IN      EC1N00009    8827588    2023-01-24      Y         Y
  9     KS      EC1K00010    7945647    2022-03-04      N         N
 10     KS      EC1K00010    7945647    2022-03-04      N         N
 11     KS      EC1K00027    8827588    2023-01-24      Y         Y
 12     KS      EC1K00027    8827568    2023-01-04      Y         Y

Looks like that logic worked for your example.

If you want to create two separate datasets that is now simple:

data Updated No_Updates;
  set want;
  if change='Y' then output Updated;
  else output No_Updates;
run;
Tom
Super User Tom
Super User

For your example you could also check for changes in DATE.  But perhaps your general case is to look for changes in either.

proc sql;
create table want as
  select * 
       , case when (count(distinct date)>1) then 'Y' 
              when (count(distinct sampleid)>1) then 'Y' 
              else 'N'
         end as Change
  from test
  group by state,caseid 
;
quit;
mkeintz
PROC Star

Read all obs for each id twice.  First to determine max and min dates.  Second to output to one of two datasets, based on whether min and max dates are the same.

data updated (drop=_:) notupdated (drop=_:) ;
  do until (last.id);
    set test ;
    by id notsorted;
    _mindate=min(date,_mindate);
    _maxdate=max(date,_maxdate);
  end;
  do until (last.id);
    set test ;
    by id notsorted;
    if _mindate^=_maxdate then output updated;
    else output notupdated;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ybz12003
Rhodochrosite | Level 12

Thanks for all the expert suggestions.  I will try to modify the code to run my actual dataset.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 473 views
  • 3 likes
  • 4 in conversation