M_96
Calcite | Level 5

Hello!

In my task, I have to check whether a string present in the dataset for one date (e.g. April 30, 2025) is also present in the dataset for the previous date (April 29, 2025). This has to work dynamically, so I think I need to use SAS macro code.

 

I created one dataset for each date in April (30 datasets in total). For each day of April, I then have to check whether a string is matched in the dataset of the previous day. Maybe something like (df April 30, 2025) left join (df April 29, 2025) where the string is null in (df April 29, 2025).

 

Do you have any idea/advice about how to do this task?

 

Thanks

 

1 ACCEPTED SOLUTION

Ksharp
Super User

Here is something to get you started.

You could then use CALL EXECUTE to call the macro %check() for each date in April.

data data_1apr2025;
input  data date9. tkt $;
format data date9. ;
datalines;
01APR2025 3333123
01APR2025 43333111
;
RUN;
 
data data_2apr2025;
input  data date9. tkt $;
format data date9. ;
datalines;
02APR2025 99999999
02APR2025 43333111
02APR2025 11111111
;
RUN;


%macro check(date=);
proc sql;
create table want_&date. as
select * from data_&date. 
 where tkt not in (select distinct tkt
  from data_%sysfunc(prxchange(s/^0//,1,%sysfunc(intnx(day,"&date."d,-1),date9.)) ));
quit;
%mend;

%check(date=2apr2025)
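
The CALL EXECUTE step mentioned above might look something like the sketch below. It assumes that data_1apr2025 through data_30apr2025 all exist and follow the naming shown (no leading zero on the day); the variable names d and name are only illustrative.

```sas
data _null_;
  /* April 1 has no previous day among these tables, so start on April 2 */
  do d = '02apr2025'd to '30apr2025'd;
    /* DATE9. gives e.g. 02APR2025; drop the leading zero to match data_2apr2025 */
    name = prxchange('s/^0//', 1, put(d, date9.));
    /* %NRSTR delays the macro call until after this DATA step finishes */
    call execute(cats('%nrstr(%check)(date=', name, ')'));
  end;
run;
```

Each iteration queues one call such as %check(date=2APR2025), so the step should leave one want_ table per day from April 2 onward.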


6 REPLIES
PaigeMiller
Diamond | Level 26

Not sure if macros are needed. If you have a number of data sets (do you mean SAS data sets?) and they are all in one library (folder) with some sort of common naming scheme, you should be able to combine them all into one large SAS data set and then just do a loop in a DATA step.

 

Please describe the location(s) of the data sets, the naming scheme, and what they contain in more detail.

--
Paige Miller
M_96
Calcite | Level 5

I have a SAS dataset like this:

 

data data_1apr2025;
input  data date9. tkt $;
format data date9. ;
datalines;
01APR2025 3333123
01APR2025 43333111
;
RUN;
 
and then I have:
data data_2apr2025;
input  data date9. tkt $;
format data date9. ;
datalines;
02APR2025 99999999
02APR2025 43333111
02APR2025 11111111
;
RUN;

 

and so on for all the 30 days of April....

I need to check whether the dataset data_2apr2025 contains any TKT values that do NOT match the dataset data_1apr2025.

 

this is what I did with these 2 datasets: 

proc sql;
create table not_matching as
  select distinct a.tkt
  from data_2apr2025 a
       left join data_1apr2025 b on a.tkt=b.tkt
  where b.tkt is null;
quit;

The output is: 

02APR2025 99999999
02APR2025 11111111

 

So this query works if you have 2 static datasets; in my task, I need to loop the query over each day of April, comparing it with the previous day.

Any idea?

 

Thanks

PaigeMiller
Diamond | Level 26

Okay, it helps to see what you are working with. I think a macro is required here. You make things harder by using data set names that don't sort alphabetically; a data set named _20250401 for April 1 would at least sort properly, both across months and within months. That may not matter for writing the SAS code for this problem, but it might matter later when you try to use these data sets.

 

%macro do_this;
    %do date=%sysfunc(mdy(4,2,2025)) %to %sysfunc(mdy(4,30,2025));     
        %let previous_day=%eval(&date-1);
        %let date1=%sysfunc(putn(&date,date9.));
        %let previous_day1=%sysfunc(putn(&previous_day,date9.));
        /* Remove leading zero from dates */
        %if %substr(&date1,1,1)=0 %then %let date1=%substr(&date1,2);
        %if %substr(&previous_day1,1,1)=0 %then %let previous_day1=%substr(&previous_day1,2);
        proc sql;
            create table not_matching_&date1 as select distinct a.tkt 
            from data_&date1 a
                  left join data_&previous_day1 b on a.tkt=b.tkt where b.tkt is null;
        quit;
    %end;
%mend;
%do_this
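
If the data sets were renamed along the lines suggested above, the leading-zero handling would no longer be needed. The sketch below assumes hypothetical names data_20250401 through data_20250430, which are not the names used in the original post; the macro name do_this_ymd is likewise only illustrative.

```sas
%macro do_this_ymd;
    %local date cur prev;
    %do date=%sysfunc(mdy(4,2,2025)) %to %sysfunc(mdy(4,30,2025));
        /* YYMMDDN8. renders a date as 20250402: fixed width, sorts chronologically */
        %let cur=%sysfunc(putn(&date,yymmddn8.));
        %let prev=%sysfunc(putn(%eval(&date-1),yymmddn8.));
        proc sql;
            create table not_matching_&cur as
            select distinct a.tkt
            from data_&cur a
                 left join data_&prev b on a.tkt=b.tkt
            where b.tkt is null;
        quit;
    %end;
%mend do_this_ymd;
%do_this_ymd
```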

 

--
Paige Miller
Tom
Super User

If you want to compare the values of a variable (whether it is character or numeric) between two datasets a MERGE is a good method.  Make sure the data is sorted by the variable.

data data_1apr2025;
  input data :date. tkt $;
  format data date9. ;
datalines;
01APR2025 3333123
01APR2025 43333111
;
data data_2apr2025;
  input data :date. tkt $;
  format data date9. ;
datalines;
02APR2025 11111111
02APR2025 43333111
02APR2025 99999999
;

Now you can merge and use the IN= dataset option to check whether the values exist in both datasets or not.

data want;
  merge data_1apr2025(in=in1) data_2apr2025(in=in2);
  by tkt;
  if not (in1 and in2);
run;

Results:

OBS         data      tkt

 1     02APR2025    11111111
 2     01APR2025    3333123
 3     02APR2025    99999999

If you don't want that second mismatch for some reason then just change the criteria.

if in2 and not in1;

But since you also have the DATE (named DATA for some reason) in the dataset perhaps it would be easier to interleave the datasets instead?  Then the check for a mismatch is just whether there is more than one observation. So the IN= dataset option is not needed.  

data want;
  set data_1apr2025 data_2apr2025;
  by tkt data;
  if (first.tkt and last.tkt);
run;

Or perhaps you  want to find the places where there is a gap in the appearance of TKT for one or more dates?

data data_3apr2025;
  input data :date. tkt $;
  format data date9. ;
datalines;
03APR2025 3333123
03APR2025 43333111
;

data want;
  set data_: ;
  by tkt data;
  lag_data=lag(data);
  format lag_data date9.;
  /* reuse lag_data so that LAG is called exactly once per observation */
  if (not first.tkt) and (data-1 ne lag_data);
run;

Result

OBS         data      tkt       lag_data

 1     03APR2025    3333123    01APR2025

 

mkeintz
PROC Star

You could compare two daily datasets at a time, but that would mean processing most of the datasets twice, once as the "before" date, and once as the "after".  

 

But if each of the datasets is sorted by TKT, then you could process all of the datasets in a single pass.  Something like (I have changed the daily dataset names to  DATA_20250401, DATA_20250402, ... DATA_20250430):

 

 

data want;
  set data_202504: ;
  by tkt descending date;
  if first.tkt=0 and dif(date)^=-1 then output;
  else if first.tkt=1 and date^='30apr2025'd then output;
run;
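
Because the step above reads each TKT in descending date order, it flags a TKT that does not appear on the following day. If the comparison should instead run against the previous day, as in the original question, an ascending variant might look like the sketch below. It makes the same assumptions as above (data sets named DATA_202504dd, each sorted by TKT, a date variable named DATE, no duplicate TKT within a day); the names want_prev and gap are only illustrative.

```sas
data want_prev;
  set data_202504: ;
  by tkt date;
  /* Call DIF on every row so its queue always holds the previous row's date */
  gap = dif(date);
  /* First appearance of a TKT: flag it unless it falls on the first day of the month */
  if first.tkt then do;
    if date ne '01apr2025'd then output;
  end;
  /* Later appearances: flag when the previous appearance was not exactly one day earlier */
  else if gap ne 1 then output;
  drop gap;
run;
```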

 

 

If the data are not sorted by TKT and if sorting would be expensive, then read the datasets in reverse chronological order.  You could use two hash objects to hold current and next daily data (NEXTDAY in the code below). If an incoming observation has a TKT not found in the NEXTDAY object, then output it.  At the end of each day, clear the NEXTDAY object and copy the CURRDAY data into it, in preparation for new current date.

 

data want;
  set data_202504: ;
  by descending date;
  if _n_=1 then do;
    declare hash currday();
      currday.definekey('tkt');
      currday.definedata('tkt','date');
      currday.definedone();
    declare hiter i ('currday');

    declare hash nextday();
      nextday.definekey('tkt');
      nextday.definedata('tkt','date');
      nextday.definedone();
  end;

  if date='30apr2025'd then do; 
    nextday.add();
    return;
  end;
  currday.add();

  if nextday.check()^=0 then output;

  if last.date then do;
    /*Replace NEXTDAY with CURRDAY hash object */
    nextday.clear();
    do while (i.next()=0);
      nextday.add();
    end;
    currday.clear();
  end;

run;

Note these programs assume there are no duplicate TKT values within each daily dataset.

