Heads up! I am a beginner in SAS. I appreciate your time, kindness, and help!
I have a master dataset and three sub-datasets, with various columns pulled from an Oracle database. The code works up to the point where I pull the data and merge. I run this every day; the output is an Excel file with acct_num, acct_id, and other columns. The logic looks back over the last 7 days and fetches the result.
"and i use the date constraint to look at last 7 days of data TRANSACTION_DT < TRUNC(SYSDATE) -7"
The selection above returns all rows with a transaction date OLDER than 7 days, not the last 7 days.
To select the past 7 days (excluding the current day, for which you won't yet have all transactions):
TRANSACTION_DT between TRUNC(SYSDATE) -7 and TRUNC(SYSDATE) -1
Assuming you don't have duplicate transactions in your source, your definition of duplicates doesn't make much sense to me.
It feels like this is more an issue with your selection logic than with your data.
If you just want to select yesterday's transactions, change your selection to:
TRANSACTION_DT = TRUNC(SYSDATE) -1
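For illustration, here is an untested pass-through sketch with the 7-day constraint; the connection options, schema, table, and column list are assumptions, and you can swap in the yesterday-only condition if that is what you need:

/* Sketch only: connection options, schema and table names are assumptions */
proc sql;
    connect to oracle (user=&user. password=&pwd. path=&orapath.);
    create table work.last7days as
    select * from connection to oracle (
        select acct_num, acct_id, transaction_dt
        from   myschema.transactions
        where  transaction_dt between trunc(sysdate) - 7
                                  and trunc(sysdate) - 1
    );
    disconnect from oracle;
quit;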
As I understand it, your report is at account level and not at transaction level, so at some point you're aggregating the data.
Why do you need transaction-level data at all? For example: is this a summary of transactions per account over the last 7 days, where you only want to include accounts that had a change yesterday? If so, it wouldn't be hard to identify all accounts with a change yesterday directly on the Oracle side and move only the account-level aggregated rows that meet this condition into SAS.
You just need to describe a bit more in detail what you have and what your end result needs to be.
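For the Oracle-side aggregation idea above, a minimal untested sketch; the schema, table, and column names are assumptions:

/* Sketch only: schema, table and column names are assumptions */
proc sql;
    connect to oracle (user=&user. password=&pwd. path=&orapath.);
    create table work.acct_summary as
    select * from connection to oracle (
        select acct_id,
               count(*)            as n_payments,
               sum(payment_amt)    as total_amt,
               max(transaction_dt) as last_transaction_dt
        from   myschema.transactions
        where  transaction_dt between trunc(sysdate) - 7 and trunc(sysdate) - 1
        group by acct_id
        having max(transaction_dt) >= trunc(sysdate) - 1  /* only accounts with a change yesterday */
    );
    disconnect from oracle;
quit;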
If you need consistency with regard to data types and other variable attributes, then you MUST NOT use PROC IMPORT, which also means that you MUST NOT use Excel files as a data source.
All the methods SAS provides for reading Excel files involve guessing and are notoriously unreliable.
From where do these Excel files come?
Ah... ok. So, I generate the output as EXCEL files, and this has to come out every day for reviewing purposes, with acct_num and other demographic info. Ideally, my problem is that today's report should not have ANY entries/duplicates from yesterday's or the day before. The code works if I compare today's report to yesterday's and THEN PRINT today's, BUT if there are zero records for today then it pulls data from the day before yesterday, which is such a pain now.
@User_2024 wrote:
Ah... ok. So, I generate the output as EXCEL files, and this has to come out every day for reviewing purposes, with acct_num and other demographic info. Ideally, my problem is that today's report should not have ANY entries/duplicates from yesterday's or the day before. The code works if I compare today's report to yesterday's and THEN PRINT today's, BUT if there are zero records for today then it pulls data from the day before yesterday, which is such a pain now.
IF you are generating the Excel files as output, are you doing it from SAS? If so, then the place to start would be the data sets used to create that Excel output, not rereading the output.
FWIW, in a typical week I read anywhere from 10 to 50 files that start out as XLSX. Because of the headaches involved with that, I save them to CSV and can then use a data step to read the files into consistent variables by type, length, and name. (Until the idiots providing the source change the column order; then I need to change the order of the INPUT statement I use.)
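For illustration, a minimal data step sketch for reading such a daily CSV; the file name, variable names, lengths, and informats are placeholders to adjust to the real layout:

/* Sketch only: file name, variable names, lengths and informats are assumptions */
data work.daily_csv;
    infile "filepath\Output\filename&today..csv" dsd firstobs=2 truncover;
    /* Explicit attributes: no guessing, so every day's file gets identical types and lengths */
    length acct_num $20 acct_id $20;
    input acct_num  $
          acct_id   $
          transaction_dt :yymmdd10.;
    format transaction_dt date9.;
run;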
@User_2024 wrote:
Ohh, that's a great point. I am just starting to create reports using SAS. And yes, you are right, I use SAS to create the XLSX report. Can you throw me a sample of how I can convert this to CSV now? And all my previous 7 days' files are also in XLSX. How do I fix them now? This is how I do it; any insights are sincerely appreciated.
proc export
    data=final
    outfile="filepath\Output\filename&today..xlsx"
    dbms=xlsx
    replace;
    sheet="data_for_review";
run;
I would create a permanent library for the data related to this project. Then, as you make a "final" data set, either append it to a long-term set in that library (best if the contents don't change often) or, and I know this is unpopular for many reasons, make a copy in that permanent library with the date in the name, such as Final_20240301. The latter may be better if the contents do change, meaning the number of variables, the names of the variables, and the metadata.
Then combine the data sets as needed (if the append approach isn't feasible, and it may not be with a history of Proc Import...).
Keep the data in SAS. Create those Excel reports for reviewing, but keep the underlying data as SAS datasets for further processing.
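A minimal sketch of that approach; the library path, library name, and the format of &today (e.g. 20240301) are assumptions:

/* Sketch only: path, library name and &today format are assumptions */
libname reports "filepath\Output\sasdata";

/* Keep a dated copy of today's final data set in the permanent library */
data reports.final_&today.;
    set work.final;
run;

/* Later, combine the dated copies as needed */
data work.all_days;
    set reports.final_:;   /* every data set whose name starts with FINAL_ */
run;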
@User_2024 Looking at the code you shared, I feel it might be worth taking a step back, looking at your whole process, and potentially revamping it.
"I have a master dataset and three sub datasets ,with various columns puled from Oracle database. "
1. Is all your data in Oracle or only some additional information that you use to enrich your master dataset?
2. Where is your master dataset stored?
"Now, the tricky part is I am comparing the today's file to last 7 days and remove duplicates from today's report."
3. What constitutes a "duplicate" and why do you get them in first place?
4. Are there updates to your source data between your daily queries (like a change to open_dt)?
5. What's the purpose of the RSUBMIT blocks in your code? Why do you need them?
And as others already stated:
Do not use Excel as a data source. The easiest approach would be to permanently store the daily SAS table that you use to create the Excel file, and then use these SAS tables the next day.
You could either create daily tables with a date portion in the table name or append daily data to a single table with a date column.
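For the single-table approach, a minimal sketch; the REPORTS library and data set names are assumptions:

/* Sketch only: library and data set names are assumptions */
data work.final_stamped;
    set work.final;
    report_dt = today();    /* date column identifying this daily run */
    format report_dt date9.;
run;

/* Append today's rows to one permanent history table */
proc append base=reports.daily_history data=work.final_stamped force;
run;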
"and i use the date constraint to look at last 7 days of data TRANSACTION_DT < TRUNC(SYSDATE) -7"
Above selection would return all rows with a transaction date OLDER than 7 days.
To select the past 7 days (without the current day where you won't have all transactions):
TRANSACTION_DT between TRUNC(SYSDATE) -7 and TRUNC(SYSDATE) -1
Assuming you don't have duplicate transactions in your source, your definition of duplicates doesn't make much sense to me.
It feels this is more an issue with your selection logic and not with your data.
If you just want to select yesterday's transactions change your selection to:
TRANSACTION_DT = TRUNC(SYSDATE) -1
As I understand it your report is at account level and not at transaction level so at some point you're aggregating the data.
Why do you need transaction level data at all? Like: Is this some summary of transactions per account over the last 7 days and you only want to include accounts that had a change yesterday? If so then it wouldn't be hard to identify all accounts with a change yesterday directly on the Oracle side and only create account level aggregated rows moved to SAS that meet this condition.
You just need to describe a bit more in detail what you have and what your end result needs to be.
Thank you for the suggestions. I am trying to find payments made in the last 7 days, and I need to run this daily. When I do this I stumble into duplicates because data from, say, Mar 3 shows up when I run on Mar 4, and it also shows up when I run on Mar 5. And I gotta look at the last 7 days because there could be different payment types like cheques.
To resolve it, I imported yesterday's output file, compared it to today's, and removed duplicates, which worked fine.
But when I did that for multiple days, I faced a data type error, and that's when I approached the community. Hope this helps.
The issues were with defining the today and yesterday macro variables, which got fixed with help here. And I finally resolved the data type error by keeping just the index variable while stacking.
/* Upload the previous reports (keeping only the ACCT_ID key) for comparison */
proc upload data=Previous_report1 (keep=acct_id);
run;

/* Sort by ACCT_ID so the reports can be compared/merged by key */
proc sort data=Previous_report1;
    by acct_id;
run;

proc upload data=Previous_report2 (keep=acct_id);
run;

proc sort data=Previous_report2;
    by acct_id;
run;
@User_2024 wrote:
Thank you for the suggestions. I am trying to find payments made in the last 7 days, and I need to run this daily. When I do this I stumble into duplicates because data from, say, Mar 3 shows up when I run on Mar 4, and it also shows up when I run on Mar 5. And I gotta look at the last 7 days because there could be different payment types like cheques.
To resolve it, I imported yesterday's output file, compared it to today's, and removed duplicates, which worked fine.
But when I did that for multiple days, I faced a data type error, and that's when I approached the community. Hope this helps.
Yes, of course: if you run daily but select the last 7 days of transactions, then you will select the same transaction on multiple days.
"And I gotta look at last 7 days because there could be different payment type like cheques."
And you want to report (select) this check payment now for 7 days or only once?
I still believe that ideally you would write logic that does the desired selection directly on the database instead of reading and processing some .csv or Excel for post processing that you created via runs on past days. Just an opinion.
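As a minimal sketch of that idea (the ORA libname, REPORTS library, table and column names are assumptions, and TRANSACTION_DT is assumed to map to a SAS date): keep a permanent table of the keys you have already reported and exclude them from today's pull.

/* Sketch only: ORA libname, REPORTS library, table and column names are assumptions.
   Assumes REPORTS.REPORTED_HISTORY already exists (create it empty once). */
proc sql;
    /* Pull the last 7 days but drop accounts that were already reported */
    create table work.todays_report as
    select t.*
    from   ora.payments as t
    where  t.transaction_dt >= intnx('day', today(), -7)
      and  t.transaction_dt <  today()
      and  t.acct_id not in (select acct_id from reports.reported_history);
quit;

/* Remember today's keys so they are excluded in tomorrow's run */
proc append base=reports.reported_history
            data=work.todays_report (keep=acct_id) force;
run;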