BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
User_2024
Obsidian | Level 7

Headsup ! I am a beginner in sas. Appreciate your time, kindness and help!

 

I have a master dataset and three sub datasets ,with various columns puled from Oracle database. The code works till I pull the data, merge. I run this everyday, the output is an excel file with acct_num, acct_id and other columns. the logic looks back last 7 days and fetches the result.

  • Now, the tricky part is I am comparing the today's file to last 7 days and remove duplicates from today's report.( My previous post was to seek help on defining &yesterday, and with the help I created &previous_day 1 to 7 , stacked)
  • After the comparison runs and when I pull the final data, I have this datatype error that seems to be difficult to resolve. I used APPEND, APPEND with FORCE, I also tried creating TEST dataset so my first previous_day1 report will look at test and avoid any data type error IF the previous report has ANY NULL data. 
253 data myora.final_result2;
254 /*format open_dt 10. close_dt 10. merch_num 12. DOB $10.;*/
255 merge myora.final_result (in=a) Previous_report1(in=b) Previous_report2 (in=c)
255! Previous_report3 (in=d) Previous_report4 (in=e) Previous_report5 (in=f) Previous_report6
255! (in=g) Previous_report7(in=h);
ERROR: Variable OPEN_DT has been defined as both character and numeric.
ERROR: Variable CLOSE_DT has been defined as both character and numeric.
ERROR: Variable MERCH_NUM has been defined as both character and numeric.
ERROR: Variable DOB has been defined as both character and numeric.
ERROR: Variable ACCT_NUM has been defined as both character and numeric
 
 
 
 
%let current_day = %sysfunc(today());
%let weekday = %sysfunc(weekday(&current_day.));
 
%if &weekday. = 2 %then %do;
    /* If it's Monday, fetch friday */
    %let previous_day  = %sysfunc(intnx(day, &current_day., -3), date9.);
%let previous_day2 = %sysfunc(intnx(day, &current_day., -4), date9.);
%let previous_day3 = %sysfunc(intnx(day, &current_day., -5), date9.);
%let previous_day4 = %sysfunc(intnx(day, &current_day., -6), date9.);
  %let previous_day5 = %sysfunc(intnx(day, &current_day., -7), date9.);
%let previous_day6 = %sysfunc(intnx(day, &current_day., -8), date9.);
%let previous_day7 = %sysfunc(intnx(day, &current_day., -9), date9.);
%end;
%else %do;
    /* For the rest of the week, refer to yesterday */
    %let previous_day  = %sysfunc(intnx(day, &current_day., -1), date9.);
%let previous_day2 = %sysfunc(intnx(day, &current_day., -2), date9.);
%let previous_day3 = %sysfunc(intnx(day, &current_day., -3), date9.);
%let previous_day4 = %sysfunc(intnx(day, &current_day., -4), date9.);
  %let previous_day5 = %sysfunc(intnx(day, &current_day., -5), date9.);
%let previous_day6 = %sysfunc(intnx(day, &current_day., -6), date9.);
%let previous_day7 = %sysfunc(intnx(day, &current_day., -7), date9.);
%end;
 
%put &previous_day.;
 
 
 
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day..xlsx"
    OUT = Previous_report1
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
/*proc print data = Previous_report1;run;*/
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day2..xlsx"
    OUT = Previous_report2
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
/*proc print data = Previous_report2;run;*/
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day3..xlsx"
    OUT = Previous_report3
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day4..xlsx"
    OUT = Previous_report4
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
 
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day5..xlsx"
    OUT = Previous_report5
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day6..xlsx"
    OUT = Previous_report6
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
PROC IMPORT 
    DATAFILE = "filepath\Output\filename&previous_day7..xlsx"
    OUT = Previous_report7
    DBMS = EXCEL REPLACE;
    RANGE = 'possible_accts$';
    MIXED = YES;
    SCANTEXT = YES;
    USEDATE = YES;
    SCANTIME = YES;
RUN;
 
 
 
proc delete data=test;run;
data test;
/* Define variable lengths */
  format acct_num 16. acct_id 8. tran_dt 12. open_dt $10. close_dt 10. cheque_pymt 30. merch_num 12. postal_cd $6. count_cash_adv 4. total_cash_adv 9. name $15. DOB $10. email $24. tel_home 10. addr_ln1 $17.;
run;
 
 
 
rsubmit;
proc upload data=test;
run;
proc sort data=test;
by acct_id;
run;
 
proc upload data=Previous_report1 ;
run;
proc sort data=Previous_report1;
by acct_id;
run;
 
proc upload data=Previous_report2 ;
run;
proc sort data=Previous_report2;
by acct_id;
run;
proc upload data=Previous_report3 ;
run;
proc sort data=Previous_report3;
by acct_id;
run;
proc upload data=Previous_report4 ;
run;
proc sort data=Previous_report4;
by acct_id;
run;proc upload data=Previous_report5 ;
run;
proc sort data=Previous_report5;
by acct_id;
run;
proc upload data=Previous_report6 ;
run;
proc sort data=Previous_report6;
by acct_id;
run;
proc upload data=Previous_report7 ;
run;
proc sort data=Previous_report7;
by acct_id;
run;
endrsubmit;
 
 
 
 
proc append base=test data=Previous_report1 FORCE;run;
/* try to run it without FORCE and then with force */
proc append base=test data=Previous_report2 FORCE;run;
proc append base=test data=Previous_report3 FORCE;run;
proc append base=test data=Previous_report4 FORCE;run;
proc append base=test data=Previous_report5 FORCE;run;
proc append base=test data=Previous_report6 FORCE;run;
proc append base=test data=Previous_report7 FORCE;run;
 
 
rsubmit;
proc delete data= myora.final_result2;run;
data myora.final_result2;
 merge myora.final_result (in=a) Previous_report1(in=b) Previous_report2 (in=c) Previous_report3  (in=d)  Previous_report4 (in=e) Previous_report5 (in=f) Previous_report6 (in=g) Previous_report7(in=h); 
 by acct_id ;
 if a and not b and not c and not d and not e and not f and not g and not h;
 run;
endrsubmit;
 
 
rsubmit; /*table for result */
 
proc sql;
connect to oracle(&ora_str.);
create table final as
select * from connection to oracle(
 
select ACCT_NUM,ACCT_ID, CHEQUE_PYMT,NAME, DOB, EMAIL, TEL_HOME, ADDR_LN1
from  final_result2 where CHEQUE_PYMT >= '1000.00'
 
 
);
disconnect from oracle;
quit;
endrsubmit;
 
 
 
 
rsubmit;
proc download data=final;
run;
endrsubmit;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@User_2024 

"and i use the date constraint to look at last 7 days of data TRANSACTION_DT < TRUNC(SYSDATE) -7"
Above selection would return all rows with a transaction date OLDER than 7 days.

To select the past 7 days (without the current day where you won't have all transactions):

TRANSACTION_DT between TRUNC(SYSDATE) -7 and TRUNC(SYSDATE) -1

Assuming you don't have duplicate transactions in your source, your definition of duplicates doesn't make much sense to me.
It feels this is more an issue with your selection logic and not with your data.
If you just want to select yesterday's transactions change your selection to: 

TRANSACTION_DT = TRUNC(SYSDATE) -1

As I understand it your report is at account level and not at transaction level so at some point you're aggregating the data.

 

Why do you need transaction level data at all? Like: Is this some summary of transactions per account over the last 7 days and you only want to include accounts that had a change yesterday? If so then it wouldn't be hard to identify all accounts with a change yesterday directly on the Oracle side and only create account level aggregated rows moved to SAS that meet this condition.

 

You just need to describe a bit more in detail what you have and what your end result needs to be.

View solution in original post

17 REPLIES 17
Kurt_Bremser
Super User

If you need consistency with regards to data types and other variable attributes, then you MUST NOT use PROC IMPORT, which also means that you MUST NOT use Excel files as data source.

All means that SAS provides for reading Excel files involve guessing and are notoriously unreliable.

From where do these Excel files come?

User_2024
Obsidian | Level 7

Ah... ok. So, I generate the output using EXCEL files.  and this has to come out everyday for reviewing purposes with acct_num and other demographic info. Ideally my problem is todays_report should not have ANY entries /duplicates from yesterdays or day before. the code works if I compare todays report to yesterdays and THEN PRINT todays BUT if there are Zero records for today then it pulls data from day before yesterday's which is such a pain now. 

 

 

ballardw
Super User

@User_2024 wrote:

Ah... ok. So, I generate the output using EXCEL files.  and this has to come out everyday for reviewing purposes with acct_num and other demographic info. Ideally my problem is todays_report should not have ANY entries /duplicates from yesterdays or day before. the code works if I compare todays report to yesterdays and THEN PRINT todays BUT if there are Zero records for today then it pulls data from day before yesterday's which is such a pain now. 

 

 


IF you are generating the Excel files as output are you doing it from SAS? If so, then the place to start would be the data sets used to create that Excel output, not rereading output.

 

FWIW in a typical week I read anywhere from 10 to 50 files that start out as XLSX. Because of the headaches involved with that I save them to CSV and then can use a data step to read the files into consistent variables by type, length and name.(Until the idiots providing the source change the column order, then I need to change the order of the Input statement I use.)

User_2024
Obsidian | Level 7
ohh. thats a great point, I am just starting to create reports using SAS. And yes, you are right, i use SAS to create the XLSX report. can you throw me a sample how I can convert this to CSV now ? and all my previous 7 days files are also in XLSX. How to fix them now ? this is how I do it. any insights are sincerely appreciated .

proc export
data=final outfile="filepath\Output\filename&today..xlsx"
dbms=xlsx replace;
sheet="data_for_review";
run;
ballardw
Super User

@User_2024 wrote:
ohh. thats a great point, I am just starting to create reports using SAS. And yes, you are right, i use SAS to create the XLSX report. can you throw me a sample how I can convert this to CSV now ? and all my previous 7 days files are also in XLSX. How to fix them now ? this is how I do it. any insights are sincerely appreciated .

proc export
data=final outfile="filepath\Output\filename&today..xlsx"
dbms=xlsx replace;
sheet="data_for_review";
run;

I would create a permanent library for the data related to this project. Then as you make a "final" data set either append it to a long-term set in that library  (best if the contents don't change often) or, and I know this is unpopular for many reasons, make a copy in that permanent library with the date in the name such as Final_20240301. This may be better if the contents do change, meaning the number of variables, the names of the variables and metadata.

 

Then combine the data sets as needed (if the append approach isn't feasible and may not be with a history of Proc Import...)

 

User_2024
Obsidian | Level 7
Thank you, any sample syntax or code pls ? Appreciate your time.
User_2024
Obsidian | Level 7
Thanks, could you point me for an example pls?
Patrick
Opal | Level 21

@User_2024 Looking at the code you share I feel it might be worth to take a step back and look at your whole process and potentially revamp it.

 

"I have a master dataset and three sub datasets ,with various columns puled from Oracle database. "

1. Is all your data in Oracle or only some additional information that you use to enrich your master dataset?

2. Where is your master dataset stored?

 

"Now, the tricky part is I am comparing the today's file to last 7 days and remove duplicates from today's report."

3. What constitutes a "duplicate" and why do you get them in first place?

4. Are there updates to your source data between your daily queries (like a change to open_dt)?

 

6. What's the purpose of these rsubmit in your code. Why do you need them?

 

And as others already stated: 

Do not use Excel as a data source. Easiest would be to store the daily SAS table permanently that you use to create the Excel and then use these SAS tables the next day.

You could either create daily tables with a date portion in the table name or append daily data to a single table with a date column.

User_2024
Obsidian | Level 7
1. Is all your data in Oracle or only some additional information that you use to enrich your master dataset? - All of my data are from Oracle sql.

2. Where is your master dataset stored? - its in myora. and not in WORK.

6. What's the purpose of these rsubmit in your code. Why do you need them?- I always use them for ease of execution. Since am a beginnner in sas, I am open for suggestions and ideas, this will be my first ever report creation in SAS. Appreciate your insights and time!
User_2024
Obsidian | Level 7
3. What constitutes a "duplicate" and why do you get them in first place? - i run this report on weekdays and i use the date constraint to look at last 7 days of data TRANSACTION_DT < TRUNC(SYSDATE) -7 so for example when the report runs today, it picks up records from last 7 days and has duplicates in it, so I did a proc import to compare yesterdays to today to remove dupes, which works for one day, BUT if today;s report has NO DATA in it, it goes and picks up day before yesterdays data as output. So to avoid that I started stacking last 7 days of output files, sorted by acct_id and compared with today's , removed duplicates. but at this point, its not working because of the data type error.

4. Are there updates to your source data between your daily queries (like a change to open_dt)?- no, open_dt is the date when the account was opened so no change in it, the only change is at the Transaction_dt as the real time transaction happens,
Patrick
Opal | Level 21

@User_2024 

"and i use the date constraint to look at last 7 days of data TRANSACTION_DT < TRUNC(SYSDATE) -7"
Above selection would return all rows with a transaction date OLDER than 7 days.

To select the past 7 days (without the current day where you won't have all transactions):

TRANSACTION_DT between TRUNC(SYSDATE) -7 and TRUNC(SYSDATE) -1

Assuming you don't have duplicate transactions in your source, your definition of duplicates doesn't make much sense to me.
It feels this is more an issue with your selection logic and not with your data.
If you just want to select yesterday's transactions change your selection to: 

TRANSACTION_DT = TRUNC(SYSDATE) -1

As I understand it your report is at account level and not at transaction level so at some point you're aggregating the data.

 

Why do you need transaction level data at all? Like: Is this some summary of transactions per account over the last 7 days and you only want to include accounts that had a change yesterday? If so then it wouldn't be hard to identify all accounts with a change yesterday directly on the Oracle side and only create account level aggregated rows moved to SAS that meet this condition.

 

You just need to describe a bit more in detail what you have and what your end result needs to be.

User_2024
Obsidian | Level 7

Thank you for suggesting. I am trying to find payments made in last 7 days, and I need to run this daily, When I do this I stumbled into duplicates because data from say for example from Mar3 shows up in MAr4 when I run it, and it also shows up when I run on Mar5. And I gotta look at last 7 days because there could be different payment type like cheques. 

To resolve it, I imported yesterday's output file, compared to today's, removed duplicates, which worked fine. 

But, when I did that for multiple days, I faced datatype error and that when I approached the community. Hope, this helps. 

 

Issues were on defining today, yesterday macros that got fixed from help here. And finally resolved the datatype error by keeping just the index variable while stacking. 

proc upload data=Previous_report1 (keep=acct_id);
run;
proc sort data=Previous_report1;
by acct_id;
run;
proc upload data=Previous_report2 (keep=acct_id);
run;
proc sort data=Previous_report2;
by acct_id;
run;

Patrick
Opal | Level 21

@User_2024 wrote:

Thank you for suggesting. I am trying to find payments made in last 7 days, and I need to run this daily, When I do this I stumbled into duplicates because data from say for example from Mar3 shows up in MAr4 when I run it, and it also shows up when I run on Mar5. And I gotta look at last 7 days because there could be different payment type like cheques. 

To resolve it, I imported yesterday's output file, compared to today's, removed duplicates, which worked fine. 

But, when I did that for multiple days, I faced datatype error and that when I approached the community. Hope, this helps. 

Yes, of course, if you run daily but select the last 7 days of transactions then you will select the same transaction on multiple days. 

"And I gotta look at last 7 days because there could be different payment type like cheques."

And you want to report (select) this check payment now for 7 days or only once?

 

I still believe that ideally you would write logic that does the desired selection directly on the database instead of reading and processing some .csv or Excel for post processing that you created via runs on past days. Just an opinion. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 2066 views
  • 4 likes
  • 4 in conversation