SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DI Studio file archiving

Accepted Solution Solved
Reply
Contributor
Posts: 31
Accepted Solution

DI Studio file archiving

Hi All.

 

I'm in the process of writing a program that will be processing files from several AS400 systems into a 'staging table' . (The staging table will ultimately be used to feed a set of standardization and cleansing routines.) Now, the AS400 systems will be sending these pipe delimited CSV files on a daily basis for now, but ultimately will be sped up to be hourly. The naming convention of these files will be similar to 'CDH_ccyymmddhhmmss.csv'  Now from what I see and have read... DI Studio does not have a transformation that can be used to 'copy' or 'archive' the file after being succesfully processed. My Admin is going to create a folder/directory for me to send processed files to. I'm hoping to preserve the name of the input file so that if there is an issue, I can also reference the matching file on the source AS400 system.  I believe in unix there are also commands that can do this, but I dont see how DI studio will allow that code to be used.  Also, I thought about an sql statement and using the file name as a macro or parm.. but have never done this sort of thing yet. 

 

I have attached a single file processing job, that I was trying to add the archive functionality to ..

 

Any help would be greatly appreciated..!

Chris N.

- Chris N.

Accepted Solutions
Solution
‎04-25-2017 09:22 AM
New Contributor
Posts: 3

Re: DI Studio file archiving

Hi Chris,

There are several ways to accomplish this. As other users have already discussed, it can be done using a user written code which calls an Unix script using a X command. In the attachement, I have outlined all the steps with screenprints.


Below are the steps.

1. Read the CSV file.
2. Perform the needed ETL Transformation as per your businees requirements.
3. Load the file in to the DB Table.
4. Have a conditional start (Job return code > 4 is a job failure) and conditional end transformations to check that there are no errors while processing/loading the file.
5. Call the Unix Script using the X command using SAS User Written Code Transformation.
6. Below is a sample Unix script to move a file to archived directory.


#!/bin/ksh
umask 002
export LOG_DIR="<LOG Directory Path>"
export LOG_FILE="$LOG_DIR/FILENAME.log"
export FILE="<File Path/CDH_cc*.csv>"
export TRGT_DIR='<Archive Directory Path>'

if [ -f $FILE ]
then
echo "File $FILE exists. "
echo "Moving the $FILE to $TRGT_DIR directory" | tee -a $LOG_FILE
mv $FILE ${TRGT_DIR}
RET_CD=$(echo $?)

# Check for the return code of the previous command

if [[ $RET_CD -ne 0 ]]; then
echo "Moving the $FILE to $TRGT_DIR failed..." | tee -a $LOG_FILE
echo "Please check $LOG_FILE for error message"
exit 1;
else
echo "Moved $FILE to $TRGT_DIR Successfully"| tee -a $LOG_FILE
fi
else
echo "No $FILE present in the incoming directory" |tee -a $LOG_FILE
fi

7. I have outlined all the steps with screenprints in the attachment.


Thanks,
-Sai

View solution in original post


All Replies
Contributor
Posts: 31

Re: DI Studio file archiving

geez.. I forgot to add the screen shot.

 

 

 

- Chris N.
Super User
Posts: 5,257

Re: DI Studio file archiving

Basically, it's an OS (UNIX) move command that needs to be issued.

That can be done in several ways. If you have just one process to read all files, it can be simply be added as a user written step after the table loader, or within the table loader. Be sure that you define what a successful import means to you, to you have any control file or similar structure, you need to take that into account.

If you have many different files, you may need to investigate parametrization, like macro, or user written transformation.

Basically, this is an ETL design job, and best performed on site.

Data never sleeps
Contributor
Posts: 31

Re: DI Studio file archiving

Thanks for the reply..

 

So, this unix command can be added as a postcode in my tableloader?

if the tableloader is successful, execute this poscode command for the copy.. or something similar?

 

would you know what the unix commands are for something like this.. I don't know unix whatsoever..

 

-Chris N.

 

 

 

- Chris N.
PROC Star
Posts: 1,095

Re: DI Studio file archiving

If you can do it within the constraints of DI Studio, that's great.

 

From what you're describing, though, be prepared for it to be just a little too complicated for DI Studio. The good news is it looks like it would be trivial in SAS code, as a user-written transformation. My feeling is 10-20 hours of effort.

 

Depending on how critical your timelines are, you might want to be thinking about lining somebody up.

 

Tom

Contributor
Posts: 31

Re: DI Studio file archiving

Right now, development time is not an issue. Myself and the rest of the team at my company are just learning DI studio as well as Dataflux at the same time as we work on this project of ours. Right now, I have my program set up to read a single file. The next step is to add a loop to read ALL the tables in the same folder. The thought is that once each table is processed successfully, to create/copy/move the table to an archive folder. Each input file will have the same file format, but different names based on a datetimestamp when it was originally created on the remote system..

 

so for my archive.. has anyone done such a thing like this? I thought maybe I could somehow add some postcode to last process that would do this.. I have read about an 'X' statement that allows execution of unix code in a shell. Everything I have read refers to other SAS products though.. not DI Studio..

- Chris N.
PROC Star
Posts: 1,095

Re: DI Studio file archiving

Yes, the good news is that SAS is very good for things like this...it's actually the sort of thing I do myself for fun.

 

To figure out what files are in the directory, check out the "external files" functions in the SAS help, especially FILENAME, DOPEN, DNUM, DREAD, DCLOSE, FOPEN, and FCLOSE.

 

Don't worry about the fact that everything refers to other SAS products; mostly, DI Studio just generates SAS code, so if you create user-defined code, you can do (pretty much) anything that SAS does.

 

Tom

Contributor
Posts: 31

Re: DI Studio file archiving

Thanks for the advice Tom.

I'm going to start researching those fuctions this afternoon.. 

 

-Chris N.

- Chris N.
Solution
‎04-25-2017 09:22 AM
New Contributor
Posts: 3

Re: DI Studio file archiving

Hi Chris,

There are several ways to accomplish this. As other users have already discussed, it can be done using a user written code which calls an Unix script using a X command. In the attachement, I have outlined all the steps with screenprints.


Below are the steps.

1. Read the CSV file.
2. Perform the needed ETL Transformation as per your businees requirements.
3. Load the file in to the DB Table.
4. Have a conditional start (Job return code > 4 is a job failure) and conditional end transformations to check that there are no errors while processing/loading the file.
5. Call the Unix Script using the X command using SAS User Written Code Transformation.
6. Below is a sample Unix script to move a file to archived directory.


#!/bin/ksh
umask 002
export LOG_DIR="<LOG Directory Path>"
export LOG_FILE="$LOG_DIR/FILENAME.log"
export FILE="<File Path/CDH_cc*.csv>"
export TRGT_DIR='<Archive Directory Path>'

if [ -f $FILE ]
then
echo "File $FILE exists. "
echo "Moving the $FILE to $TRGT_DIR directory" | tee -a $LOG_FILE
mv $FILE ${TRGT_DIR}
RET_CD=$(echo $?)

# Check for the return code of the previous command

if [[ $RET_CD -ne 0 ]]; then
echo "Moving the $FILE to $TRGT_DIR failed..." | tee -a $LOG_FILE
echo "Please check $LOG_FILE for error message"
exit 1;
else
echo "Moved $FILE to $TRGT_DIR Successfully"| tee -a $LOG_FILE
fi
else
echo "No $FILE present in the incoming directory" |tee -a $LOG_FILE
fi

7. I have outlined all the steps with screenprints in the attachment.


Thanks,
-Sai

Contributor
Posts: 31

Re: DI Studio file archiving

Thanks for all the great advice..

 

I did figure out a very simple way to perform my archiving by using an x-cmd in my post code tab.

all my files being processed are prefixed with 'CDH_'  so by using the asterisk wildcard(*) I can move all available files without having to maintain a contents list and making multiple copy statements.

 

X cp -a /dmadata/AS400/CDH_* /dmadata/AS400_ARCH/;

 

now I am making a copy of all my files from my 'input' directory to my 'archive' one and maintaining the original attributes as well.

 

 

-Chris N.

- Chris N.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 262 views
  • 0 likes
  • 4 in conversation