AllanD
Fluorite | Level 6

Hi All,

 

I am trying to identify the best method in Data Management Studio 2.7 for handling one or more txt files that are received from different sources in the same format, and combining them into a single file so that all the records can be processed together. The files may contain duplicate records, so the combined file needs to be reduced to unique records before processing.

 

I have created a process job using the following method (see attached Process image.png):

  • Create a blank generic txt file, with the correct header record, to hold all the data received
  • Check whether file 1 exists; if it does, union it with the generic txt file and write the result back to the generic txt file
  • Check whether file 2 exists; if it does, union it with the generic txt file and write the result back to the generic txt file
  • Use this generic txt file in the main data job
  • Archive all files

In this process only two files may be received, but I have other processes with larger numbers of files. How do you manage this in DMS when there are more files, rather than having to add an existence check for every possible file before appending it to the generic txt file? A sketch of the logic I'm after is below.
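Outside of DMS, the behaviour I want is roughly this (a minimal Python sketch, not DMS code; the directory, file pattern, and paths are assumptions for illustration):

```python
import glob
import os

# Assumed inbound location and naming pattern -- adjust to your environment.
INBOUND_DIR = "C:/dmstudio/inbound"
PATTERN = "source_*.txt"
COMBINED = "C:/dmstudio/work/combined.txt"

def combine_unique(inbound_dir: str, pattern: str, combined: str) -> int:
    """Union every matching file into one output, keeping only unique records."""
    seen = set()           # records already written
    header_written = False
    count = 0
    with open(combined, "w", encoding="utf-8") as out:
        # Discover the files at run time instead of checking each name in turn.
        for path in sorted(glob.glob(os.path.join(inbound_dir, pattern))):
            with open(path, encoding="utf-8") as f:
                header = f.readline()
                if not header_written:
                    out.write(header)
                    header_written = True
                for line in f:
                    if line not in seen:   # drop duplicate records
                        seen.add(line)
                        out.write(line)
                        count += 1
    return count

print(combine_unique(INBOUND_DIR, PATTERN, COMBINED), "unique records written")
```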

 

I think there must be a better way but I just don't know what it could be.

 

I have also found an issue when reading the generic txt file into the "Data Job 1" node. The output is written correctly by the "Generate data for OSHC Dir..." node (say 458 records), but the file is not read into "Data Job 1" in its entirety: the read stops at the same position in the generic txt file every time it is loaded by the input file node within that data job.

 

I suspect this issue is caused by reading from and writing to the same file, but I am not sure how to avoid that while still combining the files.
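One general pattern for avoiding a read/write collision on the same file is to stage the write in a temporary file and only replace the original once the write is complete. A minimal Python sketch of that idea (file names are illustrative, and this is the generic pattern, not a DMS node):

```python
import os
import shutil

COMBINED = "C:/dmstudio/work/combined.txt"     # assumed combined file
NEW_FILE = "C:/dmstudio/inbound/source_2.txt"  # assumed incoming file

def append_via_temp(combined: str, new_file: str) -> None:
    """Append new_file to combined without reading and writing combined at once."""
    tmp = combined + ".tmp"
    # Build the union in a separate temp file first...
    shutil.copyfile(combined, tmp)
    with open(tmp, "a", encoding="utf-8") as out, \
         open(new_file, encoding="utf-8") as f:
        f.readline()                 # skip the header of the incoming file
        shutil.copyfileobj(f, out)   # copy the remaining records
    # ...then swap it in atomically once the write has finished.
    os.replace(tmp, combined)

append_via_temp(COMBINED, NEW_FILE)
```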

 

Any suggestions would be greatly appreciated.


Thanks,


Allan. 


Attachment: Process image.png
RonAgresta
SAS Employee

You may be able to use the parallel iterator node in your process job to check for new files and write their contents to a work table using the work table writer node in your data job. A subsequent data job could read the data out of the work table using the work table reader node, after which you could perform duplicate removal using the clustering and surviving record identification nodes.
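To picture what the duplicate-removal step does, the clustering and survivorship logic amounts to grouping records on a match key and keeping one survivor per group. A rough Python sketch of that logic (the field names and the "most recently updated wins" rule are assumptions for illustration, not the nodes' actual configuration):

```python
from collections import defaultdict

# Toy records: (id, name, last_updated) -- fields are illustrative only.
records = [
    ("A1", "Smith, J",    "2015-01-03"),
    ("A1", "Smith, John", "2015-02-10"),
    ("B7", "Jones, K",    "2015-01-20"),
]

# Clustering: group records that share the same match key (here, the id).
clusters = defaultdict(list)
for rec in records:
    clusters[rec[0]].append(rec)

# Survivorship: keep one record per cluster -- here, the most recently updated.
survivors = [max(group, key=lambda r: r[2]) for group in clusters.values()]
print(survivors)
```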

 

For the possible issue related to reading from and writing to the same file at the same time, using a work table may help. Alternatively, if you want to stay with the current job design, you can use a branch node between the steps where you are reading and writing data. In the branch node, set the option to "land all data locally before processing continues" to force the behavior you are looking for.

 

Ron

AllanD
Fluorite | Level 6

Hi Ron,

 

Thanks for the reply.  I have converted the process to use a database table rather than the txt file to aggregate the multiple files into one. I will look into the work table option to ensure it is all contained within DataFlux.

 

I have looked at the parallel iterator node and it looks like it may be able to achieve what I am after. The question I have now is: how do you identify the number of files that exist in a particular location and pass those file names to the process? A sketch of what I mean is below.
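Outside of DMS, the enumeration itself would look something like this: list the files matching a pattern, count them, and hand each path to a per-file step. A minimal Python sketch (the directory, pattern, and process_file step are illustrative placeholders):

```python
import glob
import os

INBOUND_DIR = "C:/dmstudio/inbound"   # assumed landing directory
PATTERN = "source_*.txt"              # assumed naming convention

def process_file(path: str) -> None:
    """Placeholder for the per-file work an iterator node would run."""
    print("processing", path)

paths = sorted(glob.glob(os.path.join(INBOUND_DIR, PATTERN)))
print(len(paths), "file(s) found")
for path in paths:   # the iterator node would fan these out in parallel
    process_file(path)
```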

 

Thanks,

 

Allan.
