Dear All,
I'm in the process of migrating my SAS scripts from EG 4.1 to EG 7.1. In addition, I will be parametrizing the input and output file paths (shared folders, FTP folders, or HTTP links) so that the scripts can be pointed at either the QA or the Prod environment. Once the changes have been applied in the QA environment, the SAS script will be executed and the log files have to be compared line by line across QA and Production.
The problem is that I'm not sure both versions of EG will produce exactly the same lines of log text for a given SAS script, so even automatically comparing the log files will not give me the desired output. What I need is a script to which I can pass a source log file and a target log file. The script should read each log file and list the name of each dataset read or newly created, the record counts read and written, the time taken to read and write, and (if possible) the name of the macro in which the dataset was read or written.
My SAS expertise is limited to a few DATA and PROC steps, so I have no clue how to parse the log and generate the list. I would request the experts to help me with such a script. A sample log file and the expected output are pasted below. Sorry, I cannot attach them as a file here.
Expected output (in Comma Separated Format) from below log file:
Dataset, Action, Record_Count, Var_Count, Real_Time
ASPL.PDPA_R, Read, 12732236, Not mentioned, Not mentioned
WORK.IMP0_R, Write, 12732236, 34 variables, 1:59.00
ASPL.PDPA_A, Read, 12242819, Not mentioned, Not mentioned
WORK.IMP0_A, Write, 12242819, 34 variables, 56.00 seconds
WORK.IMP0_R, Read, 12732236, Not mentioned, Not mentioned
WORK.IMP1_R, Write, 2696, 9 variables, 1:02.00
/* Sample Log File --- START--- */
WARNING: The Base Product product with which DATASTEP (2) is associated will be expiring soon, and is currently in warning mode to
indicate this upcoming expiration. Most typically this warning period runs for 45 days. Please run PROC SETINIT to obtain
more information on your warning period.
NOTE: There were 12732236 observations read from the data set ASPL.PDPA_R.
NOTE: The data set WORK.IMP0_R has 12732236 observations and 34 variables.
NOTE: Compressing data set WORK.IMP0_R decreased size by 57.72 percent.
Compressed is 91233 pages; un-compressed would require 215801 pages.
NOTE: DATA statement used (Total process time):
real time 1:59.00
user cpu time 1:05.21
system cpu time 24.51 seconds
Memory 252k
Page Faults 0
Page Reclaims 70
Page Swaps 0
Voluntary Context Switches 13065
Involuntary Context Switches 6537
Block Input Operations 5481304
Block Output Operations 4382568
WARNING: The Base Product product with which DATASTEP (2) is associated will be expiring soon, and is currently in warning mode to
indicate this upcoming expiration. Most typically this warning period runs for 45 days. Please run PROC SETINIT to obtain
more information on your warning period.
NOTE: DATA statement used (Total process time):
real time 2:06.00
user cpu time 1:12.76
system cpu time 24.82 seconds
Memory 310k
Page Faults 0
Page Reclaims 37
4 The SAS System 10:28 Thursday, June 18, 2015
Page Swaps 0
Voluntary Context Switches 15227
Involuntary Context Switches 6906
Block Input Operations 6536272
Block Output Operations 4752976
NOTE: There were 12242819 observations read from the data set ASPL.PDPA_A.
NOTE: The data set WORK.IMP0_A has 12242819 observations and 34 variables.
NOTE: Compressing data set WORK.IMP0_A decreased size by 61.20 percent.
Compressed is 98964 pages; un-compressed would require 255059 pages.
WARNING: The Base Product product with which SUMMARY (2) is associated will be expiring soon, and is currently in warning mode to
indicate this upcoming expiration. Most typically this warning period runs for 45 days. Please run PROC SETINIT to obtain
more information on your warning period.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 56.00 seconds
user cpu time 23.05 seconds
system cpu time 7.30 seconds
Memory 7415k
Page Faults 0
Page Reclaims 1741
Page Swaps 0
Voluntary Context Switches 16539
Involuntary Context Switches 803
Block Input Operations 4383520
Block Output Operations 320
NOTE: There were 12732236 observations read from the data set WORK.IMP0_R.
NOTE: The data set WORK.IMP1_R has 2696 observations and 9 variables.
NOTE: Compressing data set WORK.IMP1_R decreased size by 29.63 percent.
Compressed is 19 pages; un-compressed would require 27 pages.
WARNING: The Base Product product with which SUMMARY (2) is associated will be expiring soon, and is currently in warning mode to
indicate this upcoming expiration. Most typically this warning period runs for 45 days. Please run PROC SETINIT to obtain
more information on your warning period.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 1:02.00
user cpu time 29.08 seconds
system cpu time 8.66 seconds
Memory 6763k
Page Faults 0
Page Reclaims 1571
Page Swaps 0
Voluntary Context Switches 11745
Involuntary Context Switches 3253
Block Input Operations 4754952
Block Output Operations 352
/* Sample Log File --- END--- */
Rgds, Anil
Just to add one point:
I could ask the script owners to add a bit more code to their existing scripts to output the above record counts, but the idea is to meet the requirement without tampering with the existing system.
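For what it's worth, here is a minimal sketch of the kind of parser described above. It is written in Python rather than SAS (an assumption; the thread does not name a language), and the names `parse_log` and `to_csv` are made up for illustration. It only recognises the three kinds of lines visible in the sample log: the "observations read" NOTE, the "data set ... has N observations" NOTE, and the "real time" line. Following the expected output above, a Write row takes the next "real time" line that appears after it, and Read rows are left as "Not mentioned". It does not handle NOTE lines that SAS has wrapped across two log lines, and it does not identify the enclosing macro, since the sample log carries no macro information (that would need extra log detail such as the MLOGIC system option produces).

```python
import csv
import io
import re

# Patterns for the log lines that appear in the sample log above.
READ_RE = re.compile(
    r"NOTE: There were (\d+) observations read from the data set (\w+\.\w+)\.")
WRITE_RE = re.compile(
    r"NOTE: The data set (\w+\.\w+) has (\d+) observations and (\d+) variables\.")
REAL_TIME_RE = re.compile(r"^\s*real time\s+(\S.*?)\s*$")

HEADER = ["Dataset", "Action", "Record_Count", "Var_Count", "Real_Time"]
MISSING = "Not mentioned"


def parse_log(lines):
    """Return one row per dataset read or written, in log order."""
    rows = []
    pending = []  # Write rows still waiting for the next "real time" line
    for line in lines:
        m = READ_RE.search(line)
        if m:
            rows.append([m.group(2), "Read", m.group(1), MISSING, MISSING])
            continue
        m = WRITE_RE.search(line)
        if m:
            row = [m.group(1), "Write", m.group(2),
                   m.group(3) + " variables", MISSING]
            rows.append(row)
            pending.append(row)
            continue
        m = REAL_TIME_RE.match(line)
        if m:
            # Assumption taken from the expected output: the step timing
            # printed after a Write note belongs to that Write row.
            for row in pending:
                row[4] = m.group(1)
            pending.clear()
    return rows


def to_csv(rows):
    """Render the rows as CSV text with the header used in the post."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

Running `to_csv(parse_log(open(path)))` on the QA log and again on the Production log would give two small CSV files that can be compared directly, without the timing and resource lines that differ from run to run.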
Look at programs created specifically for these kinds of tasks, e.g. the UNIX diff utility, or WinMerge for Windows, as a start.
Why make it hard for yourself by comparing logs? I've found using a combination of SAS dataset comparison using PROC COMPARE and checking for clean logs and program process times is all you need to do when changing versions.
If your datasets are identical then any outputs will also be identical.
You don't say if you are also changing SAS versions. If you are not, then SAS log results are unlikely to change if the code you are using stays the same.
Migration is part of the Software Development Life Cycle, so follow that. Identify a plan, run through the test scripts, validate outputs, etc. Personally I wouldn't compare the logs, simply because they will be different in various ways. Execute the script and check the log, ensuring there are no Warnings or Errors and that the Notes are acceptable. Then get an independent party to either validate the output or run through the testing procedure.
EGuide is just generating new SAS code for some of those menu tasks. As EGuide is a new version, you should expect different code to be generated.
As you are going from 4.1 to 7.1, your real migration is from SAS Base foundation 9.1.3 to 9.4. There are a lot of differences, as the foundation has changed considerably.
To confirm that the new system behaves as expected, you need to do regression testing. That regression testing should be part of the migration; it is not part of the migration of SAS metadata. I assume you do not have SAS metadata, as you did not mention it.
I would expect you to compare the result output (listings/HTML or datasets), not the logs. For comparing datasets there is PROC COMPARE (see the Base SAS(R) 9.4 Procedures Guide, Fourth Edition).
Thank you everyone. We finally stuck to PROC COMPARE plus a check of whether the log contains any ERROR or WARNING lines. This approach was the best basis for taking decisions.
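In case it helps anyone landing here later, the log check part can be sketched in a few lines of Python (again an assumption; any scripting language would do, and the name `find_problems` is made up for illustration). It relies only on the fact that SAS starts such messages with `ERROR:` or `WARNING:`, optionally with a message number as in `ERROR 22-322:`. The `ignore` parameter is a hypothetical convenience for known, accepted messages, such as the licence expiry warning that appears throughout the sample log above.

```python
import re

# SAS writes problem messages at the start of a line as "ERROR:" or
# "WARNING:", sometimes with a message number, e.g. "ERROR 22-322:".
PROBLEM_RE = re.compile(r"^(ERROR|WARNING)(?: \d+-\d+)?:")


def find_problems(lines, ignore=()):
    """Return (line_number, severity, text) for each ERROR/WARNING line.

    `ignore` is an iterable of substrings; a line containing any of them
    is skipped (hypothetical helper for known, accepted messages).
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        m = PROBLEM_RE.match(line)
        if m and not any(text in line for text in ignore):
            hits.append((lineno, m.group(1), line.rstrip()))
    return hits
```

An empty result from `find_problems(open(path), ignore=("will be expiring soon",))` would then count as a clean log; anything else gets reviewed by hand.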
