DATA Step, Macro, Functions and more

Comparing the Files

Posts: 66

Comparing the Files

Hi All,

I need to read historic XLSX,Xls files, we are having some data missing in our database tables.

As the files were updated and we are not sure if we have loaded the updated files on to the Tables.


We recieve a file from a vendor on monthly basis in .xlsx format (

xxxxxxx xxxxxxx-xxxxxxx_YYMM_cmb.xlsx)  and as the program is designed to import .xls and we dont want to distrub the existing set up. so we are just converting the monthly.XLSX file to .xls. Whenever the Input file .xlsx is updated we recieve a New file with a new name (xxxxxxx xxxxxxx-xxxxxxx_YYMM_cmb_upd_v2.xlsx) if there are more than one update to the file we recieve the files with a increment in value 'Vn'.


Now we need ot create a Master dataset which would have all the correct records from the input .xlsx file.

i have already designed a program which woul read all the .xlsx files from the desired location and created a dataset for each of the input .xlsx files and later append all the datasets to a Master file.


Datasets created from the .xlsx input files for example:






the above files are appended to a Master dataset "Master_Sales"

This works fine with the input files which donot have updated Versions.

Now i need to compare the Number of records from Version V1 and V2 and append the dataset with Highest number of records in V1 or V2.






Kinly Suggest me on this.


Super User
Posts: 23,663

Re: Comparing the Files

Posted in reply to santosh_pat69

Sashelp.vtable contains metadata about your tables which includes the number of records. You can use that to find the table with highest number of records. 

Posts: 66

Re: Comparing the Files

Hi Reeza


Thank you for your Quick response,

data _null_;
 set SASHELP.Vtable;
 where libname="WORK" and memname="MSTR_DM_&I";
 call symput("count_obs",nlobs);


But here i need to compare the comunt between to files Version1 and Version for the file name V1,V2 and append the dataset with highest count to the master dataset.


In my current code it chks for the Xls,xlsx files reads it and appends it without checking the counts.

now  i have check the counts and append the dataset for the highest count from file V1 and V2.


for example :


XXXXXX_XXXXXX_XXX_YYMM_Cmb.xlsx ===>>>> 400 counts.

XXXXXX_XXXXXX_XXX_YYMM_Cmb_V2.xlsx ===>>>> 410 counts.

XXXXXX_XXXXXX_XXX_YYMM_Cmb_V3.xlsx ===>>>> 455 counts.

i need to read all these files take the counts compare the file names and counts  then append the dataset for the file with max number of records.




Super User
Posts: 23,663

Re: Comparing the Files

Posted in reply to santosh_pat69

So, you've imported all the files. 


Use the table to get the number of observations and take the largest one.


I don't know enough about your process to know where you want to do this, but it is a trivial issue to determine the largest one. See the example below.



proc sql;
creeate table table_obs as 
select memname, nlobs
from sashelp.vtable
where libname='WORK' and memname like 'MSTR_DM_%'
order by nlobs descending;

data _null_;
set table_obs;
if _n_=1 then call symput('table_to_use', memname);

%put &table_to_use.;



Posts: 66

Re: Comparing the Files

Hi Reeza,

Thank you !!!

My code :- It detemines all the .xlsx and Xls file


data _Null_;

Call system("find &filedir1. . \( -name \*.xls -o -name \*.xlsx \) -print> /Santosh/exls.csv");



The out put of hte above will have a list of all hte xlsx,xls files, my code reads all the files listed sequentially, but now i need to compare the files v,v1 and v2 and append only hte dataset which has the highest count to the Master data set and should also compare the names of the files used.






for example we recieved three files for any month now after the datasets are checked for the record how do i append the dataset containing highest records for the same period of time.



Posts: 66

Re: Comparing the Files

Posted in reply to santosh_pat69



Thank you Reeza!!!


I have taken the count of the records from the interim datasets, and then added the tables who have max number of records into the tables.


Thank you for your help.







Ask a Question
Discussion stats
  • 5 replies
  • 2 in conversation