- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi folks,
I don't know if I am in the right section of this forum for my question. If not, I am sorry.
Here's my question. I have a multiple data files produced each month. I would like to know if there's a way to use PROC CONTENTS in a way to ensure that the data in the latest textfile (ex: Sep 2019) are "good". By good, I mean that the data are similar from the last files, the number of data is constant or increasing, etc. I don't know if my question is clear. I want to find a way to program a SAS code that will "check" each new produced file to look for any irregularities comparing to the past files.
I know that PROC contents produce great informations like type of variables, # of obs, type of variable, etc. I don't know if there's a way to use thoses informations to gather all of them into one dataset and use it to check if our newest datas are "potent".
Since English isn't my native language, I hope that my question for you guys is clear enough.
I was thinking of gathering the output of PROC CONTENTS for each file into a dataset. By example, If I keep only the variable NAME, TYPE, LENGTH, VARNUM, NPOS, NOBS, is there a way to have it like this :
Columns : NAME, TYPE, LENGTH, VARNUM, NPOS, NOBS
Lines : Datas of each files
I really hope that I'm clear...
Thanks in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@jpprovost wrote:
I didn't know that SAS have a COMPARE procedure. Wow.
However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?
Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?
No need for a "macro" to create such. SAS already does that in the background for you. They are data views in the SASHELP library such as Vtable, with information about datasets and views, Vcolumn with information about variables. These can also be accessed in Proc SQL as Dictionary Tables.
For instance one way to get information about all data sets in a library is: (replace YOURLIB with the uppercase name of your library of interest)
Proc sql; create table work.libtables as select * from dictionary.tables where libname='YOURLIB' ; run;
The resulting data set will have the data set library, name, creation and modified dates, number of observations, number of variables, information about file size, deleted observations and some other somewhat esoteric characteristics. For your purpose the dates, number of variables and number of observations are likely the most useful, especially if the number of variables changes.
The Vcolumns would be checked to see if the variables stay consistent for properties.
Replace YOURLIB with the uppercase spelling of a library of interest in the following code.
this creates a table that indicates the variable name and type on the row, the column of data set name and an X to indicate which.
If you have a variable with multiple types you can determine which data sets it changes in. You could restrict the data sets or variables to consider in the WHERE statement or create a data set from the SASHELP.VCOLUMN or Dictionary.Columns and filter as desired.
proc format library=work; value xmark 1-high='X' other =' ' ; run; Proc tabulate data=sashelp.vcolumn; where libname='YOURLIB'; class name type memname; table name*type, memname*n=' '*f=xmark. /misstext=' ' ; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @jpprovost ,
You're very clear
Maybe you could use also PROC COMPARE to get insights about differences between a previous dataset and the actual one.
The basic syntax is the following one:
proc compare base=prev_data compare=actual_data;
run;
Hope this help
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I didn't know that SAS have a COMPARE procedure. Wow.
However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?
Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@jpprovost wrote:
I didn't know that SAS have a COMPARE procedure. Wow.
However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?
Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?
No need for a "macro" to create such. SAS already does that in the background for you. They are data views in the SASHELP library such as Vtable, with information about datasets and views, Vcolumn with information about variables. These can also be accessed in Proc SQL as Dictionary Tables.
For instance one way to get information about all data sets in a library is: (replace YOURLIB with the uppercase name of your library of interest)
Proc sql; create table work.libtables as select * from dictionary.tables where libname='YOURLIB' ; run;
The resulting data set will have the data set library, name, creation and modified dates, number of observations, number of variables, information about file size, deleted observations and some other somewhat esoteric characteristics. For your purpose the dates, number of variables and number of observations are likely the most useful, especially if the number of variables changes.
The Vcolumns would be checked to see if the variables stay consistent for properties.
Replace YOURLIB with the uppercase spelling of a library of interest in the following code.
this creates a table that indicates the variable name and type on the row, the column of data set name and an X to indicate which.
If you have a variable with multiple types you can determine which data sets it changes in. You could restrict the data sets or variables to consider in the WHERE statement or create a data set from the SASHELP.VCOLUMN or Dictionary.Columns and filter as desired.
proc format library=work; value xmark 1-high='X' other =' ' ; run; Proc tabulate data=sashelp.vcolumn; where libname='YOURLIB'; class name type memname; table name*type, memname*n=' '*f=xmark. /misstext=' ' ; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@ballardw wrote:
@jpprovost wrote:I didn't know that SAS have a COMPARE procedure. Wow.
However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?
Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?
No need for a "macro" to create such. SAS already does that in the background for you. They are data views in the SASHELP library such as Vtable, with information about datasets and views, Vcolumn with information about variables. These can also be accessed in Proc SQL as Dictionary Tables.
For instance one way to get information about all data sets in a library is: (replace YOURLIB with the uppercase name of your library of interest)
Proc sql; create table work.libtables as select * from dictionary.tables where libname='YOURLIB' ; run;The resulting data set will have the data set library, name, creation and modified dates, number of observations, number of variables, information about file size, deleted observations and some other somewhat esoteric characteristics. For your purpose the dates, number of variables and number of observations are likely the most useful, especially if the number of variables changes.
The Vcolumns would be checked to see if the variables stay consistent for properties.
Replace YOURLIB with the uppercase spelling of a library of interest in the following code.
this creates a table that indicates the variable name and type on the row, the column of data set name and an X to indicate which.
If you have a variable with multiple types you can determine which data sets it changes in. You could restrict the data sets or variables to consider in the WHERE statement or create a data set from the SASHELP.VCOLUMN or Dictionary.Columns and filter as desired.
proc format library=work; value xmark 1-high='X' other =' ' ; run; Proc tabulate data=sashelp.vcolumn; where libname='YOURLIB'; class name type memname; table name*type, memname*n=' '*f=xmark. /misstext=' ' ; run;
This is one piece of great resources you just gave to me. Thank you alot. I will use this for sure.
But the problem that I have now is that all my files are .txt files and I don't know how to "load" them all into SAS. I thought that once all loaded, I could use VTable and VColumn to do the stuff that I want to. I will continue to seek for information about loading many files with "common" names (they are all dated and recurrent, so...).
By the way, I find it amazing that SAS does all of this in the background!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
basic way to read a text file is
filename text "\\&path.\&file.";
data read_text;
infile text; *dlm="09"x;
input var1 var2 var3;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@VDD wrote:
basic way to read a text file is
filename text "\\&path.\&file."; data read_text; infile text; *dlm="09"x; input var1 var2 var3; run;
Yes, I already have my code (LOADTXT.sas) especially for those .txt files.
However, I have the list of filenames with their extension into a dataset and now I am looking to import thoses files with a macro using my code LOADTXT.SAS. I am struggling to pass into the macro to tell it to load each filename in each row.
By example, the dataset containing all the filenames from my folder looks like :
OBS FILEVAR
1 file1.txt
2 file2.txt
3 fileaza.txt
...
...
...
The macro looks like this :
%macro loop;
%do i = 1 %to 3/*&Total*/;
DATA &&name&i ;
infile "&path\&&filevar&i" MISSOVER LRECL = 153 ;
input
@1 FIELD1 10.
@11 FIELD12 $15.
@26 FIELD13 $1.
@27 FIELD14 $1.
@28 FIELD15 $2.
@30 FIELD16 $6.
@36 FIELD17 $60.
@96 FIELD18 $2.
@98 FIELD18 9.
@107 FIELD10 13.
@120 FIELD111 11.
@131 FIELD1211 12.
@143 FIELD122 $10.
@153 FIELD133 $1.
@154 FIELD144 $3.
;
run ;
%end;
%mend loop;
The place where I struggle is there :
infile "&path\&&filevar&i"
The &path is good, the &filevar also (filevar is the name of the column in the dataset where my files are listed).
However, I am not able to tell the macro to load the filename that the row i contains.
Any ideas?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is one piece of great resources you just gave to me. Thank you alot. I will use this for sure.
But the problem that I have now is that all my files are .txt files and I don't know how to "load" them all into SAS. I thought that once all loaded, I could use VTable and VColumn to do the stuff that I want to. I will continue to seek for information about loading many files with "common" names (they are all dated and recurrent, so...).
By the way, I find it amazing that SAS does all of this in the background!
If you haven't read the data into SAS then you can prevent most of the likely errors by reading them properly.
Hopefully you have some document that shows what one of these text files should look like as far as column names, variable types, values (date and expected date format, currency, or similar), maximum length of character values and order in the file.
Then READ the data with those rules.
If you have such a document you might share it with the forum and we can show how to turn it into a data step to read the data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@ballardw wrote:
This is one piece of great resources you just gave to me. Thank you alot. I will use this for sure.
But the problem that I have now is that all my files are .txt files and I don't know how to "load" them all into SAS. I thought that once all loaded, I could use VTable and VColumn to do the stuff that I want to. I will continue to seek for information about loading many files with "common" names (they are all dated and recurrent, so...).
By the way, I find it amazing that SAS does all of this in the background!
If you haven't read the data into SAS then you can prevent most of the likely errors by reading them properly.
Hopefully you have some document that shows what one of these text files should look like as far as column names, variable types, values (date and expected date format, currency, or similar), maximum length of character values and order in the file.
Then READ the data with those rules.
If you have such a document you might share it with the forum and we can show how to turn it into a data step to read the data.
I agree with you. I have a specific SAS code for those files to load correctly.
My goal is to validate that the last file produced is "as good as the last ones". I don't know if I'm "clear" enough...