BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jpprovost
Quartz | Level 8

Hi folks,

 

I don't know if I am in the right section of this forum for my question. If not, I am sorry.

 

Here's my question. I have a multiple data files produced each month. I would like to know if there's a way to use PROC CONTENTS in a way to ensure that the data in the latest textfile (ex: Sep 2019) are "good". By good, I mean that the data are similar from the last files, the number of data is constant or increasing, etc. I don't know if my question is clear. I want to find a way to program a SAS code that will "check" each new produced file to look for any irregularities comparing to the past files.

 

I know that PROC contents produce great informations like type of variables, # of obs, type of variable, etc. I don't know if there's a way to use thoses informations to gather all of them into one dataset and use it to check if our newest datas are "potent".

 

Since English isn't my native language, I hope that my question for you guys is clear enough.

 

I was thinking of gathering the output of PROC CONTENTS for each file into a dataset. By example, If I keep only the variable NAME, TYPE, LENGTH, VARNUM, NPOS, NOBS, is there a way to have it like this :

 

Columns : NAME, TYPE, LENGTH, VARNUM, NPOS, NOBS

Lines : Datas of each files

 

I really hope that I'm clear...

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@jpprovost wrote:

I didn't know that SAS have a COMPARE procedure. Wow.

However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?

 

Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?


No need for a "macro" to create such. SAS already does that in the background for you. They are data views in the SASHELP library such as Vtable, with information about datasets and views, Vcolumn with information about variables. These can also be accessed in Proc SQL as Dictionary Tables.

 

For instance one way to get information about all data sets in a library  is: (replace YOURLIB with the uppercase name of your library of interest)

Proc sql;
   create table work.libtables as
   select *
   from dictionary.tables
   where libname='YOURLIB'
   ;
run;

The resulting data set will have the data set library, name, creation and modified dates, number of observations, number of variables, information about file size, deleted observations and some other somewhat esoteric characteristics. For your purpose the dates, number of variables and number of observations are likely the most useful, especially if the number of variables changes.

The Vcolumns would be checked to see if the variables stay consistent for properties.

Replace YOURLIB with the uppercase spelling of a library of interest in the following code.

this creates a table that indicates the variable name and type on the row, the column of data set name and an X to indicate which.

If you have a variable with multiple types you can determine which data sets it changes in. You could restrict the data sets or variables to consider in the WHERE statement or create a data set from the SASHELP.VCOLUMN or Dictionary.Columns and filter as desired.

proc format library=work;
value xmark
1-high='X'
other =' '
;
run;
Proc tabulate data=sashelp.vcolumn;
   where libname='YOURLIB';
   class name type memname;
   table name*type,
         memname*n=' '*f=xmark.
         /misstext=' '
   ;
run;

 

View solution in original post

8 REPLIES 8
ed_sas_member
Meteorite | Level 14

Hi @jpprovost ,

 

You're very clear Smiley Happy

Maybe you could use also PROC COMPARE to get insights about differences between a previous dataset and the actual one.

The basic syntax is the following one:

proc compare base=prev_data compare=actual_data;
run;

Hope this help

jpprovost
Quartz | Level 8

I didn't know that SAS have a COMPARE procedure. Wow.

However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?

 

Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?

ballardw
Super User

@jpprovost wrote:

I didn't know that SAS have a COMPARE procedure. Wow.

However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?

 

Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?


No need for a "macro" to create such. SAS already does that in the background for you. They are data views in the SASHELP library such as Vtable, with information about datasets and views, Vcolumn with information about variables. These can also be accessed in Proc SQL as Dictionary Tables.

 

For instance one way to get information about all data sets in a library  is: (replace YOURLIB with the uppercase name of your library of interest)

Proc sql;
   create table work.libtables as
   select *
   from dictionary.tables
   where libname='YOURLIB'
   ;
run;

The resulting data set will have the data set library, name, creation and modified dates, number of observations, number of variables, information about file size, deleted observations and some other somewhat esoteric characteristics. For your purpose the dates, number of variables and number of observations are likely the most useful, especially if the number of variables changes.

The Vcolumns would be checked to see if the variables stay consistent for properties.

Replace YOURLIB with the uppercase spelling of a library of interest in the following code.

this creates a table that indicates the variable name and type on the row, the column of data set name and an X to indicate which.

If you have a variable with multiple types you can determine which data sets it changes in. You could restrict the data sets or variables to consider in the WHERE statement or create a data set from the SASHELP.VCOLUMN or Dictionary.Columns and filter as desired.

proc format library=work;
value xmark
1-high='X'
other =' '
;
run;
Proc tabulate data=sashelp.vcolumn;
   where libname='YOURLIB';
   class name type memname;
   table name*type,
         memname*n=' '*f=xmark.
         /misstext=' '
   ;
run;

 

jpprovost
Quartz | Level 8

@ballardw wrote:

@jpprovost wrote:

I didn't know that SAS have a COMPARE procedure. Wow.

However, my goal is to collect datas from all the files from past years and to verify that the new one is "clean". Is there a way to do that?

 

Could I create a macro that load all the files into one dataset and then use PROC COMPARE to compare between "all_past_files" and "latest files" ? I am not sure that such a thing would work since one set would be larger than the other...?


No need for a "macro" to create such. SAS already does that in the background for you. They are data views in the SASHELP library such as Vtable, with information about datasets and views, Vcolumn with information about variables. These can also be accessed in Proc SQL as Dictionary Tables.

 

For instance one way to get information about all data sets in a library  is: (replace YOURLIB with the uppercase name of your library of interest)

Proc sql;
   create table work.libtables as
   select *
   from dictionary.tables
   where libname='YOURLIB'
   ;
run;

The resulting data set will have the data set library, name, creation and modified dates, number of observations, number of variables, information about file size, deleted observations and some other somewhat esoteric characteristics. For your purpose the dates, number of variables and number of observations are likely the most useful, especially if the number of variables changes.

The Vcolumns would be checked to see if the variables stay consistent for properties.

Replace YOURLIB with the uppercase spelling of a library of interest in the following code.

this creates a table that indicates the variable name and type on the row, the column of data set name and an X to indicate which.

If you have a variable with multiple types you can determine which data sets it changes in. You could restrict the data sets or variables to consider in the WHERE statement or create a data set from the SASHELP.VCOLUMN or Dictionary.Columns and filter as desired.

proc format library=work;
value xmark
1-high='X'
other =' '
;
run;
Proc tabulate data=sashelp.vcolumn;
   where libname='YOURLIB';
   class name type memname;
   table name*type,
         memname*n=' '*f=xmark.
         /misstext=' '
   ;
run;

 


This is one piece of great resources you just gave to me. Thank you alot. I will use this for sure.

But the problem that I have now is that all my files are .txt files and I don't know how to "load" them all into SAS. I thought that once all loaded, I could use VTable and VColumn to do the stuff that I want to. I will continue to seek for information about loading many files with "common" names (they are all dated and recurrent, so...).

 

By the way, I find it amazing that SAS does all of this in the background!

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

basic way to read a text file is

filename text "\\&path.\&file.";

data read_text;
   infile text; *dlm="09"x;
   input var1 var2 var3;
run;
jpprovost
Quartz | Level 8

@VDD wrote:

basic way to read a text file is

filename text "\\&path.\&file.";

data read_text;
   infile text; *dlm="09"x;
   input var1 var2 var3;
run;

Yes, I already have my code (LOADTXT.sas) especially for those .txt files.

 

However, I have the list of filenames with their extension into a dataset and now I am looking to import thoses files with a macro using my code LOADTXT.SAS. I am struggling to pass into the macro to tell it to load each filename in each row.

 

By example, the dataset containing all the filenames from my folder looks like :

 

OBS      FILEVAR
1        file1.txt
2        file2.txt
3        fileaza.txt
...
...
...

The macro looks like this :

 

%macro loop;
%do i = 1 %to 3/*&Total*/;
  DATA &&name&i ;
  infile "&path\&&filevar&i" MISSOVER LRECL = 153 ;
  input
	@1   FIELD1    10.
	@11  FIELD12   $15.
	@26  FIELD13   $1.
	@27  FIELD14   $1.
	@28  FIELD15   $2.
	@30  FIELD16   $6.
	@36  FIELD17   $60.
	@96  FIELD18   $2.
	@98  FIELD18   9.
	@107 FIELD10   13.
	@120 FIELD111   11.
	@131 FIELD1211  12.
	@143 FIELD122  $10.
	@153 FIELD133  $1.
	@154 FIELD144  $3.
	;
run ;
%end;
%mend loop;

The place where I struggle is there :

 

infile "&path\&&filevar&i"

 

The &path is good, the &filevar also (filevar is the name of the column in the dataset where my files are listed).

However, I am not able to tell the macro to load the filename that the row i contains.

 

Any ideas?

ballardw
Super User

 


This is one piece of great resources you just gave to me. Thank you alot. I will use this for sure.

But the problem that I have now is that all my files are .txt files and I don't know how to "load" them all into SAS. I thought that once all loaded, I could use VTable and VColumn to do the stuff that I want to. I will continue to seek for information about loading many files with "common" names (they are all dated and recurrent, so...).

 

By the way, I find it amazing that SAS does all of this in the background!


If you haven't read the data into SAS then you can prevent most of the likely errors by reading them properly.

Hopefully you have some document that shows what one of these text files should look like as far as column names, variable types, values (date and expected date format, currency, or similar), maximum length of character values and order in the file.

Then READ the data with those rules.

If you have such a document you might share it with the forum and we can show how to turn it into a data step to read the data.

jpprovost
Quartz | Level 8

@ballardw wrote:

 


This is one piece of great resources you just gave to me. Thank you alot. I will use this for sure.

But the problem that I have now is that all my files are .txt files and I don't know how to "load" them all into SAS. I thought that once all loaded, I could use VTable and VColumn to do the stuff that I want to. I will continue to seek for information about loading many files with "common" names (they are all dated and recurrent, so...).

 

By the way, I find it amazing that SAS does all of this in the background!


If you haven't read the data into SAS then you can prevent most of the likely errors by reading them properly.

Hopefully you have some document that shows what one of these text files should look like as far as column names, variable types, values (date and expected date format, currency, or similar), maximum length of character values and order in the file.

Then READ the data with those rules.

If you have such a document you might share it with the forum and we can show how to turn it into a data step to read the data.


I agree with you. I have a specific SAS code for those files to load correctly.

My goal is to validate that the last file produced is "as good as the last ones". I don't know if I'm "clear" enough...

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1163 views
  • 2 likes
  • 4 in conversation