07-11-2014 11:22 AM
I have around 1000 data sets and I am trying to write a code to create a proc contents for all these data sets and once I have them my second step would be to check for specific fields like account_number,DOB,sex etc and if they are available in the proc contents my second step should filter those datasets and write them to output.
Example :- I have datasets A,B,C,D,E and now I write a macro to do proc contents on each of A,B,C,D,E datasets and I create a separate output for each of them. Now my second step of code need to see if each of the new datasets created (i.e the output of the proc contents) have a set of fields in them and if they have the field then the second step should write them out.
Do any one have thoughts on the best way of doing it?
07-11-2014 11:40 AM
If the data sets are all in the same data libraray you can get PROC CONTENTS on all of them with DATA=libname._ALL_; You will want NOPRINT and OUT= option with KEEP=data set option to keep only the variables in the OUT= data that you are interested in like MEMNAME, NAME, TYPE, LENGTH etc.
You may need other or perhaps all of them depending on what you are doing.
07-11-2014 12:25 PM
Suppose your all datasets are in the same folder then following may generate the desired results.
proc contents data=SASHELP._all_ out=vars(keep=memname name); /* used SASHELP as an example */
select distinct memname from vars
where name in ('A','B','C'); /* Fields to be searched */
07-11-2014 12:58 PM
So if you have the list of variable names then a single query against DICTIONARY.COLUMNS should get the list of datasets that contain all of the variables.
%let nvars=2 ;
proc sql ;
select distinct catx('.',libname,memname) as dsname
and upcase(name) in (&varlist)
group by 1
having count(*) = &nvars
07-11-2014 01:21 PM
I wrote the below basic sas code to start with (as given below). This filters out the variables names and then it checks if specific variables are available and if they are available if writes the output. So from the title name I can identify the file name and that in turn help me to identify which dataset has these variables
CHALLENGE :- The datasets that I need to scan are spread across multiple folders and across multiple years. For example as below
Dataset :- /june/data/<port>/<rep>/<year>/<month>/<sasfile name>
here port,rep,year,month,sasfile name all are different. I could go define each of the files but that is going to take me a long time. Is there some technique that would allow me to penetrate inside the folders?
SAS CODE USED
libname rpl "/june/dev/adhoc/test/SASEG" ;
libname check "/june/dev/adhoc/test/SASEG" ;
libname bbr "/june/data/&port./repor1/2014/06/";
proc contents data=bbr.file1 noprint out=rpl.&port. (keep=name);
data check.&port. ;
set rpl.&port. ;
if NAME in ('ACCOUNT','CCSTAT','ZIPCODE') ;
proc print data= check.&port. ;
title check.&port. ;
07-11-2014 03:05 PM
You don't need PROC CONTENTS or what follows. You simply need to issue the appropriate LIBNAME statements. Then use the solution by Tom in message 6 of this thread, probably without
07-11-2014 03:11 PM
1/ You can use OS commands when the installation/configuration has been set to allow you.
In this case you can use an Unix "ls -latr" command with redirecting to analyse that file building up a dataset of folders.
When you have access to the system doing that "ls"" command but not being allowed to run that from SAS you split that up in several steps
2/ Build your SAS dataset with OS folder using SAS functions. It will be a lot of work
1 and 2/ Proceed with that SAS dataset creating the associated libnamess Put some counter in the libname def (max string length 8)
3/ Build all libnames using some macro-looping. An optimistic approach is allowed ignoring the failures of combinations that do not exist.
Having alle libnames defined use the SAShelp.Vcolumns dataset with your selection on column names.
All contents information will be automatic generated by SAS at the moment of accessing that view.
07-11-2014 03:55 PM
Sorry guys, have been trying to swim against the wave (don't hate me for not using sashelp yet !!)
let me tell you what I did so far...since the list of sas files were around 60,000 spread across multiple directories/filenames etc I decided to do a basic listing (LS) and created that as a .txt output file. Once I did that I read the .txt file at a particular position and hence my new output file just had the file names
the LS output
-rwxr-x--- 1 bag23 too8 56K Jul 1 11:40 /july/first/ny/fq/2014/06/jjssn.sas7bndx
-rwxr-x--- 1 bag23 too8 32K Jul 1 11:40 /july/first/ny/fq/2014/06/jjssn.sas7bdat
-rwxr-x--- 1 bag23 too8 56K Jul 1 11:40 /july/first/ny/fq/2014/06/jjpii.sas7bndx
-rwxr-x--- 1 bag23 too8 32K Jul 1 11:40 /july/first/ny/fq/2014/06/jjpii.sas7bdat
-rwxr-x--- 1 bag23 too8 56K Jul 1 11:40 /july/first/ny/fq/2014/06/jj.sas7bndx
NEW FILE output after selecting using the new code
now I want to read this NEW FILE such that I could get the proc print of each of these records..i.e read a record from the NEW FILE and then PRINT the proc content of first record and then read next record print the PROC CONTENT of that etc..
07-11-2014 05:34 PM
I would use a data step to parse out the paths which hold sas datasets into a libs dataset. Then
If last.path ;
CALL EXECUTE( 'libname L ' !! Quote(trim( path)) !!' ;' ) ;
CALL EXECUTE( 'PROC CONTENTS DATA=L._ALL_ NOPRINT OUT=A; ' ) ;
CALL EXECUTE( data b; set a; ');
call execute(' length path $300 ; retain path ' !! quote( trim( path )) );
call execute(' ; run; proc append base= collection data= b ; run ; ' ) ;
That provides a work.collection detailling all columns along with the path.
you might want to drop columns you don't need.
07-11-2014 05:58 PM
There's a libname function in SAS, as well as the fact that the libname statement will take multiple arguments.
libname test ('C:\Data\StatCan' ,
07-11-2014 07:11 PM
I have totally lost the big picture of what you want to do. Originally it had something to do with testing which datasets had particular variables.
If you are talking about looking at the contents of tens of thousands of SAS datasets and then printing the data in those files? That does not make much sense as who is going to read all of that output.
Now if you are talking about making a report that summarizes each dataset then it might make more sense. Say something like
FILENAME NOBS VAR1 VAR2 VAR3
/july/first/ny/fq/2014/06/jjssn.sas7bdat 1,212 YES NO NO
/july/first/ny/fq/2014/06/jjpii.sas7bdat 512 YES YES YES