BookmarkSubscribeRSS Feed
anoopm7
Calcite | Level 5

Hello,

I have around 1000 data sets and I am trying to write a code to create a proc contents for all these data sets and once I have them my second step would be to check for specific fields like account_number,DOB,sex etc and if they are available in the proc contents my second step should filter those datasets and write them to output.

Example :- I have datasets A,B,C,D,E and now I write a macro to do proc contents on each of A,B,C,D,E datasets and I create a separate output for each of them. Now my second step of code need to see if each of the new datasets created (i.e the output of the proc contents) have a set of fields in them and if they have the field then the second step should write them out.

Do any one have thoughts on the best way of doing it?

28 REPLIES 28
jakarman
Barite | Level 11

Use the sashelp.v- members as view to them all. But be aware the processing can be slow.

---->-- ja karman --<-----
PaigeMiller
Diamond | Level 26

Don't you mean SASHELP.VCOLUMN instead of SASHELP.VMEMBER?

--
Paige Miller
jakarman
Barite | Level 11

And sashelp.vtable  .... I yes but was too lazy

---->-- ja karman --<-----
data_null__
Jade | Level 19

If the data sets are all in the same data libraray you can get PROC CONTENTS on all of them with DATA=libname._ALL_;  You will want NOPRINT and OUT= option with KEEP=data set option to keep only the variables in the OUT= data that you are interested in like MEMNAME, NAME, TYPE, LENGTH etc. 

You may need other or perhaps all of them depending on what you are doing.

stat_sas
Ammonite | Level 13

Hi,

Suppose your all datasets are in the same folder then following may generate the desired results.

proc contents data=SASHELP._all_ out=vars(keep=memname name); /* used SASHELP as an example */
run;

proc sql;
select distinct memname from vars
where name in ('A','B','C'); /* Fields to be searched */
quit;

Tom
Super User Tom
Super User

So if you have the list of variable names then a single query against DICTIONARY.COLUMNS should get the list of datasets that contain all of the variables.


%let varlist='NAME','AGE';

%let nvars=2 ;

proc sql  ;

  select distinct catx('.',libname,memname) as dsname

    from dictionary.columns

    where libname='SASHELP'

      and upcase(name) in (&varlist)

    group by 1

    having count(*) = &nvars

  ;

quit;

anoopm7
Calcite | Level 5

I wrote the below basic sas code to start with (as given below). This filters out the variables names and then it checks if specific variables are available and if they are available if writes the output. So from the title name I can identify the file name and that in turn help me to identify which dataset has these variables

CHALLENGE :- The datasets that I need to scan are spread across multiple folders and across multiple years. For example as below

Dataset :- /june/data/<port>/<rep>/<year>/<month>/<sasfile name>

here port,rep,year,month,sasfile name all are different. I could go define each of the files but that is going to take me a long time. Is there some technique that would allow me to penetrate inside the folders?

--------

SAS CODE USED

--------

libname rpl "/june/dev/adhoc/test/SASEG" ;

libname check "/june/dev/adhoc/test/SASEG" ;

%macro x(port);

libname bbr "/june/data/&port./repor1/2014/06/";

proc contents data=bbr.file1 noprint out=rpl.&port. (keep=name);

run ;

data check.&port. ;

set rpl.&port. ;

if NAME in ('ACCOUNT','CCSTAT','ZIPCODE') ;

run ;

proc print data= check.&port. ;

title check.&port. ;

run ;

%mend ;

%x(zone1);

%x(zone2);

%x(zone3);

Reeza
Super User

Yes, you can use standard OS commands.

Google Recursively list all files in a directory in (YOUR OS)

Here's a link for Unix:

https://kb.iu.edu/d/abef

PaigeMiller
Diamond | Level 26

You don't need PROC CONTENTS or what follows. You simply need to issue the appropriate LIBNAME statements. Then use the solution by Tom in message 6 of this thread, probably without

libname='SASHELP'.


--
Paige Miller
jakarman
Barite | Level 11

Several ways::

1/ You can use OS commands when the installation/configuration has been set to allow you.
In this case you can use an Unix "ls -latr" command with redirecting to analyse that file building up a dataset of folders.
When you have access to the system doing that "ls"" command but not being allowed to run that from SAS you split that up in several steps

2/ Build your SAS dataset with OS folder using SAS functions. It will be a lot of work

1 and 2/ Proceed with that SAS dataset creating the associated libnamess Put some counter in the libname def (max string length 😎

3/ Build all libnames using some macro-looping. An optimistic approach is allowed ignoring the failures of combinations that do not exist.

Having alle libnames defined use the SAShelp.Vcolumns dataset with your selection on column names.

All contents information will be automatic generated by SAS at the moment of accessing that view. 

---->-- ja karman --<-----
anoopm7
Calcite | Level 5

Sorry guys, have been trying to swim against the wave (don't hate me for not using sashelp yet !!)

let me tell you what I did so far...since the list of sas files were around 60,000 spread across multiple directories/filenames etc I decided to do a basic listing (LS) and created that as a .txt output file. Once I did that I read the .txt file at a particular position and hence my new output file just had the file names

the LS output

-rwxr-x---   1 bag23  too8      56K Jul  1 11:40 /july/first/ny/fq/2014/06/jjssn.sas7bndx

-rwxr-x---   1 bag23  too8      32K Jul  1 11:40 /july/first/ny/fq/2014/06/jjssn.sas7bdat

-rwxr-x---   1 bag23  too8      56K Jul  1 11:40 /july/first/ny/fq/2014/06/jjpii.sas7bndx

-rwxr-x---   1 bag23  too8      32K Jul  1 11:40 /july/first/ny/fq/2014/06/jjpii.sas7bdat

-rwxr-x---   1 bag23  too8      56K Jul  1 11:40 /july/first/ny/fq/2014/06/jj.sas7bndx

NEW FILE output after selecting using the new code

/july/first/ny/fq/2014/06/jjssn.sas7bndx

/july/first/ny/fq/2014/06/jjssn.sas7bdat

/july/first/ny/fq/2014/06/jjpii.sas7bndx

/july/first/ny/fq/2014/06/jjpii.sas7bdat

/july/first/ny/fq/2014/06/jj.sas7bndx 

now I want to read this NEW FILE such that I could get the proc print of each of these records..i.e read a record from the NEW FILE and then PRINT the proc content of first record and then read next record print the PROC CONTENT of that etc..

Peter_C
Rhodochrosite | Level 12

I would use a data step to parse out the paths which hold sas datasets into a libs dataset. Then

data ;

Set libs;

By path;

If last.path ;

CALL EXECUTE( 'libname L ' !! Quote(trim( path)) !!' ;' ) ;

CALL EXECUTE( 'PROC CONTENTS DATA=L._ALL_ NOPRINT OUT=A; ' ) ;

CALL EXECUTE( data b; set a; ');

call execute(' length path $300 ; retain path ' !! quote( trim( path )) );

call execute(' ; run; proc append base= collection data= b ; run ; ' ) ;

Run ;

That provides a work.collection detailling all columns along with the path.

you might want to drop columns you don't need.

Reeza
Super User

There's a libname function in SAS, as well as the fact that the libname statement will take multiple arguments.

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

libname  test ('C:\Data\StatCan' ,

  'C:\Data\StatCan\ShapeToTab');

Tom
Super User Tom
Super User

I have totally lost the big picture of what you want to do.  Originally it had something to do with testing which datasets had particular variables.

If you are talking about looking at the contents of tens of thousands of SAS datasets and then printing the data in those files?  That does not make much sense as who is going to read all of that output.

Now if you are talking about making a report that summarizes each dataset then it might make more sense.  Say something like

FILENAME                                   NOBS   VAR1  VAR2  VAR3

/july/first/ny/fq/2014/06/jjssn.sas7bdat  1,212    YES    NO    NO

/july/first/ny/fq/2014/06/jjpii.sas7bdat    512    YES    YES   YES



sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 28 replies
  • 3062 views
  • 29 likes
  • 8 in conversation