BookmarkSubscribeRSS Feed
Peter_C
Rhodochrosite | Level 12

I thought the query was to report the tables which contained certain columns.

My proposal supports that query by creating a table with the full library contents info with all column info + path, for each folder of sas datasets. 

The next stage would filter to show only rows for those columns required. And then summarise whatever. 

A libname concatenating all paths will hide a table in a subfolder a/b/c/ if a table of the same name is also in a higher level folder like a/b/.

jakarman
Barite | Level 11

You cannot read those file *.sas7- files as text files. The way to read them is by library references that are on the folder names.

You are need the tree of folders without any files on the OS level. The a attribute in: "ls -latr" I inlcuding all hidden files witn the . (current folder) an .. (upper level folder)
The bndx files types are indexfiles associated with a dataset with the same name. Some more types are possible.

The folders have an other indication at the file attributes, starting with the letter d (directory).

Go for the folders not files and connect them to libnames.

---->-- ja karman --<-----
jakarman
Barite | Level 11

Peter, I would also not suggest concatenation of libraries in this moment. It is excellent with version management using catalog member types (eg formats) but not for data-processing.

The op's question is looking like data investigation on a environment where a lot of segregated SAS datasets have been created in a lot of folders.
As the perception on folders and SAS datasets with Unix is not correct it is not the experienced SAS analyst, may be a new one taken over work, a new technical person or an auditor. 

---->-- ja karman --<-----
Peter_C
Rhodochrosite | Level 12

Jaap

you describe a suituation I find when arriving at a new client.

It is really handy to register contents of all atasets in the environment.

Then I become aware of terminology that the "locals" use for specific keys and other attributes. For the same reason I list all .sas code files then parse macro names of macros.

Usually it finds more overlap than justified by version history.

It isn't as good as a full impact analysis but might have more scope.

Peter.c

anoopm7
Calcite | Level 5

My apologies if I am not clear.. Let me explain it with some sample codes

The below is the code

libname rpl "/temp/test/SASEG" ;

libname check "/temp/test/SASEG" ;

filename dd "/temp/test/SASEG/sample.txt" ;

data test;

infile dd missover;

input @54 filename : $80. ;

run;

proc print data=test;

run;

****The out put of the test using the above proc print is below.

QUESTION

1) I have such a SAS dataset (a list of datasets). Now how do I use that SAS dataset to do a PROC CONTENTS on each of the observation in that file?

2) If that could be done how should I write it down as a output ?

3) Then I would check the output of the proc contents to see if they have some variables that Iam interested and if  the variables are present that I want to report that dataset.


Obs

filename

1

/july/q2/bc/ringreport/2014/06/bankcard.sas7bndx

2

/july/q2/bc/ringreport/2014/06/bankcard.sas7bdat

3

/july/q2/bc/ringreport/2014/06/bank_co.sas7bndx

4

/july/q2/bc/ringreport/2014/06/bank_co.sas7bdat

5

/july/q2/bc/ringreport/2014/06/multran.sas7bndx

6

/july/q2/bc/ringreport/2014/06/multran.sas7bdat

7

/july/q2/avco/ringreport/2014/06/avco.sas7bndx

8

/july/q2/avco/ringreport/2014/06/avco.sas7bdat

9

/july/q2/citiusa/ringreport/2014/06/bankcard.sas7bndx

10

/july/q2/citiusa/ringreport/2014/06/bankcard.sas7bdat

11

/july/q2/citiusa/ringreport/2014/06/bank_co.sas7bndx

12

/july/q2/citiusa/ringreport/2014/06/bank_co.sas7bdat

13

/july/q2/acb/ringreport/2014/06/acb.sas7bndx

14

/july/q2/acb/ringreport/2014/06/acb.sas7bdat

15

/july/q2/comm/ringreport/2014/06/coml.sas7bndx

16

/july/q2/comm/ringreport/2014/06/coml.sas7bdat

17

/july/q2/comm/ringreport/2014/06/coml_co.sas7bndx

18

/july/q2/comm/ringreport/2014/06/coml_co.sas7bdat

19

/july/q2/avco/ringreport/2014/06/chdi.sas7bndx

20

/july/q2/avco/ringreport/2014/06/chdi.sas7bdat

21

/july/q2/avco/ringreport/2014/06/chda.sas7bndx

22

/july/q2/avco/ringreport/2014/06/chda.sas7bdat

23

/july/q2/citiusa/ringreport/2014/06/chda.sas7bndx

24

/july/q2/citiusa/ringreport/2014/06/chda.sas7bdat

25

/july/q2/citiusa/ringreport/2014/06/chdi.sas7bndx

26

/july/q2/citiusa/ringreport/2014/06/chdi.sas7bdat

27

/july/q2/acb/ringreport/2014/06/chda.sas7bndx

28

/july/q2/acb/ringreport/2014/06/chda.sas7bdat

29

/july/q2/acb/ringreport/2014/06/chdi.sas7bndx

30

/july/q2/acb/ringreport/2014/06/chdi.sas7bdat

31

/july/q2/comm/ringreport/2014/06/chda.sas7bndx

32

/july/q2/comm/ringreport/2014/06/chda.sas7bdat

33

/july/q2/comm/ringreport/2014/06/chdi.sas7bndx

34

/july/q2/comm/ringreport/2014/06/chdi.sas7bdat

35

/july/q2/rpl/ringreport/2014/06/chdi.sas7bndx

36

/july/q2/rpl/ringreport/2014/06/chdi.sas7bdat

37

/july/q2/rpl/ringreport/2014/06/chda.sas7bndx

38

/july/q2/rpl/ringreport/2014/06/chda.sas7bdat

39

/july/q2/rpl/ringreport/2014/06/cardseg.sas7bdat

40

/july/q2/bestbuy/ringreport/2014/06/chdi.sas7bndx

41

/july/q2/bestbuy/ringreport/2014/06/chdi.sas7bdat

42

/july/q2/bestbuy/ringreport/2014/06/chda.sas7bndx

43

/july/q2/bestbuy/ringreport/2014/06/chda.sas7bdat

44

/july/q2/bestbuy/ringreport/2014/06/cardseg.sas7bdat

45

/july/q2/thd/ringreport/2014/06/chdi.sas7bndx

46

/july/q2/thd/ringreport/2014/06/chdi.sas7bdat

47

/july/q2/thd/ringreport/2014/06/chda.sas7bndx

48

/july/q2/thd/ringreport/2014/06/chda.sas7bdat

49

/july/q2/thd/ringreport/2014/06/cardseg.sas7bdat

50

/july/q2/sears1/ringreport/2014/06/chdi.sas7bndx

51

/july/q2/sears1/ringreport/2014/06/chdi.sas7bdat

52

/july/q2/sears1/ringreport/2014/06/chda.sas7bndx

53

/july/q2/sears1/ringreport/2014/06/chda.sas7bdat

54

/july/q2/sears1/ringreport/2014/06/cardseg.sas7bdat

Reeza
Super User

1. sas7bndx are Index files and not SAS data files, so I'll assume you want to ignore those.

2. It sounds like you're doing data exploration by looking at all .sas7bdat files and the variables within. This isn't a method I'd ever recommend and would actually recommend against this. You may not understand the rules behind the variable creation and variables with the same names may not mean the same thing.

3. Generally its faster to query the SASHELP.VCOLUMN Table than to look at the proc contents output.

I'm only accounting for up to 99 different libnames. If you have more, you'll need to modify the program by using z3. or the appropriate number of digits.

The solution requested is below, as well as the query of the sashelp.vcolumn table instead.

The libname function can also be used.

data have;

length path $60.;

path="/july/q2/bc/ringreport/2014/06/bankcard.sas7bndx"; output;

path="/july/q2/bc/ringreport/2014/06/bankcard.sas7bdat"; output;

path="/july/q2/bc/ringreport/2014/06/bank_co.sas7bndx"; output;

path="/july/q2/bc/ringreport/2014/06/bank_co.sas7bdat"; output;

path="/july/q2/bc/ringreport/2014/06/multran.sas7bndx"; output;

path="/july/q2/bc/ringreport/2014/06/multran.sas7bdat"; output;

path="/july/q2/avco/ringreport/2014/06/avco.sas7bndx"; output;

path="/july/q2/avco/ringreport/2014/06/avco.sas7bdat"; output;

path="/july/q2/citiusa/ringreport/2014/06/bankcard.sas7bndx"; output;

path="/july/q2/citiusa/ringreport/2014/06/bankcard.sas7bdat"; output;

path="/july/q2/citiusa/ringreport/2014/06/bank_co.sas7bndx"; output;

path="/july/q2/citiusa/ringreport/2014/06/bank_co.sas7bdat"; output;

run;

data want;

    set have;

    libpath=substr(path, 1, length(path)-index(reverse(trim(path)), "/"));

    extension=scan(path, -1, "/.");

    if extension='sas7bndx' then delete;

    file=scan(path, -2, '/.');

run;

proc sort data=want; by libpath file;

run;

*create library and run proc contents;

data _null_;

    set want;

    by libpath;

    retain id 0 libref;

    if first.libpath then do;

        id+1;

        libref='temp'||put(id, z2.);

        rc=libname('temp'||put(id, z2.), quote(libpath));

    end;

    file_ref=catx(".", libref, file);

    call execute ('proc contents data='||file_ref||';run;');

run;

*get variable information into dataset;

proc sql;

    create table column listing as

    select a.*, b.path

    from sashelp.vcolumn as a

    left join sashelp.vlibnam as b

    on upper(a.libname)=upper(b.libname)

    where upper(a.libname) like 'TEMP%';

quit;

*Clear library references;

data _null_;

    set want;

    by libpath;

    retain id 0;

    if first.libpath then do;

        id+1;

        rc=libname('temp'||put(id, z2.));

    end;

run;

Tom
Super User Tom
Super User

You will need to generate code from your list of files. You will only want to examine the SAS datasets and can ignore the index files that are included in your example list.

Note that you can use the quoted filename rather than creating a libref and using a two level SAS dataset name.

Here is an example for how to generate a PROC CONTENTS call for each file and then append selected variables from the proc contents into a single large file.  You could then query the final ALL dataset to look for those members that include the variables of interest.

data test ;

  input filename $80.;

cards;

C:\Program Files\SASHome\x86\SASFoundation\9.4\core\sashelp\class.sas7bdat

;;;

filename code temp ;

data _null_ ;

   set TEST ;

   where scan(filename,-1,'.')='sas7bdat';

   file code ;

   put 'proc contents data=' filename :$quote. 'noprint out=contents; run;' ;

   if _n_=1 then put

     'data all ;'

   / '  stop;'

   / '  set TEST (keep=filename) contents(keep=memname name type length label format:);'

   / 'run;'

   ;

   put 'proc sql;'

     / 'insert into all '

     / '  select ' filename :quote. ',*'

     / '  from contents(keep=memname name type length label format:)'

     / ';'

     / 'quit;'

  ;

run;

%include code /source2;

proc print;

run;

Peter_C
Rhodochrosite | Level 12

Anoopm

I thought my earlier code offerred you the solution.

- collecting proc contents results (the OUT=) from each folder

Best of luck

jakarman
Barite | Level 11

You cannot acces sas-datasets on their own name, you have to go to the folders. Get a list of the Folders like:

Foldername
/july/q2/bc/ringreport/2014/06
/july/q2/avco/ringreport/2014/06
/july/q2/citiusa/ringreport/2014/06
/july/q2/acb/ringreport/2014/06
/july/q2/comm/ringreport/2014/06
     --- (duplicates?)  I see some duplicates in the folders. Not knowing what you have done. sortin and deduplicating is necessary

/july/q2/rpl/ringreport/2014/06
/july/q2/bestbuy/ringreport/2014/06
/july/q2/thd/ringreport/2014/06
/july/q2/sears1/ringreport/2014/06

Insisting on your list you can shorten the names finding the last "/" can be done with findc and a negative startposition.

I would prefer a listing with just the foldernames to simplify

The goal to achive is:

libname q2bc         "/july/q2/bc/ringreport/2014/06" ;
libname q2avco     "/july/q2/avco/ringreport/2014/06" ;
libname q2citi        "/july/q2/citiusa/ringreport/2014/06" ;
libname q2acb       "/july/q2/acb/ringreport/2014/06" ;
libname q2comm   "/july/q2/comm/ringreport/2014/06" ;
libname q2rpl         "/july/q2/rpl/ringreport/2014/06" ;
libname q2best      "/july/q2/bestbuy/ringreport/2014/06" ;
libname q2thd        "/july/q2/thd/ringreport/2014/06" ;
libname q2sears    "/july/q2/sears1/ringreport/2014/06" ;

Using the call execute SAS(R) 9.3 Functions and CALL Routines: Reference having those foldernames.
The to be solve issue here is the length of the lbname reference that is not allowed to be longer as 8. I shortened some by hand.

Will this in you situation go well or do you need an addition number translation.  

Than access the sashelp.vnames dataset and you will see all names in all libname members.

Storing this dataset to an other location (create) and analyze that as you like.

---->-- ja karman --<-----
anoopm7
Calcite | Level 5

Jaap, I can use the libname if I were to handle a handful of files. But here I am doing a variable check on around 50,000 files.

jakarman
Barite | Level 11

Anoopm7 you did not tell home many folders you have.
The relation is libname <-> folders   the number of files is irrelevant.  The SAS datasets are members in a libname.  The SAS Contents is going into libnames with all those members in it.

You have to let go the OS Unix directory listing on files. You need an OS listing done only for the folders.

  
Try what I have done with a small subset manually understand the issues in namings and then go for the automatization for all folders/libnames.

---->-- ja karman --<-----
Reeza
Super User

You can use a direct file reference in the set or data=statement though...

libname sample 'C:\_LOCALdata\Temp';

data sample.class;

set sashelp.class;

run;

libname sample;

data test;

set 'C:\_LOCALdata\Temp\Class.sas7bdat';

run;

proc contents data='C:\_LOCALdata\Temp\Class.sas7bdat';

run;

Reeza
Super User

I still don't recommend this, because you can't possibly go through the output of 50K files manually with any sort of accuracy.

data want;

    set have;

    libpath=substr(path, 1, length(path)-index(reverse(trim(path)), "/"));

    extension=scan(path, -1, "/.");

    if extension='sas7bndx' then delete;

        call execute ('proc contents data='||quote(path)||';run;');

run;

anoopm7
Calcite | Level 5

Alright..here is the final code that helped me achieve that

I really appreciate all your solutions to help me derive the below code !!

options obs=max;

filename dd "/july/dev/adhoc/SASEG/sample.txt" ;

X "cat /master/july/adhoc//SASEG/sample.txt|awk '{print $9}'>  /july/dev/adhoc//newsamp.txt";

data test;

infile "/july/adhoc/adhoc/newsamp.txt";

input @1 flname : $80. ;

if index(flname,'pii') > 0 or index(flname,'ssn') > 0 or index(flname,'sas7bndx') >0 then delete ;

run;

proc print;

proc sql;

select count(*) into:totrecs from test;

quit;

%macro conts;

data _null_;

set test;

call symput('fln'||compress(_n_),compress(flname));

run;

%do i=1 %to &totrecs.;

proc contents DATA="&&fln&i." noprint OUT=july&i.(keep=name);

run;

proc print data=july&i.;

%end;

%mend conts;

%conts;

%put _user_;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 28 replies
  • 3118 views
  • 29 likes
  • 8 in conversation