BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tesu
Calcite | Level 5

I have three folders (say, folder1, folder2, and folder3) that contain some dbf format files.

Suppose that I have:

0345.dbf, 2345.dbf, 4444.dbf, 6767.dbf, 7623.dbf in folder1

0222.dbf, 0345.dbf in folder2

3423.dbf, 4827.dbf, 5635.dbf, 7623.dbf in folder3

So, for example, "0345.dbf"s are shown folder1 and folder2. Similarly, there are two "7623.dbf"s in folder1 and folder3. They all have the variable X, and Y. But they are all different files with different contents.

I have an additional information stored in a csv file (list.csv) with two columns as follows:

0222 f2

0345 f2

2345 f1

3423 f3

4444 f1

4827 f3

5635 f3

6767 f1

7623 f3

According to that rule, I have to pick "0345.dbf" from the folder2, not from the folder1. And "7623.dbf" from the folder3, not from the folder1.

I need to collect all the nine files and save them to a folder(say, "d:/collect") following the selection rule written in the csv file above. Then, I want to do a regression for each nine dbf file:

proc reg; model Y=X; run;

How can I implement this task in SAS? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10


try:


data list;
input fname $ id $;
cards;
1122 bg
1620 bg
1660 bg
0520 tz
0240 tr
0280 tr
0560 tr
;run;

data have;
input o fname $   id $;
cards;

                        1     1122     bg   
                        2     1620     bg   
                        3     1660     bg   
                        4     0240     tr   
                        5     0280     tr   
                        6     0560     tr   
                        7     0520     tz   
                        8     1122     tz 
      ;
     

proc sql;
create table final_selection as
select a.*
from have a,list b
where a.fname=b.fname and a.id=b.id;
  quit;
proc print;
run;
                                       Obs    o    fname    id

                                        1     1    1122     bg
                                        2     2    1620     bg
                                        3     3    1660     bg
                                        4     4    0240     tr
                                        5     5    0280     tr
                                        6     6    0560     tr
                                        7     7    0520     tz

View solution in original post

21 REPLIES 21
Reeza
Super User

How many files do you have? You say all have X and Y, but are they the same formats? Ie is X the same type of variable in all?

Although it might be inefficient, you could just read all with the folder source and file name using filename or indsname options in the input or set statement.

Then filter through what you need with a merge file.

tesu
Calcite | Level 5

The X, Y variables are numeric variables. There are lots of dbf files in many folders. I have to automate this process. Thank you.

Astounding
PROC Star

I would suggest you create 3 SAS data sets:  folder1, folder2, and folder3.  If you have separate SAS data sets for each dbf file, they can still be combined.  For example:

data folder2;

   set f2_0222 (in=in1)

        f2_0345 (in=in2);

   if in1 then source='0222';

   else if in2 then source='0345';

run;

Having created 3 SAS data sets, I would transfer the control file to a SAS data set, and create a list of records to extract.  For example, assuming the SAS data set CONTROL_FILE contains DBF_FILE and FOLDER as the variable names:

proc sql noprint;

   select quote(dbf_file) into : f1 separated by ', '  from control_file where folder="f1";

   select quote(dbf_file) into : f2 separated by ', '  from control_file where folder="f2";

   select quote(dbf_file) into : f3 separated by ', '  from control_file where folder="f3";

quit;

This will create three macro variables where &F2, for example, will look like this:

"0222", "0345"

Finally, extract the proper records:

data analysis;

   set folder1 (where = (source in (&F1)))

        folder2 (where = (source in (&F2)))

        folder3 (where = (source in (&F3)));

run;

Good luck.  Check back if you get stuck at some point.

Astounding
PROC Star

Do you know how to read a dbf file into SAS?  Once you take it that far, you'll get plenty of help with the rest of the code.

Ksharp
Super User

I would like to take the power of OS command.

Assuming c:\test\ contains floder1 floder2 ............

data list;
input fname $ dname $;
id=input(compress(dname,,'kd'),best8.);
cards;
0222 f2
0345 f2
2345 f1
3423 f3
4444 f1
4827 f3
5635 f3
6767 f1
7623 f3
;
run;
filename x pipe 'dir c:\test\*.txt /s /b';
data have;
infile x;
input path $100.;
fname=scan(path,-2,'.\');
id=input(compress(scan(path,-3,'.\'),,'kd'),best8.);
run;
proc sql;
 select path 
  from have 
   where fname in (select fname from list) and
         id in (select id from list);
quit;

Ksharp

tesu
Calcite | Level 5

filename x pipe 'dir c:\test\*.txt /s /b';

This statement does not seem to collect all the path+files if files with same name are scattered across the folder1 - folder3.

Linlin
Lapis Lazuli | Level 10

changing

filename x pipe 'dir c:\test\*.txt /s /b';

to

filename x pipe 'dir c:\test\*.dbf /s /b';

tesu
Calcite | Level 5

Thank you for the reply. I already changed it. I check the Ksharp's pipe statement does not collect all the path+dbf information. If I have 0345.dbf both in folder1 and in folder3, the pipe command reads only one of two a.dbf files. What we have to do is to create the path variable, which has data values as follows:

...

c:\test\folder1\0345.dbf

...

c:\test\folder3\0345.dbf

...

But the pipe fails to collect both.

*update*

And I found that the pipe command does not collect all the dbf files in a single folder. For example, my folder1 has 13 dbf files. But after running the pipe command just for the folder1 (filename x pipe 'dir C:\test\folder1\*.dbf /s /b';), I have only 6 records in the path variable.

Linlin
Lapis Lazuli | Level 10

Hi,

I created three folders under c:\temp\forum and saved several .txt files.

%macro test;

%do i=1 %to 3;

filename x pipe "dir /b c:\temp\forum\folder&i.\*.txt " lrecl=100;

data folder&i ;

     folder=cats('f',&i);

     length fname $20;

     infile x length=reclen ;

     input fname $varying20. reclen ;

run;

%end;

%mend;

%test

data want;

  set folder:;

run;

proc print;run;

                            obs folder     fname

                              1       f1      123.txt

                              2       f1      345.txt

                              3       f2      123.txt

                              4       f2      888.txt

                              5       f3      123.txt

                              6       f3      888.txt

Ksharp
Super User

I tested it again. No problem.

c:\test\ is parent directory .

Using command

dir c:\test\*.dbf /s /b

can list all of dbf file under test directory ,include its sub-directory.

Ksharp

Linlin
Lapis Lazuli | Level 10

Hi Ksharp,

when I ran the code below, it only gave me the last .txt file in a folder.

filename x pipe 'dir c:\temp\*.txt /s /b';

data have;

infile x;

input path $100.;

fname=scan(path,-2,'.\');

*id=input(compress(scan(path,-3,'.\'),,'kd'),best8.);

run;

Thanks - Linlin

Tom
Super User Tom
Super User

Try just seeing what your command is generating.  If you are running interactive SAS just issue the command INCLUDE X from the command line.  Otherwise run this simple data step.

data _null_; infile x; input; list; run;

Ksharp
Super User

Ha. I make a mistake here , I must add Truncover option to avoid the wrong result.

data list;
input fname $ dname $;
id=input(compress(dname,,'kd'),best8.);
cards;
0222 f2
0345 f2
2345 f1
3423 f3
4444 f1
4827 f3
5635 f3
6767 f1
7623 f3
;
run;
filename x pipe 'dir c:\test\*.txt /s /b';
data have;
infile x truncover;
input path $100.;
fname=scan(path,-2,'.\');
id=input(compress(scan(path,-3,'.\'),,'kd'),best8.);
run;
proc sql;
 select path 
  from have 
   where fname in (select fname from list) and
         id in (select id from list);
quit;


Ksharp

tesu
Calcite | Level 5

data list;
input fname $ id $;
cards;
1122 bg
1620 bg
1660 bg
0520 tz
0240 tr
0280 tr
0560 tr
;run;
proc contents data=list;run;

                             Alphabetic List of Variables and Attributes

                                    #    Variable    Type    Len

                                    1    fname       Char      8
                                    2    id          Char      8


filename x pipe 'dir D:\ex\sas_ex\Ksharp\*.dbf /s /b';
data have;
length fname $8. id $8.;
infile x truncover;
input path $100.;
fname=scan(path,-2,'.\');
id = scan(path,5,'\');
run;
proc contents data=have; run;

                             Alphabetic List of Variables and Attributes

                                    #    Variable    Type    Len

                                    1    fname       Char      8
                                    2    id          Char      8
                                    3    path        Char    100


proc print data=have; run;

                        Obs    fname    id                 path

                        1     1122     bg    D:\ex\sas_ex\Ksharp\bg\1122.dbf
                        2     1620     bg    D:\ex\sas_ex\Ksharp\bg\1620.dbf
                        3     1660     bg    D:\ex\sas_ex\Ksharp\bg\1660.dbf
                        4     0240     tr    D:\ex\sas_ex\Ksharp\tr\0240.dbf
                        5     0280     tr    D:\ex\sas_ex\Ksharp\tr\0280.dbf
                        6     0560     tr    D:\ex\sas_ex\Ksharp\tr\0560.dbf
                        7     0520     tz    D:\ex\sas_ex\Ksharp\tz\0520.dbf
                        8     1122     tz    D:\ex\sas_ex\Ksharp\tz\1122.dbf

proc sql;
create table final_selection as
select path
from have
where fname in (select fname from list) and
  id in (select id from list);
quit;

proc print data=final_selection; run;

                               Obs                 path

                                1     D:\ex\sas_ex\Ksharp\bg\1122.dbf
                                2     D:\ex\sas_ex\Ksharp\bg\1620.dbf
                                3     D:\ex\sas_ex\Ksharp\bg\1660.dbf
                                4     D:\ex\sas_ex\Ksharp\tr\0240.dbf
                                5     D:\ex\sas_ex\Ksharp\tr\0280.dbf
                                6     D:\ex\sas_ex\Ksharp\tr\0560.dbf
                                7     D:\ex\sas_ex\Ksharp\tz\0520.dbf
                                8     D:\ex\sas_ex\Ksharp\tz\1122.dbf


I still see the problem here. From the list, fname='1122' is of 'bg'. In the final_selection data set, look at the obs=8: D:\ex\sas_ex\Ksharp\tz\1122.dbf. The whole point of this project is not to include the obs=8. But I still see the '1122.dbf' again from the 'tz' folder!

Am I missing something here? Thanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1415 views
  • 6 likes
  • 6 in conversation