I have three folders (say, folder1, folder2, and folder3) that contain some dbf format files.
Suppose that I have:
0345.dbf, 2345.dbf, 4444.dbf, 6767.dbf, 7623.dbf in folder1
0222.dbf, 0345.dbf in folder2
3423.dbf, 4827.dbf, 5635.dbf, 7623.dbf in folder3
So, for example, "0345.dbf"s are shown folder1 and folder2. Similarly, there are two "7623.dbf"s in folder1 and folder3. They all have the variable X, and Y. But they are all different files with different contents.
I have an additional information stored in a csv file (list.csv) with two columns as follows:
0222 f2
0345 f2
2345 f1
3423 f3
4444 f1
4827 f3
5635 f3
6767 f1
7623 f3
According to that rule, I have to pick "0345.dbf" from the folder2, not from the folder1. And "7623.dbf" from the folder3, not from the folder1.
I need to collect all the nine files and save them to a folder(say, "d:/collect") following the selection rule written in the csv file above. Then, I want to do a regression for each nine dbf file:
proc reg; model Y=X; run;
How can I implement this task in SAS? Thanks!
try:
data list;
input fname $ id $;
cards;
1122 bg
1620 bg
1660 bg
0520 tz
0240 tr
0280 tr
0560 tr
;run;
data have;
input o fname $ id $;
cards;
1 1122 bg
2 1620 bg
3 1660 bg
4 0240 tr
5 0280 tr
6 0560 tr
7 0520 tz
8 1122 tz
;
proc sql;
create table final_selection as
select a.*
from have a,list b
where a.fname=b.fname and a.id=b.id;
quit;
proc print;
run;
Obs o fname id
1 1 1122 bg
2 2 1620 bg
3 3 1660 bg
4 4 0240 tr
5 5 0280 tr
6 6 0560 tr
7 7 0520 tz
How many files do you have? You say all have X and Y, but are they the same formats? Ie is X the same type of variable in all?
Although it might be inefficient, you could just read all with the folder source and file name using filename or indsname options in the input or set statement.
Then filter through what you need with a merge file.
The X, Y variables are numeric variables. There are lots of dbf files in many folders. I have to automate this process. Thank you.
I would suggest you create 3 SAS data sets: folder1, folder2, and folder3. If you have separate SAS data sets for each dbf file, they can still be combined. For example:
data folder2;
set f2_0222 (in=in1)
f2_0345 (in=in2);
if in1 then source='0222';
else if in2 then source='0345';
run;
Having created 3 SAS data sets, I would transfer the control file to a SAS data set, and create a list of records to extract. For example, assuming the SAS data set CONTROL_FILE contains DBF_FILE and FOLDER as the variable names:
proc sql noprint;
select quote(dbf_file) into : f1 separated by ', ' from control_file where folder="f1";
select quote(dbf_file) into : f2 separated by ', ' from control_file where folder="f2";
select quote(dbf_file) into : f3 separated by ', ' from control_file where folder="f3";
quit;
This will create three macro variables where &F2, for example, will look like this:
"0222", "0345"
Finally, extract the proper records:
data analysis;
set folder1 (where = (source in (&F1)))
folder2 (where = (source in (&F2)))
folder3 (where = (source in (&F3)));
run;
Good luck. Check back if you get stuck at some point.
Do you know how to read a dbf file into SAS? Once you take it that far, you'll get plenty of help with the rest of the code.
I would like to take the power of OS command.
Assuming c:\test\ contains floder1 floder2 ............
data list; input fname $ dname $; id=input(compress(dname,,'kd'),best8.); cards; 0222 f2 0345 f2 2345 f1 3423 f3 4444 f1 4827 f3 5635 f3 6767 f1 7623 f3 ; run; filename x pipe 'dir c:\test\*.txt /s /b'; data have; infile x; input path $100.; fname=scan(path,-2,'.\'); id=input(compress(scan(path,-3,'.\'),,'kd'),best8.); run; proc sql; select path from have where fname in (select fname from list) and id in (select id from list); quit;
Ksharp
filename x pipe 'dir c:\test\*.txt /s /b';
This statement does not seem to collect all the path+files if files with same name are scattered across the folder1 - folder3.
changing
filename x pipe 'dir c:\test\*.txt /s /b';
to
filename x pipe 'dir c:\test\*.dbf /s /b';
Thank you for the reply. I already changed it. I check the Ksharp's pipe statement does not collect all the path+dbf information. If I have 0345.dbf both in folder1 and in folder3, the pipe command reads only one of two a.dbf files. What we have to do is to create the path variable, which has data values as follows:
...
c:\test\folder1\0345.dbf
...
c:\test\folder3\0345.dbf
...
But the pipe fails to collect both.
*update*
And I found that the pipe command does not collect all the dbf files in a single folder. For example, my folder1 has 13 dbf files. But after running the pipe command just for the folder1 (filename x pipe 'dir C:\test\folder1\*.dbf /s /b';), I have only 6 records in the path variable.
Hi,
I created three folders under c:\temp\forum and saved several .txt files.
%macro test;
%do i=1 %to 3;
filename x pipe "dir /b c:\temp\forum\folder&i.\*.txt " lrecl=100;
data folder&i ;
folder=cats('f',&i);
length fname $20;
infile x length=reclen ;
input fname $varying20. reclen ;
run;
%end;
%mend;
%test
data want;
set folder:;
run;
proc print;run;
obs folder fname
1 f1 123.txt
2 f1 345.txt
3 f2 123.txt
4 f2 888.txt
5 f3 123.txt
6 f3 888.txt
I tested it again. No problem.
c:\test\ is parent directory .
Using command
dir c:\test\*.dbf /s /b
can list all of dbf file under test directory ,include its sub-directory.
Ksharp
Hi Ksharp,
when I ran the code below, it only gave me the last .txt file in a folder.
filename x pipe 'dir c:\temp\*.txt /s /b';
data have;
infile x;
input path $100.;
fname=scan(path,-2,'.\');
*id=input(compress(scan(path,-3,'.\'),,'kd'),best8.);
run;
Thanks - Linlin
Try just seeing what your command is generating. If you are running interactive SAS just issue the command INCLUDE X from the command line. Otherwise run this simple data step.
data _null_; infile x; input; list; run;
Ha. I make a mistake here , I must add Truncover option to avoid the wrong result.
data list; input fname $ dname $; id=input(compress(dname,,'kd'),best8.); cards; 0222 f2 0345 f2 2345 f1 3423 f3 4444 f1 4827 f3 5635 f3 6767 f1 7623 f3 ; run; filename x pipe 'dir c:\test\*.txt /s /b'; data have; infile x truncover; input path $100.; fname=scan(path,-2,'.\'); id=input(compress(scan(path,-3,'.\'),,'kd'),best8.); run; proc sql; select path from have where fname in (select fname from list) and id in (select id from list); quit;
Ksharp
data list;
input fname $ id $;
cards;
1122 bg
1620 bg
1660 bg
0520 tz
0240 tr
0280 tr
0560 tr
;run;
proc contents data=list;run;
Alphabetic List of Variables and Attributes
# Variable Type Len
1 fname Char 8
2 id Char 8
filename x pipe 'dir D:\ex\sas_ex\Ksharp\*.dbf /s /b';
data have;
length fname $8. id $8.;
infile x truncover;
input path $100.;
fname=scan(path,-2,'.\');
id = scan(path,5,'\');
run;
proc contents data=have; run;
Alphabetic List of Variables and Attributes
# Variable Type Len
1 fname Char 8
2 id Char 8
3 path Char 100
proc print data=have; run;
Obs fname id path
1 1122 bg D:\ex\sas_ex\Ksharp\bg\1122.dbf
2 1620 bg D:\ex\sas_ex\Ksharp\bg\1620.dbf
3 1660 bg D:\ex\sas_ex\Ksharp\bg\1660.dbf
4 0240 tr D:\ex\sas_ex\Ksharp\tr\0240.dbf
5 0280 tr D:\ex\sas_ex\Ksharp\tr\0280.dbf
6 0560 tr D:\ex\sas_ex\Ksharp\tr\0560.dbf
7 0520 tz D:\ex\sas_ex\Ksharp\tz\0520.dbf
8 1122 tz D:\ex\sas_ex\Ksharp\tz\1122.dbf
proc sql;
create table final_selection as
select path
from have
where fname in (select fname from list) and
id in (select id from list);
quit;
proc print data=final_selection; run;
Obs path
1 D:\ex\sas_ex\Ksharp\bg\1122.dbf
2 D:\ex\sas_ex\Ksharp\bg\1620.dbf
3 D:\ex\sas_ex\Ksharp\bg\1660.dbf
4 D:\ex\sas_ex\Ksharp\tr\0240.dbf
5 D:\ex\sas_ex\Ksharp\tr\0280.dbf
6 D:\ex\sas_ex\Ksharp\tr\0560.dbf
7 D:\ex\sas_ex\Ksharp\tz\0520.dbf
8 D:\ex\sas_ex\Ksharp\tz\1122.dbf
I still see the problem here. From the list, fname='1122' is of 'bg'. In the final_selection data set, look at the obs=8: D:\ex\sas_ex\Ksharp\tz\1122.dbf. The whole point of this project is not to include the obs=8. But I still see the '1122.dbf' again from the 'tz' folder!
Am I missing something here? Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.