I need to ensure that the header rows of these 6 files are identical before I combine them into one data set.
Code:
%macro getdata(num=,dsn=);
proc import datafile="G:\Departments\Research\test scores\SC_ALT\2018\School_District_of_Oconee_County_4604-Grade_&num..csv"
out=&dsn dbms=dlm;
delimiter=',';
guessingrows=all;
run;
proc contents data=&dsn varnum;
title"&dsn.";
run;
%mend;
%getdata(num=3,dsn=gr3);
%getdata(num=4,dsn=gr4);
%getdata(num=5,dsn=gr5);
%getdata(num=6,dsn=gr6);
%getdata(num=7,dsn=gr7);
%getdata(num=8,dsn=gr8);
As everyone suggested, you should use datastep instead of proc import to avoid issues. below is the code I have developed recently and you can use the same.
/* first create standard reference table */
proc sql;
create table mylib.ref_table
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.,
Hired num informat=date7.
format=date7.);
/* compare whether column names are matching or not with reference tables*/
proc sql ;
select
case when name is missing
then "ALL COLUMN NAMES MATCHED"
else name
end into :COLUMN_NAME_MATCHORNOMATCH SEPARATED by ','
from
(select coalesce(a.name, b.name) as name, count(*)
from (Select name
from Dictionary.columns
where upcase(libname)= upcase(mylib)
and upcase(memname) = upcase('ref_table'))a
full join
(Select name
from Dictionary.columns
where upcase(libname)= upcase('work')
and upcase(memname) = upcase('test_table))b
on upcase(a.name) =upcase(b.name)
where a.name is missing
or b.name is missing)
;
/* check whether order and type of column names are not matching*/
proc sql ;
select
case when name is missing
then "ALL COLUMN Type position length MATCHED"
else name
end into :COLUMN_Values_MATCHorNOMATCH SEPARATED by ','
from
(select coalesce(a.name, b.name) as name, count(*)
from (Select name, varnum, type, length
from Dictionary.columns
where upcase(libname)= upcase(mylib)
and upcase(memname) = upcase('ref_table'))a
inner join
(Select name, varnum, type, length
from Dictionary.columns
where upcase(libname)= upcase('work')
and upcase(memname) = upcase('test_table))b
on upcase(a.name) =upcase(b.name)
where a.varnum ne b.varnum
or a.type ne b.type
or a.length ne b.length
);
/* finally reporting first step in that is capture macro variables from above queries*/
Data Temp;
Length column $500.;
If index(strip("&COLUMN_NAME_MATCHORNOMATCH" ), "ALL COLUMN")>0
then column ="&COLUMN_NAME_MATCHORNOMATCH";
else column =catx(' ', "Unmacthed columns are","&COLUMN_NAME_MATCHORNOMATCH" );
output;
If index(strip("&COLUMN_Values_MATCHorNOMATCH " ), "ALL COLUMN")>0
then column ="&COLUMN_Values_MATCHorNOMATCH ";
else column =catx(' ', "Unmacthed columns for type or length or position
are","&COLUMN_Values_MATCHorNOMATCH " );
output;
run;
/* use proc report for reporting purpose*/
Title "Final info about metadata for &syslast table";
Proc report data =temp nowd;
column column;
define column/display;
compute column;
if index(column, "ALL COLUMN") = 0 then call define
(_col_, "style", "style={background = red}");
else call define
(_col_, "style", "style={background = green}");
endcomp;
run;
/* you can check above code with example below*/
/* reference table*/
proc sql;
create table ref_paylist
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.,
Hired num informat=date7.
format=date7.
);
/*test table*/
proc sql;
create table test_paylist
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.
);
/* you just need to send macro paremeters for your test table and then everything should work*/
where upcase(libname)= upcase('work')
and upcase(memname) = upcase('test_table)
Errm, what does your import agreement state? The transfer of data from one party to another requires some sort of agreement on what is sent, how it is sent, and what structure, this is minimum requirement for anything to work. Your compounding your problem by using proc import - which is a guessing procedure and may not give the correct result, or be repeatable.
For instance, what if all 6 are the same, but some numeric columns have characters?
What if there are special characters?
If you want to combine them, DO NOT USE PROC IMPORT!
proc import makes guesses about variable lengths, so different contents will lead to different types/lengths/formats, which will cause problems when you combine the files.
Write a data step (you can start with the data step you get in the log from proc import), and set the attributes for the variables according to the file specification you got with the CSVs.
As everyone suggested, you should use datastep instead of proc import to avoid issues. below is the code I have developed recently and you can use the same.
/* first create standard reference table */
proc sql;
create table mylib.ref_table
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.,
Hired num informat=date7.
format=date7.);
/* compare whether column names are matching or not with reference tables*/
proc sql ;
select
case when name is missing
then "ALL COLUMN NAMES MATCHED"
else name
end into :COLUMN_NAME_MATCHORNOMATCH SEPARATED by ','
from
(select coalesce(a.name, b.name) as name, count(*)
from (Select name
from Dictionary.columns
where upcase(libname)= upcase(mylib)
and upcase(memname) = upcase('ref_table'))a
full join
(Select name
from Dictionary.columns
where upcase(libname)= upcase('work')
and upcase(memname) = upcase('test_table))b
on upcase(a.name) =upcase(b.name)
where a.name is missing
or b.name is missing)
;
/* check whether order and type of column names are not matching*/
proc sql ;
select
case when name is missing
then "ALL COLUMN Type position length MATCHED"
else name
end into :COLUMN_Values_MATCHorNOMATCH SEPARATED by ','
from
(select coalesce(a.name, b.name) as name, count(*)
from (Select name, varnum, type, length
from Dictionary.columns
where upcase(libname)= upcase(mylib)
and upcase(memname) = upcase('ref_table'))a
inner join
(Select name, varnum, type, length
from Dictionary.columns
where upcase(libname)= upcase('work')
and upcase(memname) = upcase('test_table))b
on upcase(a.name) =upcase(b.name)
where a.varnum ne b.varnum
or a.type ne b.type
or a.length ne b.length
);
/* finally reporting first step in that is capture macro variables from above queries*/
Data Temp;
Length column $500.;
If index(strip("&COLUMN_NAME_MATCHORNOMATCH" ), "ALL COLUMN")>0
then column ="&COLUMN_NAME_MATCHORNOMATCH";
else column =catx(' ', "Unmacthed columns are","&COLUMN_NAME_MATCHORNOMATCH" );
output;
If index(strip("&COLUMN_Values_MATCHorNOMATCH " ), "ALL COLUMN")>0
then column ="&COLUMN_Values_MATCHorNOMATCH ";
else column =catx(' ', "Unmacthed columns for type or length or position
are","&COLUMN_Values_MATCHorNOMATCH " );
output;
run;
/* use proc report for reporting purpose*/
Title "Final info about metadata for &syslast table";
Proc report data =temp nowd;
column column;
define column/display;
compute column;
if index(column, "ALL COLUMN") = 0 then call define
(_col_, "style", "style={background = red}");
else call define
(_col_, "style", "style={background = green}");
endcomp;
run;
/* you can check above code with example below*/
/* reference table*/
proc sql;
create table ref_paylist
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.,
Hired num informat=date7.
format=date7.
);
/*test table*/
proc sql;
create table test_paylist
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.
);
/* you just need to send macro paremeters for your test table and then everything should work*/
where upcase(libname)= upcase('work')
and upcase(memname) = upcase('test_table)
SAS can read or import and write/export to CSV. SAS compares SAS data sets. So SAS really does not "compare" CSV files.
I hope your data sets after import/reading actually have a grade variable or combining them without one is extremely likely to limit your analysis and reporting options.
And depending on some data sources I would NOT actually expect the header rows to be the same even if the layout is the same. There are entirely too many folks happy to have a column header like "Enrollment 5th Grade" , "Fail Grade 5" or similar with the grade changing in each file
A way to use proc import to help with this sort of thing if the files are supposed to be of the same structure is to:
1) Use proc import to import one file with GUESSINGROWS=MAX.
2) Proc import will generate data step code to read a csv file. Copy it from the log into the editor.
3) verify that the expected fields have the correct types. If you have a character value that might be longer in some files, such as a school name, make sure that the generated informat is long enough to read the longest expected value (and I usually add 5 or 10 characters just in case), make sure dates have an appropriate informat and associated format. Things like Zip codes, accounts or identification fields should almost always be character and not numeric as the guessing procedures like import will do for numeric digit coded identifiers.
4) add appropriate labels
5) you may want to add code to add something like grade
6) Then the code could be wrapped in a macro changing the infile statement as you are using it and the output data set name and if needed perhaps even use your macro variable &num to assign a value to the grade variable if not in your data set
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.