BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GreggB
Pyrite | Level 9

I need to ensure that the header rows of these 6 files are identical before I combine them into one data set.

 

Code:

 

%macro getdata(num=,dsn=);
proc import datafile="G:\Departments\Research\test scores\SC_ALT\2018\School_District_of_Oconee_County_4604-Grade_&num..csv"
out=&dsn dbms=dlm;
delimiter=',';
guessingrows=all;
run;
proc contents data=&dsn varnum;
title"&dsn.";
run;
%mend;
%getdata(num=3,dsn=gr3);
%getdata(num=4,dsn=gr4);
%getdata(num=5,dsn=gr5);
%getdata(num=6,dsn=gr6);
%getdata(num=7,dsn=gr7);
%getdata(num=8,dsn=gr8);

1 ACCEPTED SOLUTION

Accepted Solutions
kiranv_
Rhodochrosite | Level 12

As everyone suggested, you should use datastep instead of proc import to avoid issues.  below is the code I have developed recently and you can use the same.

 

/* first create standard reference table */
proc sql;
   create table mylib.ref_table
       (IdNum char(4),
        Gender char(1),
        Jobcode char(3),
        Salary num,
        Birth num informat=date7.
                  format=date7.,
        Hired num informat=date7.
                  format=date7.);


/* compare whether column names are matching or not with reference tables*/

proc sql ;
select 
       case when name is missing 
            then "ALL COLUMN NAMES MATCHED"
             else name
       end into :COLUMN_NAME_MATCHORNOMATCH SEPARATED by ','
       from 
           (select coalesce(a.name, b.name) as name, count(*)
                   from (Select name 
                         from   Dictionary.columns
                         where upcase(libname)= upcase(mylib)
                           and upcase(memname) = upcase('ref_table'))a
     full join
            (Select name 
                         from   Dictionary.columns
                         where upcase(libname)= upcase('work')
                           and upcase(memname) = upcase('test_table))b
      on upcase(a.name) =upcase(b.name)
      where a.name is missing
or b.name is missing) ; /* check whether order and type of column names are not matching*/ proc sql ; select case when name is missing then "ALL COLUMN Type position length MATCHED" else name end into :COLUMN_Values_MATCHorNOMATCH SEPARATED by ',' from (select coalesce(a.name, b.name) as name, count(*) from (Select name, varnum, type, length from Dictionary.columns where upcase(libname)= upcase(mylib) and upcase(memname) = upcase('ref_table'))a inner join (Select name, varnum, type, length from Dictionary.columns where upcase(libname)= upcase('work') and upcase(memname) = upcase('test_table))b on upcase(a.name) =upcase(b.name) where a.varnum ne b.varnum or a.type ne b.type or a.length ne b.length );
/* finally reporting first step in that is capture macro variables from above queries*/ Data Temp; Length column $500.; If index(strip("&COLUMN_NAME_MATCHORNOMATCH" ), "ALL COLUMN")>0 then column ="&COLUMN_NAME_MATCHORNOMATCH"; else column =catx(' ', "Unmacthed columns are","&COLUMN_NAME_MATCHORNOMATCH" ); output; If index(strip("&COLUMN_Values_MATCHorNOMATCH " ), "ALL COLUMN")>0 then column ="&COLUMN_Values_MATCHorNOMATCH "; else column =catx(' ', "Unmacthed columns for type or length or position are","&COLUMN_Values_MATCHorNOMATCH " ); output; run; /* use proc report for reporting purpose*/ Title "Final info about metadata for &syslast table"; Proc report data =temp nowd; column column; define column/display; compute column; if index(column, "ALL COLUMN") = 0 then call define (_col_, "style", "style={background = red}"); else call define (_col_, "style", "style={background = green}"); endcomp; run; /* you can check above code with example below*/ /* reference table*/ proc sql; create table ref_paylist (IdNum char(4), Gender char(1), Jobcode char(3), Salary num, Birth num informat=date7. format=date7., Hired num informat=date7. format=date7. ); /*test table*/ proc sql; create table test_paylist (IdNum char(4), Gender char(1), Jobcode char(3), Salary num, Birth num informat=date7. format=date7. ); /* you just need to send macro paremeters for your test table and then everything should work*/ where upcase(libname)= upcase('work') and upcase(memname) = upcase('test_table)

 

 

View solution in original post

5 REPLIES 5
tomrvincent
Rhodochrosite | Level 12
Union the first row from each file and then dedupe. If you get just 1 row, they are identical.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Errm, what does your import agreement state?  The transfer of data from one party to another requires some sort of agreement on what is sent, how it is sent, and what structure, this is minimum requirement for anything to work.  Your compounding your problem by using proc import - which is a guessing procedure and may not give the correct result, or be repeatable.  

For instance, what if all 6 are the same, but some numeric columns have characters?  

What if there are special characters?

 

Kurt_Bremser
Super User

If you want to combine them, DO NOT USE PROC IMPORT!

proc import makes guesses about variable lengths, so different contents will lead to different types/lengths/formats, which will cause problems when you combine the files.

Write a data step (you can start with the data step you get in the log from proc import), and set the attributes for the variables according to the file specification you got with the CSVs.

kiranv_
Rhodochrosite | Level 12

As everyone suggested, you should use datastep instead of proc import to avoid issues.  below is the code I have developed recently and you can use the same.

 

/* first create standard reference table */
proc sql;
   create table mylib.ref_table
       (IdNum char(4),
        Gender char(1),
        Jobcode char(3),
        Salary num,
        Birth num informat=date7.
                  format=date7.,
        Hired num informat=date7.
                  format=date7.);


/* compare whether column names are matching or not with reference tables*/

proc sql ;
select 
       case when name is missing 
            then "ALL COLUMN NAMES MATCHED"
             else name
       end into :COLUMN_NAME_MATCHORNOMATCH SEPARATED by ','
       from 
           (select coalesce(a.name, b.name) as name, count(*)
                   from (Select name 
                         from   Dictionary.columns
                         where upcase(libname)= upcase(mylib)
                           and upcase(memname) = upcase('ref_table'))a
     full join
            (Select name 
                         from   Dictionary.columns
                         where upcase(libname)= upcase('work')
                           and upcase(memname) = upcase('test_table))b
      on upcase(a.name) =upcase(b.name)
      where a.name is missing
or b.name is missing) ; /* check whether order and type of column names are not matching*/ proc sql ; select case when name is missing then "ALL COLUMN Type position length MATCHED" else name end into :COLUMN_Values_MATCHorNOMATCH SEPARATED by ',' from (select coalesce(a.name, b.name) as name, count(*) from (Select name, varnum, type, length from Dictionary.columns where upcase(libname)= upcase(mylib) and upcase(memname) = upcase('ref_table'))a inner join (Select name, varnum, type, length from Dictionary.columns where upcase(libname)= upcase('work') and upcase(memname) = upcase('test_table))b on upcase(a.name) =upcase(b.name) where a.varnum ne b.varnum or a.type ne b.type or a.length ne b.length );
/* finally reporting first step in that is capture macro variables from above queries*/ Data Temp; Length column $500.; If index(strip("&COLUMN_NAME_MATCHORNOMATCH" ), "ALL COLUMN")>0 then column ="&COLUMN_NAME_MATCHORNOMATCH"; else column =catx(' ', "Unmacthed columns are","&COLUMN_NAME_MATCHORNOMATCH" ); output; If index(strip("&COLUMN_Values_MATCHorNOMATCH " ), "ALL COLUMN")>0 then column ="&COLUMN_Values_MATCHorNOMATCH "; else column =catx(' ', "Unmacthed columns for type or length or position are","&COLUMN_Values_MATCHorNOMATCH " ); output; run; /* use proc report for reporting purpose*/ Title "Final info about metadata for &syslast table"; Proc report data =temp nowd; column column; define column/display; compute column; if index(column, "ALL COLUMN") = 0 then call define (_col_, "style", "style={background = red}"); else call define (_col_, "style", "style={background = green}"); endcomp; run; /* you can check above code with example below*/ /* reference table*/ proc sql; create table ref_paylist (IdNum char(4), Gender char(1), Jobcode char(3), Salary num, Birth num informat=date7. format=date7., Hired num informat=date7. format=date7. ); /*test table*/ proc sql; create table test_paylist (IdNum char(4), Gender char(1), Jobcode char(3), Salary num, Birth num informat=date7. format=date7. ); /* you just need to send macro paremeters for your test table and then everything should work*/ where upcase(libname)= upcase('work') and upcase(memname) = upcase('test_table)

 

 

ballardw
Super User

SAS can read or import and write/export to CSV. SAS compares SAS data sets. So SAS really does not "compare" CSV files.

 

I hope your data sets after import/reading actually have a grade variable or combining them without one is extremely likely to limit your analysis and reporting options.

 

And depending on some data sources I would NOT actually expect the header rows to be the same even if the layout is the same. There are entirely too many folks happy to have a column header like "Enrollment 5th Grade" , "Fail Grade 5" or similar with the grade changing in each file

 

A way to use proc import to help with this sort of thing if the files are supposed to be of the same structure is to:

 

1) Use proc import to import one file with GUESSINGROWS=MAX.

2) Proc import will generate data step code to read a csv file. Copy it from the log into the editor.

3) verify that the expected fields have the correct types. If you have a character value that might be longer in some files, such as a school name, make sure that the generated informat is long enough to read the longest expected value (and I usually add 5 or 10 characters just in case), make sure dates have an appropriate informat and associated format. Things like Zip codes, accounts or identification fields should almost always be character and not numeric as the guessing procedures like import will do for numeric digit coded identifiers.

4) add appropriate labels

5) you may want to add code to add something like grade

6) Then the code could be wrapped in a macro changing the infile statement as you are using it and the output data set name and if needed perhaps even use your macro variable &num to assign a value to the grade variable if not in your data set

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2539 views
  • 0 likes
  • 6 in conversation