07-17-2017 02:31 PM
I have a library with many datasets. I am only interested in two particular groups of datasets:
1st group ADS_HLK_date1, PLKS_ERT_HLK_date2, HIO_TRE_HLK_OPT_date3 etc..
(all of these datasets of interest have in common _HLK_ )
2nd group TUI_FRT_GOO_date1, HGK_ERT_GOO_999_date2, HIO_TRE_GOO_LIT_date3 etc..
(all of these datasets of interest have in common _GOO_ )
The real problem is that, although both groups of tables have identical variables
NUMVAR1, CHARVAR1, NUMVAR2, NUMVAR3, NUMVAR4, NUMVAR5, CHARVAR2, CHARVAR3 etc.,
in the first group (i.e. _HLK_) some numeric variables NUMVAR(i) have CHARACTER FORMATS, while
in the second group they have NUMERIC FORMATS.
The same applies to the character variables CHARVAR(i): in one group (i.e. _GOO_)
they have character formats of a certain type, e.g. $CHAR20., but in the other group they appear to have different character formats, e.g. $15.
I would like to "homogenise":
1st-- the numeric formats across ALL sets (both _GOO_ and _HLK_)
by changing the LENGTH and CHARACTER FORMAT to match the NUMERIC representation
2nd-- the various character formats and LENGTHS across ALL datasets.
in order to CONCATENATE all datasets in one table.
I would more than welcome any hints/suggestions.
Thank you in advance.
07-17-2017 02:47 PM
How are these datasets created? Is there an option to fix them before you append so you don't have to do this in the first place?
07-17-2017 03:11 PM
They are .csv files and I use "Import Data" in Enterprise Guide. Unfortunately, it would take me ages to correct the .csv files.
07-17-2017 04:52 PM
The original csv files are the same,
but the EG IMPORT produces the "inconsistencies" (the differences).
07-17-2017 05:10 PM - edited 07-17-2017 05:12 PM
Look at this data step. It will read all files in a single folder at once, if they have the same layout. First, use PROC IMPORT on a single file. Get the code from the log - primarily the FORMAT/INFORMAT/INPUT statement.
Then, use the instructions to import all at once. This is less work than the solution to 'homogenize' will be, but at the end of the day it will be your choice. One question, especially if there are character variables: how sure are you that SAS read all of them correctly and that you didn't miss anything?
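A minimal sketch of such a single-step read, assuming all the _HLK_ csv files sit in one folder that the SAS session can see, share the column order listed in the question, and each start with a header row. The folder path, the variable lengths and the dataset name hlk_all are placeholders, not taken from your files; the real declarations should come from the PROC IMPORT log.

```sas
data hlk_all;
    /* Placeholder types and lengths: copy the real ones from the PROC IMPORT log */
    length source_file fname $260
           numvar1 8 charvar1 $20
           numvar2 numvar3 numvar4 numvar5 8
           charvar2 charvar3 $20;

    /* The wildcard reads every matching file; FILENAME= exposes the current file name */
    infile "/path/to/csv/*_HLK_*.csv" dsd dlm=',' truncover filename=fname;

    /* Load the record and hold it so the file name can be checked first */
    input @;

    /* The first record of each file is its header row: skip it */
    if fname ne lag(fname) then delete;

    /* FILENAME= variables are not written to the output, so keep a copy */
    source_file = fname;

    input numvar1 charvar1 numvar2 numvar3 numvar4 numvar5 charvar2 charvar3;
run;
```

Because every variable is declared once, all files come in with the same types and lengths, and source_file records which file each row came from. A second step with the _GOO_ pattern would work the same way.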
07-17-2017 04:53 PM
Although I would not know how to fix that within the csv...
I am afraid I have to come up with a process in SAS
07-17-2017 04:05 PM
If these files are supposed to be the same, then do not allow any of the import tasks to guess for each input file. Use a data step to read each one, so that you can specify the variable names, types, lengths, formats and such.
A not uncommon practice is to use an import task (or proc import code) once and then to capture the Log generated. The task actually creates data step code that can be saved and edited with the changes you want. Once you have the data behaving as desired, keep the code and change only the input file name and the output dataset name.
If you have multiple file layouts then you do this once for each layout.
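As a rough illustration of keeping the code and changing only the file and dataset names, the data step can be wrapped in a macro. The macro name read_layout, its parameters, the paths and the variable lengths are all hypothetical, and "date1" is just the placeholder used in the question; the LENGTH and INPUT lines stand in for whatever the import log generated for your layout.

```sas
%macro read_layout(csv=, out=);
    data &out;
        /* firstobs=2 skips the header row of the csv file */
        infile "&csv" dsd dlm=',' truncover firstobs=2;

        /* Same declarations for every file, so nothing is guessed per file */
        length numvar1 8 charvar1 $20
               numvar2 numvar3 numvar4 numvar5 8
               charvar2 charvar3 $20;

        input numvar1 charvar1 numvar2 numvar3 numvar4 numvar5 charvar2 charvar3;
    run;
%mend read_layout;

/* One call per file: only the input file and the output dataset change */
%read_layout(csv=/path/to/csv/ADS_HLK_date1.csv,     out=work.ads_hlk_date1);
%read_layout(csv=/path/to/csv/TUI_FRT_GOO_date1.csv, out=work.tui_frt_goo_date1);

/* With identical types and lengths the datasets concatenate cleanly */
data work.all_files;
    set work.ads_hlk_date1 work.tui_frt_goo_date1;
run;
```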
Some things, such as variable names or formats, can be changed using proc datasets on existing data. Changing a variable's type, however, requires either re-reading the data or a separate data step on each data set that renames the old variable and recreates it under the original name with the new type. That may be more code work than modifying a program to read a single file layout and re-reading all the similar sets.
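A small sketch of both routes on an already imported dataset. The dataset name work.ads_hlk_date1, the choice of numvar1 and charvar1, and the target length $20 are assumptions for illustration only.

```sas
/* Formats are metadata: PROC DATASETS changes them in place without rewriting the data. */
/* This changes the attached format only, not the stored length.                      */
proc datasets library=work nolist;
    modify ads_hlk_date1;
    format charvar1 $20.;
quit;

/* Type and length changes need a data step that rewrites the dataset */
data work.ads_hlk_date1_fixed;
    /* Declaring the length before SET overrides the stored length */
    length charvar1 $20;

    /* Rename the character version out of the way ... */
    set work.ads_hlk_date1 (rename=(numvar1 = numvar1_txt));

    /* ... and recreate the original name as numeric; ?? suppresses notes for bad values */
    numvar1 = input(numvar1_txt, ?? 32.);

    format charvar1 $20.;
    drop numvar1_txt;
run;
```

Values that cannot be read as numbers become missing in numvar1, so it is worth checking the counts before dropping the original datasets. The data step has to be repeated for every variable and every dataset that needs it, which is why re-reading the csv files with one fixed layout is often less work.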