BookmarkSubscribeRSS Feed
Zeus_Olympous
Obsidian | Level 7

Dear all,

 

I have a library with many datasets. I am only considered in two special sets of datasets

1st group ADS_HLK_date1, PLKS_ERT_HLK_date2, HIO_TRE_HLK_OPT_date3 etc..
(all of these datasets of interest have in common _HLK_ )

 

2nd group TUI_FRT_GOO_date1, HGK_ERT_GOO_999_date2, HIO_TRE_GOO_LIT_date3 etc..
(all of these datasets of interest have in common _GOO_ )

 

The real problem is although both sets of tables they have identical variables

NUMVAR1, CHARVAR1,NUMVAR2,NUMVAR3,NUMVAR4,NUMVAR5,CHARVAR2,CHARVAR3 etc.

 

in the first set (i.e._HLK_) some numeric variables NUMVAR(i) have CHARACTER FORMATS while
in the second set they have NUMERIC FORMATS


the same applies for the character variables CHARVAR(i) , in one set (i.e. _GOO_)
have character formats of certain type i.e. $CHAR20. but in the other set they appear to haave different character formats i.e. $15.

 

I would like to "homogenise":

 

1st-- the numeric formats across ALL sets (both _GOO_ and _HLK_)
by changing the LENGTH and CHARACTER FORMAT to the ones with the NUMERIC represantation

 

2nd-- the various character formats and LENGTHS across ALL datasets.

 

in order to CONCATENATE all datasets in one table.

 

I would more than welcome any hints/suggestions.

 

Thank you in advance.

7 REPLIES 7
Reeza
Super User

How are these datasets created? Is there an option to fix them before you append so you don't have to do this in the first place?

Zeus_Olympous
Obsidian | Level 7

Hi Reeza,

 

They are csv. files and I use "Import Data" with Enterprise Guide..Unfortunately it would take me ages to correct the csv. files..

 

Thank you.

Reeza
Super User

Is it the CSV files or the IMPORT that's incorrect?

Are the files the same?

Zeus_Olympous
Obsidian | Level 7

The original csv files are the same..

 

but the EG IMPORT produces the "inconsistencies"...differences..

Reeza
Super User

Look at this data step. It will read all files in a single folder at once, if they have the same layout. First, use PROC IMPORT on a single file. Get the code from the log - primarily the FORMAT/INFORMAT/INPUT statement. 

 

Then, use the instructions to import all at once. This is less work than the solution to 'homogenize' will be, but at the end of the day it will be your choice. One question - especially if there are character variables, how sure are you SAS read all of them correctly and that you didn't miss anything? 

 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

Zeus_Olympous
Obsidian | Level 7

Although I would not know how to fix that within the csv...

 

I am afraid I have to come up with a process in SAS

 

ballardw
Super User

If these files are supposed to be the same then you do not allow any of the import tasks to guess for each input set. Use a data step to read each one and you can specify the variable names, types, lengths, formats and such.

 

A not uncommon practice is to use an import task (or proc import code) once and then to capture the Log generated. The task actually creates data step code that can be saved and editted with the changes you want. Once you have the data behaving as desired keep the code and change the input file name and the output dataset name.

 

If you have multiple file layouts then you do this once for each layout.

 

Some things such a variable names or formats can be changed using proc datasets on existing data but to change variable type requires either re-reading the data or a separate data step operation on each data set to rename old variable and recreate with the new name. Which may be more code work than modifying a program to read a single file layout and rereading all the similar sets.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 830 views
  • 1 like
  • 3 in conversation