BookmarkSubscribeRSS Feed
Hank
Fluorite | Level 6

Hi,

I have 64 data sets, each with unique observations but exactly the same variable names. I have made it so that before each data set name, there is a prefix "_".

Now, I want to create 1 big data set that contains all the 64 data sets.

But at the same time, in each data set, many of the variables are "defined as both character and numeric".

I am trying to create a data set, and in that data set define variables as either character or numeric, something like this:

data work.complete;

     set work._:;

var1 = input(var1, ddmmyy10.);

/* Here I try to convert one of the "defined as both character and numeric"- variable to a numeric (date) variable. Then I want to add all the other variables that needs to be defined as either numeric or character */

run;

Will it work, doing the way I am trying to do? Or could anyone suggest to me how to fix the code so that it works?

Your feedback is greatly appreciated. Smiley Happy

/H

7 REPLIES 7
RichardinOz
Quartz | Level 8

Regrettably not.

That is because the data has to be fed into the dataset before it is converted, and SAS cannot tolerate a mix of character and numeric types in a column with the same name.

The situation will be worse if some your input datasets will have different types the next time you import them.

The most satisfactory solution is to ensure all the 64 datasets have the correct type data when you import them.  You may need co-operation from your sources to ensure type consistency.

Failing this, It will be tedious for 64 datasets but your best course is to correct the type in each one before concatenating.

data _newtable1 ;

     set _table1 (rename = (var1 = _var1)) ;

     var1 = input(_var1, ddmmyy10.);

     drop _var1 ;

run ;

and so on until all _newtable datasets have consistent types.

Then you can concatenate these tables.

Richard

Hank
Fluorite | Level 6

Hi

Thanks a lot for the answer Smiley Happy

It is quite impossible to go back to the source. The problem, I think, is that many columns both have a number (an observation,  like "45") and a "." (representing missing values). Is there a way to make the transformation, suggested above, in a macro that loops through all the data sets? Like, for example:

(except the code below is probably full of errors, and I don´t know how to loop through all the data sets).

%macro changeVarType (&name=)

data _table1;

     set _&name (rename = (var1 = _var1)) ;

     var1 = input(_var1, ddmmyy10.);

     drop _var1 ;

run ;

% mend;


% changeVarType (&table1);

Thanks a lot for your help,

Best regards,

/H



RichardinOz
Quartz | Level 8

Hank

First off, in the example you gave, in a SAS table a numeric missing value is usually represented by a dot/period.  So if the data is already in a SAS table both 45 and . are of the same type.

If your data is already in 64 SAS tables, each table will have consistent types for each column, and you can tell the type from proc contents of the data table.  So even if the value looks like a character value (eg 12MAR45) it will just be the formatted representation of the underlying numeric value.

Second, are you importing these tables from CSV or tab delimited files (or other text)?  If so (and you are using SAS in Windows), there is a way of forcing the the data types on import.

Do you want to force all variables to character or numeric?

A macro solution using arrays may be possible but would need to be quite complex:

  1. detect which columns are of the wrong type
  2. detect which order they are in, in the program data vector for each table
  3. create new arrays of the required column types
  4. convert the data, dropping the 'old' columns
  5. renaming the new columns if required.

Richard

Hank
Fluorite | Level 6

I got my data as one excel document with 64 spread sheets. Is there a way to force data types on import from excel as well?

I was hoping that the macro would not be so complex. I already know which columns that needs to be changed to numeric, and the columns are the same in every data set. So the complex, "detection"-part of the macro is unnecessary (I think). Am I right?

If I am right, woult the most complex part be to have the macro loop through all 64 data sets?

Thanks a lot for your patience Smiley Happy

/H

RichardinOz
Quartz | Level 8

Hank

First up do a quick check whether the columns giving you problems are very sparse (ie mostly missing values).  SAS checks a certain number of rows to determine the type of the data and if it only encounters null values in the first N rows it will default to one type, which may not be what you want.  I think there is an option to increase the number of rows but I have not checked.

Also check whether any date columns have all cells formatted in Excel as dates (highlight the column and format the whole column, including empty cells - column names will be unaffected).

One trick that may be an easy fix is to insert a new row 2 in your excel data.  In this row insert a 9 for every number column and an A for every character column, to force the import to a specific type.  Use an identical 2nd row in each table.  This row can be removed in SAS by specifying option firstobs = 2 ;.

If this trick results in some values not being picked up, force the columns in question to character using this trick and use code in SAS to convert data into numeric.

Assuming each table now has columns var3, var5, var9 which have been imported as character but are supposed to be numeric you can do something like this (supposing var3 should be a date).  The format specification will ensure the resulting columns are in the right order

Data fulldata ;

     Format var1 .... var99 Best16. ; /* Insert column names */

     Format var3 ddmmyy10. ;     /* Overrides the previous format for var3 */

     set _table1 (firstobs = 2 rename = (var3 = _var3 var5 =_var5 var7 = _var7))

          ...

          _table64 (firstobs = 2 rename = (var3 = _var3 var5 =_var5 var7 = _var7))

          ;

          var3 = input (_var3, ?? ddmmyy10.) ; /* ?? suppresses warning messages when data is in the wrong format */

          var5 = input (_var5, comma32.) ;

          var9 = input (_var9, comma32.) ;

          Drop _var: ;

Run ;

Disclaimer: I have no way to test the above code.  But I hope you can use it

Richard

Hank
Fluorite | Level 6

@ Richard:

Thanks again, I have not test the code yet but I am sure I will need it in the future. I deal with very messy data a lot. However, this time I took a short cut and added the sheets one by one to a new "master sheet" in excel. It took a while but now I can focus on "hands on" analysis. But I like your solutions to the variable definition problem.

Have a great day Smiley Happy

/Hannes

Haikuo
Onyx | Level 15

Hi,

Here is my 2 cents:

1. If you have a business requirement/function specs, where you can find the supposed types of all the variables, then you can come up with a SAS code to go through all 64 tables, and check/correct them one by one, after that, do the concatenation.

2. If you don't have any docs defining variable types, then I suggest that you import them into SAS as all character. You will only make changes (type conversion) on a needed basis.

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 6355 views
  • 6 likes
  • 3 in conversation