BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hk19
Calcite | Level 5

Hi! I have an assignment where I need to vertically combine data sets (set, not merge) and they are 12 separate csv files. I have imported the data but 2 variables are listed as both character and numeric variables. Their observations are reported numerically (height and alcohol, reported as number of drinks) so I would like to change both to character variables. I have been trying to complete this operation in the final data step where I combine all 12. I have previously tried creating new variables where I multiply the old variable by 1 and I have included my current code below. All 12 files have been imported without issue, I am now running into problems as I try to merge them. I am also using SAS studio and it says that there are 0 observations, despite there being 528 combined throughout the 12 data sets. 

 

Should I be looking for a solution in my data step or fixing each variable as I import the data? Thank you for your help! 

 

/*Example of last import, done successfully*/

proc import out=D12
datafile = '/home/u42933802/CM/DN012.csv'
DBMS= csv replace;
getnames=yes;
run;

 

/*Merge all files*/
libname CMEA '/home/u42933802/CM';
data final_data;
set D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12;

orig_var = 'Alcohol';
Drinks = input(Alcohol,8.);
drop Alcohol;
rename Drinks = Alcohol;

orig_var = 'Height';
Centimeters = input(Height,8.);
drop Height;
rename Centimeters = Height;

run;

 

proc print data=final_data (obs=528);
run;

 

ERROR: Variable Height has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Height has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
ERROR: Variable Alcohol has been defined as both character and numeric.
74
75 orig_var = 'Alcohol';
76 Drinks = input(Alcohol,8.);
77 drop Alcohol;
78 rename Drinks = Alcohol;
79
80 orig_var = 'Height';
81 Centimeters = input(Height,8.);
82 drop Height;
83 rename Centimeters = Height;
84
85 run;
 
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
76:19 81:24
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FINAL_DATA may be incomplete. When this step was stopped there were 0 observations and 9 variables.
WARNING: Data set WORK.FINAL_DATA was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 3974.68k
OS Memory 36316.00k
Timestamp 02/18/2020 10:49:16 PM
Step Count 438 Switch Count 0
Page Faults 0
Page Reclaims 528
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 16
 
 
86
87 proc print data=final_data (obs=528);
88 run;
 
NOTE: No observations in data set WORK.FINAL_DATA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 740.12k
OS Memory 33196.00k
Timestamp 02/18/2020 10:49:16 PM
Step Count 439 Switch Count 0
Page Faults 0
Page Reclaims 65
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
 
89
90
91 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
102
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

PROC IMPORT will look at a specific CSV file and try to guess how to define the variables based on the set of text values in that one file.

It will generate data step code that you should see in the log.  You could start with that as a model for the code you need to create.

But it is really easy to write the code to read a CSV from scratch without using PROC IMPORT at all.

Basic structure is like this:

data want;
  infile 'myfile.txt' dsd firstobs=2 truncover;
  length firstvar 8 var2 $30 var3 8 lastvar $40;
  input firstvar -- lastvar;
run;

The DATA statement names the dataset you want to create. The INFILE statement tells SAS where to find the lines of text and that you want to treat it as delimited and skip the header line.  If your file is not a  true CSV file, but some other type of delimited then you can use the DLM= option on the INFILE statement to tell it what character(s) to treat as the delimiter.  The LENGTH statement defines the type of the variables (and for character variables the length). Note for numeric variables just use 8 since SAS stores all numbers as 8 byte floating point values.  Then INPUT statement tells it which variables to read. You can abbreviate it by using a positional variable list like in this example.

If you have DATE, TIME or DATETIME values (or possibly other things like numbers with commas or $ in them) then you can add a INFORMAT statement to tell SAS how to convert the text in the file into the right numbers.  For DATE, TIME and DATETIME values you will want to attach an appropriate FORMAT so they print in a human readable style.  The format you choose does not have to match how it was stored as text in the source file. It just needs to be appropriate for the values.  

View solution in original post

10 REPLIES 10
ChrisNZ
Tourmaline | Level 20

>  I have been trying to complete this operation in the final data step

 

The variables have to be clean when the data sets are merged.

Either change them before, or rename the bad ones as you merge.

 

  set D1-D4

      D5(rename=(HEIGHT=HEIGHT_NUM))

      D6-D8

      D9(rename=(HEIGHT=HEIGHT_NUM))

      D10-D12;

 

and then process them as needed.

hk19
Calcite | Level 5

Thank you for your help! I used your suggesting and renamed both variables after importing prior to merging the data sets. 

 

set D1(rename=(Height=Height_Num Alcohol=Alcohol_Num));

 

I still got the same error code when I merged the cleaned data together though? 

ChrisNZ
Tourmaline | Level 20

If you still have this error,it means the data is not clean.

Check that all variables with the same name have the same type (num or char) in all tables, and hopefully the same length too.

 

 

Tom
Super User Tom
Super User

Why did you use PROC IMPORT?  That has to guess how to define the variables.  Just write your own data steps to read the CSV files.

 

Are the CSV files supposed to have the same structure?  If they do you can read all 12 in one data step and eliminate the need for a data step to set them together.

hk19
Calcite | Level 5

That's how I knew how to import csv files. Each file has the same structure in terms of variable names, but they are defined differently in each of the 12 sets in terms of character or numeric type and need to be converted to numeric before I can merge them all into 1 final set.

Tom
Super User Tom
Super User

@hk19 wrote:

That's how I knew how to import csv files. Each file has the same structure in terms of variable names, but they are defined differently in each of the 12 sets in terms of character or numeric type and need to be converted to numeric before I can merge them all into 1 final set.


A CSV file does not have any place where it could define the type of a variable. The only metadata it can have is the (optional) header line that you can use as a hint for what names to use for each column.

hk19
Calcite | Level 5

After downloading the CSV files to SAS studio and running them, I was able to see a table of variables and attributes showing which data sets showed character vs numeric variables. A PROC IMPORT operation was performed in the log by running this in SAS Studio. I am still not sure what other operation I should perform besides the set statement to make sure that I can make 2 variables numeric instead of conflicting, as I reported my error code. 

ChrisNZ
Tourmaline | Level 20

1. You can use proc contents to see the variable types.

 

2. When you run proc import, you can see the data step that's generated displayed in the log. 

Use and alter that code to import the data exactly as intended in a consistent manner for all tables.

 

Tom
Super User Tom
Super User

PROC IMPORT will look at a specific CSV file and try to guess how to define the variables based on the set of text values in that one file.

It will generate data step code that you should see in the log.  You could start with that as a model for the code you need to create.

But it is really easy to write the code to read a CSV from scratch without using PROC IMPORT at all.

Basic structure is like this:

data want;
  infile 'myfile.txt' dsd firstobs=2 truncover;
  length firstvar 8 var2 $30 var3 8 lastvar $40;
  input firstvar -- lastvar;
run;

The DATA statement names the dataset you want to create. The INFILE statement tells SAS where to find the lines of text and that you want to treat it as delimited and skip the header line.  If your file is not a  true CSV file, but some other type of delimited then you can use the DLM= option on the INFILE statement to tell it what character(s) to treat as the delimiter.  The LENGTH statement defines the type of the variables (and for character variables the length). Note for numeric variables just use 8 since SAS stores all numbers as 8 byte floating point values.  Then INPUT statement tells it which variables to read. You can abbreviate it by using a positional variable list like in this example.

If you have DATE, TIME or DATETIME values (or possibly other things like numbers with commas or $ in them) then you can add a INFORMAT statement to tell SAS how to convert the text in the file into the right numbers.  For DATE, TIME and DATETIME values you will want to attach an appropriate FORMAT so they print in a human readable style.  The format you choose does not have to match how it was stored as text in the source file. It just needs to be appropriate for the values.  

hk19
Calcite | Level 5

Thank you so much for your help, I never learned how to do this before and this ended up working. Thank you for teaching me!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 9622 views
  • 0 likes
  • 3 in conversation