Hi! I have an assignment where I need to vertically combine data sets (set, not merge) and they are 12 separate csv files. I have imported the data but 2 variables are listed as both character and numeric variables. Their observations are reported numerically (height and alcohol, reported as number of drinks) so I would like to change both to character variables. I have been trying to complete this operation in the final data step where I combine all 12. I have previously tried creating new variables where I multiply the old variable by 1 and I have included my current code below. All 12 files have been imported without issue, I am now running into problems as I try to merge them. I am also using SAS studio and it says that there are 0 observations, despite there being 528 combined throughout the 12 data sets.
Should I be looking for a solution in my data step or fixing each variable as I import the data? Thank you for your help!
/*Example of last import, done successfully*/
proc import out=D12
datafile = '/home/u42933802/CM/DN012.csv'
DBMS= csv replace;
getnames=yes;
run;
/*Merge all files*/
libname CMEA '/home/u42933802/CM';
data final_data;
set D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12;
orig_var = 'Alcohol';
Drinks = input(Alcohol,8.);
drop Alcohol;
rename Drinks = Alcohol;
orig_var = 'Height';
Centimeters = input(Height,8.);
drop Height;
rename Centimeters = Height;
run;
proc print data=final_data (obs=528);
run;
PROC IMPORT will look at a specific CSV file and try to guess how to define the variables based on the set of text values in that one file.
It will generate data step code that you should see in the log. You could start with that as a model for the code you need to create.
But it is really easy to write the code to read a CSV from scratch without using PROC IMPORT at all.
Basic structure is like this:
data want;
infile 'myfile.txt' dsd firstobs=2 truncover;
length firstvar 8 var2 $30 var3 8 lastvar $40;
input firstvar -- lastvar;
run;
The DATA statement names the dataset you want to create. The INFILE statement tells SAS where to find the lines of text and that you want to treat it as delimited and skip the header line. If your file is not a true CSV file, but some other type of delimited then you can use the DLM= option on the INFILE statement to tell it what character(s) to treat as the delimiter. The LENGTH statement defines the type of the variables (and for character variables the length). Note for numeric variables just use 8 since SAS stores all numbers as 8 byte floating point values. Then INPUT statement tells it which variables to read. You can abbreviate it by using a positional variable list like in this example.
If you have DATE, TIME or DATETIME values (or possibly other things like numbers with commas or $ in them) then you can add a INFORMAT statement to tell SAS how to convert the text in the file into the right numbers. For DATE, TIME and DATETIME values you will want to attach an appropriate FORMAT so they print in a human readable style. The format you choose does not have to match how it was stored as text in the source file. It just needs to be appropriate for the values.
> I have been trying to complete this operation in the final data step
The variables have to be clean when the data sets are merged.
Either change them before, or rename the bad ones as you merge.
set D1-D4
D5(rename=(HEIGHT=HEIGHT_NUM))
D6-D8
D9(rename=(HEIGHT=HEIGHT_NUM))
D10-D12;
and then process them as needed.
Thank you for your help! I used your suggesting and renamed both variables after importing prior to merging the data sets.
set D1(rename=(Height=Height_Num Alcohol=Alcohol_Num));
I still got the same error code when I merged the cleaned data together though?
If you still have this error,it means the data is not clean.
Check that all variables with the same name have the same type (num or char) in all tables, and hopefully the same length too.
Why did you use PROC IMPORT? That has to guess how to define the variables. Just write your own data steps to read the CSV files.
Are the CSV files supposed to have the same structure? If they do you can read all 12 in one data step and eliminate the need for a data step to set them together.
That's how I knew how to import csv files. Each file has the same structure in terms of variable names, but they are defined differently in each of the 12 sets in terms of character or numeric type and need to be converted to numeric before I can merge them all into 1 final set.
@hk19 wrote:
That's how I knew how to import csv files. Each file has the same structure in terms of variable names, but they are defined differently in each of the 12 sets in terms of character or numeric type and need to be converted to numeric before I can merge them all into 1 final set.
A CSV file does not have any place where it could define the type of a variable. The only metadata it can have is the (optional) header line that you can use as a hint for what names to use for each column.
After downloading the CSV files to SAS studio and running them, I was able to see a table of variables and attributes showing which data sets showed character vs numeric variables. A PROC IMPORT operation was performed in the log by running this in SAS Studio. I am still not sure what other operation I should perform besides the set statement to make sure that I can make 2 variables numeric instead of conflicting, as I reported my error code.
1. You can use proc contents to see the variable types.
2. When you run proc import, you can see the data step that's generated displayed in the log.
Use and alter that code to import the data exactly as intended in a consistent manner for all tables.
PROC IMPORT will look at a specific CSV file and try to guess how to define the variables based on the set of text values in that one file.
It will generate data step code that you should see in the log. You could start with that as a model for the code you need to create.
But it is really easy to write the code to read a CSV from scratch without using PROC IMPORT at all.
Basic structure is like this:
data want;
infile 'myfile.txt' dsd firstobs=2 truncover;
length firstvar 8 var2 $30 var3 8 lastvar $40;
input firstvar -- lastvar;
run;
The DATA statement names the dataset you want to create. The INFILE statement tells SAS where to find the lines of text and that you want to treat it as delimited and skip the header line. If your file is not a true CSV file, but some other type of delimited then you can use the DLM= option on the INFILE statement to tell it what character(s) to treat as the delimiter. The LENGTH statement defines the type of the variables (and for character variables the length). Note for numeric variables just use 8 since SAS stores all numbers as 8 byte floating point values. Then INPUT statement tells it which variables to read. You can abbreviate it by using a positional variable list like in this example.
If you have DATE, TIME or DATETIME values (or possibly other things like numbers with commas or $ in them) then you can add a INFORMAT statement to tell SAS how to convert the text in the file into the right numbers. For DATE, TIME and DATETIME values you will want to attach an appropriate FORMAT so they print in a human readable style. The format you choose does not have to match how it was stored as text in the source file. It just needs to be appropriate for the values.
Thank you so much for your help, I never learned how to do this before and this ended up working. Thank you for teaching me!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.