BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xyxu
Quartz | Level 8
I need to ensure all variables with the same prefix are numeric, so want to multiply their values by 1. Is there a good way to do this?
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Excel doesn't enforce data types which is why it's very bad for data management. Is this something you need to do regularly or something you do one off? If it's a one off I'd convert the file to CSV run it through PROC IMPORT and modify the code. If it's a regular process you need to build a process to ensure its converted and read in correctly each time. There isn't a way to really skip this step unfortunately when dealing with Excel files.

View solution in original post

14 REPLIES 14
Reeza
Super User

Multiplying by 1 does not change a variable type. 

You can use arrays to make a change across multiple variables but you need to list the variables somehow. 

 

data convert;
set have;

array new(*) new1-new20;
array old(*) $ old1-old20;

do i=1 to dim(old);
new(i) = input(old(i), 8.);
end;

run;

If you have a few in there that are of mixed types then that gets to be a pain the a** because arrays can only be one type. 

You can use the methods here to list the variables easier:

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 

Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

 


@xyxu wrote:
I need to ensure all variables with the same prefix are numeric, so want to multiply their values by 1. Is there a good way to do this?

 

xyxu
Quartz | Level 8
Indeed some of the variables are char and others are numeric. This is caused by the import process from excel. When all values of a column are missing, the procedure determines the variable to be char instead of numeric.
Reeza
Super User
Excel doesn't enforce data types which is why it's very bad for data management. Is this something you need to do regularly or something you do one off? If it's a one off I'd convert the file to CSV run it through PROC IMPORT and modify the code. If it's a regular process you need to build a process to ensure its converted and read in correctly each time. There isn't a way to really skip this step unfortunately when dealing with Excel files.
xyxu
Quartz | Level 8

This makes sense. The only issue with using the data step to import csv format is that the file has hundreds of columns with the same prefix, each followed by a different date. I am still trying to find a way to avoid typing all these columns in the informat command.   

Reeza
Super User
Use PROC IMPORT and then modify the code from the log instead of manually typing it all out.
Or use Excel or SAS to generate the list dynamically - you can code to generate a list of text I would assume 🙂
xyxu
Quartz | Level 8

This is working now. The data step has 2000 lines though. The only issue I am having is that my numeric values have a comma as the thousand separator, and SAS gets confused.

Reeza
Super User
Is the file formatted correctly. When a value has a comma embedded it should be in quotes. Is it in quotes in the CSV file?
Open the file with a text editor, not Excel, to check this.
Otherwise it is problematic to tell what is the end of the value if you're using comma's as your delimiter and as a numeric separator.
xyxu
Quartz | Level 8
I opened it in Atom. The values look like the following: "1,972,909.44","1,771,118.01","1,710,921.05","1,604,967.14","1,655,117.54","1,638,147.95"
Reeza
Super User
Then SAS can definitely read them in correctly. Please show your code to read the data - not all, please summarize the problem I don't want to read through 2000 lines.
xyxu
Quartz | Level 8

I am using

data want;
	infile "&in_path\data.csv" delimiter = ',' dsd missover lrecl = 32767 firstobs = 2; 
	informat variables 15.2;
        format variables;
        input variables;
run;

All other variables are read correctly, except for ones that have thousand separators.  

Reeza
Super User
Change MISSOVER to TRUNCOVER and change the informat from 15.2 to comma16.

Do not specify the decimal places (ie the 2 in 15.2) UNLESS you expect to have decimal places for EVERY value.
xyxu
Quartz | Level 8
Yes, this works! Thank you!
ballardw
Super User

@xyxu wrote:
I opened it in Atom. The values look like the following: "1,972,909.44","1,771,118.01","1,710,921.05","1,604,967.14","1,655,117.54","1,638,147.95"

So variables that have commas in numeric values are read as COMMA16. (or possibly larger width)

Example:

data example;
  infile datalines dlm=',' dsd;
  informat var1-var6 comma16.;
  input var1-var6;

datalines;
"1,972,909.44","1,771,118.01","1,710,921.05","1,604,967.14","1,655,117.54","1,638,147.95"
;

You can look at the variables created. The difference is your code would point to the external file on the INFILE statement instead of the inline data.

If you later want the variables to display with commas then assign a format like Comma18.2;

I typically add a couple of additional characters for the informat width in case your example didn't capture the largest values. Do NOT specify the decimal portion on the Informat you can get very interesting (and usually wrong results) unless you really understand what implied decimal means when the value to be read is "123" in the text file.

ballardw
Super User

@xyxu wrote:

This makes sense. The only issue with using the data step to import csv format is that the file has hundreds of columns with the same prefix, each followed by a different date. I am still trying to find a way to avoid typing all these columns in the informat command.   


Show some actual names needed. Or the header of the csv with the column headers. If you go this way copy the text from the CSV file with a text editor NOT any spread sheet software as that often interprets stuff, then paste the resulting text into a text box opened on the forum with the </> icon above message window.

 

If there is an obvious pattern, that can likely be coded to write the names needed.

 

It is also not terribly difficult to read one line of text from file, such as the header row of a csv file and then fold, spindle and mutilate to get the text you need and use it to write the Informat information needed which could be %included as needed.

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2632 views
  • 9 likes
  • 3 in conversation