BookmarkSubscribeRSS Feed
geneshackman
Pyrite | Level 9

Is there a way to read in a csv data set that has "." in some cells and replace those with "*"? The things inside the quote marks, e.g., period and star. This is the import statement, I'm sure I can't do it here, probably in a data step.

 

proc import datafile = "\\folder\data.csv"
out=workingdata
dbms=csv
replace;
getnames=yes;
run;

 

The data set came in with "."  I use * to represent suppressed data, and have a footnote explaining the symbol.

 

If you know how to do this, that would be very helpful.

 

FYI, I do NOT was to talk about why I'm doing this, why I should not do this, why doing this is never the right thing to do. Those points are not at all helpful.

 

13 REPLIES 13
Angel_Larrion
SAS Employee

You can use the TRANWRD function in a data step after the proc import.

 

https://v8doc.sas.com/sashtml/lgref/z0215027.htm 

geneshackman
Pyrite | Level 9
Hi. Thanks for the suggestion. I'm not sure about this. The columns where the "." are also have numbers, with the period in them, for example "23.7". Can I still use TRANWRD? Wouldn't the search and replace also change the numbers, for example, to be 23*7"?
Kurt_Bremser
Super User

Don't use PROC IMPORT for csv files. Write the data step yourself (or modify the one created by IMPORT), and you avoid all the hassle fixing things that IMPORT gets wrong.

Especially if that column is supposed to be numeric. In that case, a custom informat that reads the asterisk as missing is the way to go.

 

Edit: I see you're not talking numeric.

Use IFC:

var = ifc(var = ".","*",var);
geneshackman
Pyrite | Level 9

Um, I forgot to include a bit of info. Sorry. 

 

The columns where these "." appear are numeric, and, of course, many of the numbers also have "." in them, for example, "23.7", so I don't know if I can search and replace "." with "*". 

 

 

geneshackman
Pyrite | Level 9
That seems very promising. Except, well, there appear to be other cells in the csv file that are blank. I do -not- want to change these cells to *. Any suggestions?
Kurt_Bremser
Super User
What is the difference between a missing number as blank and a missing number as asterisk?

You might get what you want by using a custom informat that creates a "special missing value" (.a to .z) when an asterisk is encountered.
geneshackman
Pyrite | Level 9
Kurt, thanks. The data set came with some cells being blank, which means no instances, and ".", which means suppressed. I only want to change the "." to *.

How would I use the custom informat? I'd want it to be when missing is "." then change it to *. But if missing is blank, then do -not- change to *.
Kurt_Bremser
Super User

To do this, you need an informat that reads the asterisk to a special missing, and a fornat that displays the special missing as asterisk:

proc format;
invalue at_best
  "*" = .a
  other = [best.]
;
value at_best
  .a = "*"
  other = [best.]
;
run;

Then you read the data and use both:

data want;
infile "\\folder\data.csv" dlm="," dsd truncover;
input
  .......
  numvar :at_best.
  ....
;
format numvar at_best.;
run;
ballardw
Super User

@geneshackman wrote:
Kurt, thanks. The data set came with some cells being blank, which means no instances, and ".", which means suppressed. I only want to change the "." to *.

How would I use the custom informat? I'd want it to be when missing is "." then change it to *. But if missing is blank, then do -not- change to *.

What is the current format for the values that appear as blank?

I strongly suspect these are not numeric variables but character.

Tom
Super User Tom
Super User

@geneshackman wrote:
Kurt, thanks. The data set came with some cells being blank, which means no instances, and ".", which means suppressed. I only want to change the "." to *.

How would I use the custom informat? I'd want it to be when missing is "." then change it to *. But if missing is blank, then do -not- change to *.

SAS has special missing values .A to .Z and ._ to allow you to have different types of missing data.  So if you are reading a CSV file and you want to treat numeric variables that have an actual period in the CSV file differently than nothing at all (that is two adjacent commas) you will need to replace either the periods or the null strings with either a letter or an underscore. You might also need to issue the MISSING statement (not the MISSING option) to make sure that SAS treats the letter or underscore as meaning the special missing and not an invalid value for number. 

So changing a line like:

123,.,,012

to

123,_,,012

would be interpreted as having one value of the special missing ._ and regular missing value.

 

You could even make the change in the same step that is reading the file by manipulating the _INFILE_ buffer. (Note: This will only work if no line in the file has more than 32K bytes since that is maximum length where you can modify the _INFILE_ buffer variable.)

Example:

missing _;
data test ;
  infile cards dsd firstobs=2 truncover ;
  input @ ;
  _infile_=tranwrd(cats(',',_infile_,','),',.,',',_,');
  input @2 var1-var3 ;
cards;
var1,var2,var3
123,,456
.,789,012
;

proc print;
run;
Obs    var1    var2    var3

 1      123       .     456
 2        _     789      12

If you want the ._ values to print as * instead of _ then make your own format.

proc format ;
  value star ._ = '*' ;
run;

proc print data=test;
  format var1-var3 star5. ;
run;

image.png

 

ballardw
Super User

@geneshackman wrote:

Is there a way to read in a csv data set that has "." in some cells and replace those with "*"? The things inside the quote marks, e.g., period and star. This is the import statement, I'm sure I can't do it here, probably in a data step.

 

proc import datafile = "\\folder\data.csv"
out=workingdata
dbms=csv
replace;
getnames=yes;
run;

 

The data set came in with "."  I use * to represent suppressed data, and have a footnote explaining the symbol.

 

If you know how to do this, that would be very helpful.

 

FYI, I do NOT was to talk about why I'm doing this, why I should not do this, why doing this is never the right thing to do. Those points are not at all helpful.

 


How about sharing some details.

Such as the data set variable type for the values you are talking about.

If the CSV file has consecutive commas then SAS interprets that as a missing value and by default displays a . if the variable is NUMERIC.

So if the variable(s) are numeric you cannot assign a value of "*" because that would be attempting to assign a character value to a numeric variable.

If this is the case then you can set an option that will display the missing as an asterisk:

 

options missing='*';

Or you can create custom formats to do the same.

 

Example.

data example;
   infile datalines dlm=',' dsd;
   input a $ x y z;
datalines;
abc,123,,678
pdq,111,222,333
;

proc print data=example;
   title 'Default missing option';
run;

options missing='*';
proc print data=example;
   title 'missing option *';
run;


SAS reads . from CSV if the only character between commas as missing.

If the variable is character you would not see the dot though. Which is why I believe you have numeric values.

Tom
Super User Tom
Super User

Your responses have only served to confuse me about what you actually have and what you want to do.  Please create and post a simple example that shows the different types of data you want to read and the resulting dataset your want.  

Let's set some baselines here. 

  • You cannot have asterisks in a numeric variable. If you want to distinguish in a numeric variable between a missing value and * you will need to use one of the 27 special missing values (.A - .Z and ._) to mean *.
  • If you have just a period in a character field then you will need to read it using the $CHAR informat instead of the default $ informat to have SAS preserve the period. The default character informat will translate a bare period into spaces.
  • If you have a character variable you can convert every period in it to asterisks using the TRANSLATE() function.  If you only want to convert when the value is a single period then you will need IF/THEN or other conditional logic or custom informat.
  • If you want control over how the CSV file is converted into a dataset you need to write your own DATA step to read it. Currently you are asking SAS to guess what to do by using PROC IMPORT.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2071 views
  • 8 likes
  • 5 in conversation