BookmarkSubscribeRSS Feed
melhaf
Fluorite | Level 6

Hi,

here is a random dataset I created 

data testdata;
    length banana 8 fruit 8;
    do year = 2014 to 2023;
        date = input(cats(put(year, 4.), '1231'), yymmdd8.);
        do i = 1 to 25;
            customer = cats('Customer', (year - 2014) * 25 + i);
            banana = round(ranuni(0) * 100, 0.01); 
            fruit = ceil(ranuni(0) * 2) - 1;
            output;
            if i <= 10 then do;
                customer = cats('Customer', (year - 2014) * 25 + i);
                banana = round(ranuni(0) * 100, 0.01); 
                fruit = ceil(ranuni(0) * 2) - 1; 
                output;
            end;
        end;
    end;
    format date yymmddn8.;
    drop i year;
run;

But how can I add a column (or a row) that confirms that all the columns are like numeric or char.. For example the column 'fruit'  there 1=yes, 0=no, but lets say that there is a bug in the inserting and someone had write "yes" as a string/char, then I want a column/row  that says: "not correct" if there is a string there, or "correct" if there is only 1 and 0. 

And I want to do it for every column in my dataset.  Recall that this above is just my fiction data, my data have other columns, so I think I prefer to insert the columns manually and not just like for every coloumn-counting-loop thing 🙂 beacuse the colomns differ for each model.

 

Thanks,

9 REPLIES 9
yabwon
Onyx | Level 15

First of all, it's SAS not Excel to a column (a variable in SAS nomenclature) can be only one type numeric or character. So situation when a variable is numeric and contains text "yes" is impossible.

data test;
  x=1; output;
  x='test'; output;
run;

the log will show:

1    data test;
2      x=1; output;
3      x='test'; output;
4    run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      3:5
NOTE: Invalid numeric data, 'test' , at line 3 column 5.
x=. _ERROR_=1 _N_=1
NOTE: The data set WORK.TEST has 2 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

and in the data you will see:

yabwon_0-1717148174916.png

 

To preview types of variables (and bunch of other information) you can proc contents:

proc contents 
  data = sashelp.cars 
  out = work.metadata_information
  varnum;
run;

In the output window, among other, you will see:

yabwon_1-1717148349595.png

and additionally the work.metadata_information data set will have that information too.

 

To test for particular variables values you could start with something like this:

%macro testData(ds, var, space_separated_list_of_values);
data _null_;
  set &ds.(keep=&var. where=(&var. NOT in (&space_separated_list_of_values.))) curobs=firstOccurenceNumber;

  put "WARNING: values of &var. outside of range. "
    / _ALL_;
  stop;
run;
%mend;

%testData(sashelp.class, AGE, 11:15);

%testData(sashelp.cars, Origin, "Europe" "USA");

and then of cours develop it further.

 

 

Bart

 

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



melhaf
Fluorite | Level 6

@yabwon Hmm interesting, and the dot (.) does it mean that there is missing value, and  that can indicate that there -  or could be - a "yes" (char) instead of 1 (num) ? And can you say that it is the same thing for looking at missing values? Can you say that it is the same thing?

 

But another question then, in SAS can you look if there coloum should be like an integer, floats or such? or do SAS just have numeric and that's it?

 

 

Kurt_Bremser
Super User

SAS has only two data types, character and numeric. Numeric variables are always stored in real (floating point) format using 8 bytes.

A numeric missing value is displayed as a dot, and in code you also use the dot to represent a missing value.

Formats can be used to display numeric values as specific text, e.g.

proc format;
value yn
  0 = "No"
  1 = "Yes"
;
run;

data test;
format x yn.;
input x;
datalines;
1
0
.
3
;

proc print data=test noobs;
run;

You can query the type of variables by using DICTIONARY.COLUMNS in PROC SQL or SASHELP.VCOLUMN in a DATA/PROC step.

yabwon
Onyx | Level 15

There are only two types in SAS that can be stored in SAS data set a number or a character.

Number covers all: floats, integers, and booleans.

 

If you have a numeric variable an you will try to store text in it, SAS is smart ass, it will try to convert (implicitly) text to number and then store the value. If conversion is successful then you will get a "numeric version" of your value:

data test:
 x=1; output; 
 x="2"; output;
 x="1e2"; output;
 x="3.14"; output;
 x="text"; output;'
run;

 

So if you want to look for "bad data" which does not look like a number checking missing values (the dot) can help. But if you are looking for data that does look like numbers but are not... then it may be a bit more complicated. 

 

Check out those two examples, depending on use of DSD option results are different.

data test;
infile cards dsd;
input x;
cards;
1
2
"3"
text
"test"
;
run;

data test;
infile cards /*dsd*/;
input x;
cards;
1
2
"3"
text
"test"
;
run;

 

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ballardw
Super User

@melhaf wrote:

@yabwon Hmm interesting, and the dot (.) does it mean that there is missing value, and  that can indicate that there -  or could be - a "yes" (char) instead of 1 (num) ? And can you say that it is the same thing for looking at missing values? Can you say that it is the same thing?

 

But another question then, in SAS can you look if there coloum should be like an integer, floats or such? or do SAS just have numeric and that's it?

 

 


You can retrieve format information which may tell you how a variable was expected to be used. If the number of decimals, the D in most formats such as Comma12.2 where the .2 is two decimal places that might tell you something.

But one of the powerful tools in SAS is the use of different formats at specific times (analysis, modeling or reporting). For example a variable that is build to hold a datetime value can with the use of the DTDATE format use only the Date portion of the information in the variable, or in another procedure use the DTYEAR format to only use the Year part of the value. Or a custom format that would display Month and hour of day only which is possible with custom formats created with the PICTURE statement.  So things like "integer" and "float" may be a bit interchangeable at use time.

 

In my occasionally humble opinion the best way to validate that a variable is read correctly is to use a correct Informat.

For example I have informats that read code values such as a physical site location or data entry values for a test. These informats have a list of expected values (some quite long) with the value to store them as plus use of the _error_ for any entries not expected.

So if a value is supposed to be "boolean", and the text is "Yes" "No" I read those to 1/0 (the numeric works more consistently for most purposes) and anything else will through an invalid data message. With the _error_ option as used then none of the invalid data values every appear in the data, they are set to missing by the informat.

The invalid data messages also can alert to changes in the data, such as a new location code that needs to be added to your metadata (informat and any thing using that variable such a custom format that displays the name or meaning of the code value.)

Quentin
Super User

@ballardw wrote in part:

For example I have informats that read code values such as a physical site location or data entry values for a test. These informats have a list of expected values (some quite long) with the value to store them as plus use of the _error_ for any entries not expected.

So if a value is supposed to be "boolean", and the text is "Yes" "No" I read those to 1/0 (the numeric works more consistently for most purposes) and anything else will through an invalid data message. With the _error_ option as used then none of the invalid data values every appear in the data, they are set to missing by the informat.

Totally agree, using other=_error_ on an informat is a great way to catch problems when reading in data.  I still hope SAS will add a way to use other=_error_ on formats as well.  I almost always expect my formats the be exhaustive.  It would be nice to have an automatic way to get an error (or at least warning) in the log when a value sneaks through the format.  I submitted an old ballot item for that idea, but see you've already commented it from 7 years ago, so probably not much hope for this feature to magically arrive in 9.4M9 next year.  : )  https://communities.sas.com/t5/SASware-Ballot-Ideas/Add-other-error-for-PROC-FORMAT/idi-p/323639

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Tom
Super User Tom
Super User

Once you have the values in a dataset the type of the variable is already set, so I am not at all sure what your question is.

 

The only thing that makes any sense is you are trying to read in a TEXT file, such as a CSV file, and what to find the places where text is entered that does not work to represent a number. In that case you don't really need to do anything as the INPUT statement will issue the note for you.

 

Let's make an example text file, in this case a CSV file.  Let's use the fileref CSV to point to it.

options parmcards=csv;
filename csv temp;
parmcards;
id,age,present
1,10,1
2,20,0
3,30,yes
;

Now if we try to read that data into a dataset treating each column as a numeric you get a note in the SAS log about the invalid data.

 

Example SAS log:

1    options parmcards=csv;
2    filename csv temp;
3    parmcards;
8    ;

9
10   data want;
11     infile csv dsd truncover firstobs=2;
12     input id age present;
13   run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: Invalid data for present in line 4 6-8.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
4         3,30,yes 8
id=3 age=30 present=. _ERROR_=1 _N_=3
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 6.
      The maximum record length was 8.
NOTE: The data set WORK.WANT has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

Tom
Super User Tom
Super User

You are making the code generation too hard.

Use the MDY() function and the RAND('INTEGER") function.

date = mdy(12,31,year);
banana = rand('integer',0,100*100)/100;
fruit = rand('integer',0,1);

 

Quentin
Super User

But how can I add a column (or a row) that confirms that all the columns are like numeric or char.. For example the column 'fruit'  there 1=yes, 0=no, but lets say that there is a bug in the inserting and someone had write "yes" as a string/char, then I want a column/row  that says: "not correct" if there is a string there, or "correct" if there is only 1 and 0. 

I enjoy this sort of data validation, and there are lots of fun ways to approach it in SAS.  

 

But to start with, as others have said, SAS only two types of variables: numeric and character.  So if the variable FRUIT is defined as numeric, it is not possible for it to accidentally have the string value 'yes'.

 

If the variable FRUIT is defined as character, then it is possible for it to have the string value '1' or '0' or the letter 'l' or the letter 'O'.  So then it does become a data validation problem.

 

In your real data, is FRUIT defined as a character variable or a numeric variable?  Or in the real problem, are you reading values from a text file (or perhaps an Excel file) where the column types are not defined?

If you can describe more of the real situation, we can provide better advice.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1112 views
  • 2 likes
  • 6 in conversation