DATA Step, Macro, Functions and more

components that are of different data types: how to effective solve the issue

Reply
Frequent Contributor
Posts: 127

components that are of different data types: how to effective solve the issue

Der experts,

 

due to the the restrictiveness of SAS concerning management of data of different data type, I am struggling with some macros that should be applied both to charachter and and numeric variables.

Which is an efective way to avoid the following problem: "components that are of different data types"? I would like to avoid to split a table in two (one with numeric variables, one with characters) in order to run some macros specifically for charachters and some specifically for numbers. Thanks a lot! SH

 

 

Super User
Posts: 17,729

Re: components that are of different data types: how to effective solve the issue

Make sure your import process explicitly defines type to avoid the issue. 

 

But beyond that it depends on what your trying to do in your macro. It sounds like an append? 

Super User
Super User
Posts: 7,392

Re: components that are of different data types: how to effective solve the issue

The restrcitiveness is there to aid you as a programmer, and you will find it in databases, and other programming languages (ok, some are loosly typed).  Without some indication of what you are doing its hard to help.  I mena there are arrays which you can define like:

array num_vars _numeric_;

array char_vars _character_;

 

And there is SASHELP.VCOLUMNS (and dictionary) which holds the metadata for each dataset.  This and your knowledge of your data should be enough to program anything?  Provide some concrete examples if more info is needed.

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

if a variable is 0001 or 201602... I would like to deal with them as numbers, independently by the variable type definition of SAS.

If I check 201602>201601 I expect a meaningful output, without converting them to a numeric in advance.

I do not expect something like in Python where you can even compare with < and > the letters of he alphabet and much more, but if they are numbers... cannot sas somehow understand it without the need of explaining everything? My aim is to write compacts codes, I would like to avoid to waste time just to learn how SAS is not able to undestand the data and how to help him to understand. Thanks.

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

in case of:

if "barracuda" > 1 then...

well in this case I would expect an error.

Super User
Super User
Posts: 7,392

Re: components that are of different data types: how to effective solve the issue

Store your data in an approriate format then?  SAS doesn't tell you store 0001 as a character variable that is a choice you have made, and thus need to deal with.  What if in line two you have o001, are we to assume that is also a number?  What about abcd?  

 

Conversion between types is simple and safe using put/input functions.  Simply put you are the owner of your data, you define the structure and the content of it.  Sure you have to follow some rules given by the product, but that is the same with any product.  Once you understand your data, set it up correctly, then your coding will be very simple. 

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

[ Edited ]

I am telling you to use a variable as a number when it can be treated as a number. If you have a value abcd or o0001... of course it cannot be used as a number.

When I import the data from a csv or an excel (e.g 300 variables)... should I specify for each column how to read the value?

If there is no logic to say to SAS: "Dear friend SAS... well I am doing calculations, please consider numbers as numbers. I do not need your support to write a poem"... then I should stick to this conversion logic.

Then how can I write? convert all the variables of the table to numeric, if possible. Variables containing abcd or o0002 cannot of course.

Super User
Super User
Posts: 6,495

Re: components that are of different data types: how to effective solve the issue

You seem to be conflating a number of different problems.

First, ignore macro code.  Learn how to use SAS code first.  Later you can begin to learn how to take advantage of SAS macro variables to replace repetitive parts of your code.  Later still you can learn to use macros so that you can introduce conditional logic into the generation of code.

 

Second, learn how to read and manipulate data.  If you have to read data from text files (like CSV files) or other UN structured inputs where the variable types have NOT been defined by the creator of the data file (like EXCEL files), then you will need to learn how to discover the appropriate data type for your variables and convert them.  Learn how to use PROC CONTENTS and other tools for looking at the metdata of your datasets.

 

Third, there are some general methods for variable value testing that do not depend on the type of the variable.

1) You can use the MISSING() function on any data type.

2) You can use the CAT... series of functions to silently convert data to strings.

3) You can use the VVALUE() and VVALUEX() to see the formatted values of variables.

 

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

[ Edited ]

Hi Tom,

 

do you see any macro in this post? I just mentioned as backgroun. Well basically everything that I am doing is based on the proc content. The problem is that I cannot define a specific rule for each variable, I am trying to automatize the process.

1) are you sure that missing work also for not numeric variables?

proc means data=in.test STACKODS n nmiss range min max;
var _numeric_ ;
ods output summary=DC.stacked
;run;

the results that i get from the step above are produced only for numeric variables.

2) I did not about cats, but thanks to the explanation of Rheinard... it seems to be really helpful

3) good to know... I am trying to figure out if it can help me.

 

If you are referring to the other post... well thanks to the brilliant help of Rheinard... I am getting where I want. If you also can effectively support, you are welcome. ;-) Thanks.

Super User
Super User
Posts: 7,392

Re: components that are of different data types: how to effective solve the issue

I am telling you to use a variable as a number when it can be treated as a number. If you have a value abcd or o0001... of course it cannot be used as a number.

-->   No, what you are telling me is that there is a variable which is text, and thus can hold any strings a text variable can hold, and that somehow I am supposed to be able to treat that as a number, the function input() is there to read and process text data, and if possible to convert it to a number.

 

When I import the data from a csv or an excel (e.g 300 variables)... should I specify for each column how to read the value?

-->  Quite simply yes.  If you don't tell the computer what Your data is then how should it know?  Is 123 a numer or is it a date in numerical format, or is a formatted value, or is it really text?  

 

If there is no logic to say to SAS: "Dear friend SAS... well I am doing calculations, please consider numbers as numbers. I do not need your support to write a poem"... then I should stick to this conversion logic.

--> You are again sidestepping the responsibility.  Its your data, you set the variable type.  You have set a column up to be text therefore you need to deal with it.  If a field is a numeric field then set it as a numeric field, if it is text, set it as text, if it is text which should, in certain instances be treated as numeric then use the functions provided to conver that text to numeric.

 

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

1) maybe I was not clear enough: if a variable (colum) contains just one observation that cannot be anyhow numeric (e.g. o0001) all the observations should be then numeric. I am not expecting different treatments for observations belonging to the same variable.

2) Again: my aim would be to prioritize the categorization (e.g. if a variable contains only numbers...) I expect to have it automatically as number. Or at least in the same straighforward way as r, matlab or Python do. 

3) Yes I would like to sidestep the responsability, but I cannot apparently. If I use some libraries in Python (I ignore 90% of what is it behind) but I get the data I need for my purpose. My aim is not to be a sas consultant, I just need sas to produce what I want in the easiest, quickiest and flexible way.

Anyway thanks for the explanation... it seem than we could invest our team in more productive ways than keep discussing on it.

Super User
Super User
Posts: 6,495

Re: components that are of different data types: how to effective solve the issue

I have lost the thread of this conversation.  For a language that was created 40 years ago SAS is very good at not forcing the user to worry too much about data types.

 

PROC IMPORT does a reasonable job of guessing at your data types if you want to let it do that.  And unlike Excel it won't convert your id variables with hyphens in them into dates or subtraction formulas.

 

Most PROCs, where it is appropriate, do not care about the data type of the variables.

You can make TABLES on character or numeric variables in PROC FREQ.

Regression tools let you use character or numeric variables in CLASS statements.

 

Of course PROC MEANS doesn't work on character variables.  What is the mean of 'Male' and 'Female'?

 

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

[ Edited ]

average between male and female is not the point, not even an issue, quite easy to calculate: if you consider it as bolean variable (0 or 1... conversion to numbers!!!) you could easily calculate the average sex as the percentage of males/females in the whole population. You know... the advantages of converting to numbers and playing with them.

If I ask you which is the mean of 10 and 20... the answer can be: to calculate the mean you should have numbers? Is it a joke? They simpy are numbers... why should I explain it to sas? I would understand the other way around: to force numbers to be considered as charachters but it is a quite weird transformation.

I do not blame SAS, of course is a quite close universe and not so flexible (at least in comparison to apache spark or similar) but it does a good job, that's why I am using it and I am trying to get the most out of it. This forum is also really helpful.

Super User
Super User
Posts: 6,495

Re: components that are of different data types: how to effective solve the issue

You lost me again.  Can you show something you actually tried to do that caused the error message that started this thread?

SAS is happy to convert numbers to character and the reverse for you when you accidently use the wrong type.

 

data mydata ;
   input @1 charvar $5.  @1 numvar 5. ;
   if charvar > 100 then put 'CHARVAR > 100 ' charvar= ;
   if numvar > '100' then put "NUMVAR > '100' " numvar=;
cards;
45
1234
;;;;
12    data mydata ;
13       input @1 charvar $5.  @1 numvar 5. ;
14       if charvar > 100 then put 'CHARVAR > 100 ' charvar= ;
15       if numvar > '100' then put "NUMVAR > '100' " numvar=;
16    cards;

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      14:8    15:17
CHARVAR > 100 charvar=1234
NUMVAR > '100' numvar=1234
NOTE: The data set WORK.MYDATA has 2 observations and 2 variables.

If you don't like how it does it you can add extra code to tell it to do it in another way.

Frequent Contributor
Posts: 127

Re: components that are of different data types: how to effective solve the issue

Hi Tom,

 

thanks a lot for your explanations, now I am getting abetter understanding. Coming back to my specific case:

 

%macro macro_outliers (var=);
proc sql; create table upper_bound as select Variable_max from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table lower_bound as select Variable_min from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table base as select distinct "&VAR" from in.test ;quit;
proc sql; create table base as select a.*, b.*, c.* from base a, upper_bound b, lower_bound c ;quit;
proc sql; create table num_ourliers as select
case when "&VAR">Variable_max then 1 else 0 end as n_over,
case when "&VAR"<Variable_min then 1 else 0 end as n_under
from base ;quit;
proc sql; create table num_ourliers as select sum(n_over) as num_over, sum(n_under) as num_under from num_ourliers ;quit;
proc sql; update DC.input_analysis_res set outliers_up = (select num_over FROM num_ourliers) where variable_name="&VAR" ;quit;
proc sql; update DC.input_analysis_res set outliers_down = (select num_under FROM num_ourliers) where variable_name="&VAR" ;quit;
%mend macro_outliers;

 

I underlined the critical part, that I would like to change including the following steps:

1. if the variable is numeric then execute the macro

2. if the variable is not numeric, try to convert it to numeric and execute the macro

3. if the variable cannot be converted to numeric, well do not du anything but please let the code go further.

How can I efficiently embed these 3 steps in my code? Thanks.

 

 

Ask a Question
Discussion stats
  • 27 replies
  • 544 views
  • 7 likes
  • 5 in conversation