EvansHsieh
Calcite | Level 5

Here is my dataframe:

---------------------------------------------------------------------------

data temp_;
input x y z;
cards;
1 2 10
2 5 10
3 6 10
7 1 2
run;

---------------------------------------------------------------------------

Now I want to classify each column with the same rules:

(1) variable < 4 then 'First Cluster'

(2) 4 <= variable < 7 then 'Second Cluster'

(3) 7<= variable then 'Third Cluster'.

Here is my attempt:

---------------------------------------------------------------------------

data test_;
set temp_;
array level_ x -- z;
do over level_;
    if level_ < 4 then level_ = 'First_Cluster';
    else if 4 <= level_ < 7 then level_ = 'Second_Cluster';
    else if 7 <= level_ then level_ = 'Third_Cluster';
    else level_ = 'Other';
end;
run;

---------------------------------------------------------------------------

But the output data set contains only missing (null) values. It seems that an `array` element can't be assigned a string. Can't I classify numeric variables with an `array`, or is there another approach you would suggest?

 

Thank you for all your help!

 

Accepted Solution
andreas_lds
Jade | Level 19

What is a "dataframe"?

What do you expect as result?

@japelin named the main problem: you can't change the type of a variable.

You could define a format and attach it to the variables. The output will show the cluster text instead of the original values, while the stored values stay numeric:

proc format;
  value Cluster
    LOW -< 4 = 'First Cluster'
    4 -< 7 = 'Second Cluster'
    7 - HIGH = 'Third Cluster'
  ;
run;

proc print data=temp_;
  format x y z Cluster.;
run;


3 Replies
japelin
Rhodochrosite | Level 12

I think this is because "level_" is a numeric array, and in the do over loop you are trying to assign a string to it. SAS cannot store a character value in a numeric variable, so the assignment fails and the value becomes missing.
If you really want to keep the same variable names, assign the text to a new character array and then rename the variables at the end.

 

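/* RENAME= gives the new character variables the original names in the output */
data test_(rename=(a=x b=y c=z));
set temp_;
array level_ x -- z;          /* original numeric variables */
array level_C $20 a b c;      /* new character variables, in the same order */
do over level_;               /* both implicit arrays are indexed by _I_, so they move together */
    if level_ < 4 then level_c = 'First_Cluster';
    else if 4 <= level_ < 7 then level_c = 'Second_Cluster';
    else if 7 <= level_ then level_c = 'Third_Cluster';
    else level_c = 'Other';
end;
drop x y z;                   /* drop the numeric originals so their names can be reused */
run;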
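For comparison, here is a minimal sketch of the same idea written with explicitly subscripted arrays (`array name{*}` with `do i = 1 to dim(...)`), the form generally recommended over implicit arrays and `DO OVER`. The names `num_`, `chr_`, `cx`, `cy`, `cz`, `i` and `test3_` are hypothetical. It also tests for a missing value first: in SAS a missing numeric value compares lower than any number, so in the chain above a missing value would land in `First_Cluster` and the `Other` branch could never be reached.

```sas
/* Sample data from the question */
data temp_;
  input x y z;
  datalines;
1 2 10
2 5 10
3 6 10
7 1 2
;
run;

/* Hypothetical names: num_, chr_, cx, cy, cz, i, test3_ */
data test3_(rename=(cx=x cy=y cz=z));
  set temp_;
  array num_{*} x y z;          /* original numeric variables */
  array chr_{*} $14 cx cy cz;   /* new character variables, same order */
  do i = 1 to dim(num_);
    /* test for missing first: a missing value compares lower than any number */
    if missing(num_{i}) then chr_{i} = 'Other';
    else if num_{i} < 4 then chr_{i} = 'First_Cluster';
    else if num_{i} < 7 then chr_{i} = 'Second_Cluster';
    else chr_{i} = 'Third_Cluster';
  end;
  drop x y z i;                 /* drop the numeric originals; RENAME= then reuses their names */
run;
```

Because each `else if` is only reached when the earlier tests failed, the lower bounds (`4 <=`, `7 <=`) do not need to be repeated.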
ballardw
Super User

Did you read the LOG at all?

Here is the log from running the second data step in your question:

406  data test_;
407  set temp_;
408  array level_ x -- z;
409  do over level_;
410      if level_ < 4 then level_ = 'First_Cluster';
411      else if 4 <= level_ < 7 then level_ = 'Second_Cluster';
412      else if 7 <= level_ then level_ = 'Third_Cluster';
413      else level_ = 'Other';
414  end;
415  run;

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      410:24   411:34   412:30   413:10
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'Second_Cluster' , at line 411 column 43.
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'Second_Cluster' , at line 411 column 43.
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=3
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=4
NOTE: There were 4 observations read from the data set WORK.TEMP_.
NOTE: The data set WORK.TEST_ has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


Does the phrase "Invalid numeric data, 'First_Cluster'" not make sense?

Or "Character values have been converted to numeric values at the places given by:"?

The log tells you that you are attempting to convert character values to numeric and failing.

 

It is a good idea to show what output you expect, and whether the result should be a data set for further manipulation or a report that people will read.
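If the result should be a data set rather than a printed report, one option is to apply the Cluster format from @andreas_lds's reply with the PUT function, which returns the formatted text as a character value, and keep it in new character variables next to the original numbers. This is a minimal sketch; the variable names `x_cl`, `y_cl`, `z_cl`, the array names and `test2_` are hypothetical, and STRIP is used defensively to remove any surrounding blanks from the formatted text.

```sas
/* Sample data from the question */
data temp_;
  input x y z;
  datalines;
1 2 10
2 5 10
3 6 10
7 1 2
;
run;

/* Same ranges as the Cluster format suggested in the thread */
proc format;
  value Cluster
    low -< 4 = 'First Cluster'
    4   -< 7 = 'Second Cluster'
    7 - high = 'Third Cluster'
  ;
run;

/* Hypothetical names: num_, clus_, x_cl, y_cl, z_cl, i, test2_ */
data test2_;
  set temp_;
  array num_{*} x y z;                /* original numeric variables, kept as-is */
  array clus_{*} $14 x_cl y_cl z_cl;  /* new character variables */
  do i = 1 to dim(num_);
    /* PUT returns the format label as text; STRIP trims surrounding blanks */
    clus_{i} = strip(put(num_{i}, Cluster.));
  end;
  drop i;
run;

proc print data=test2_;
run;
```

Keeping the numeric columns means they can still be used in calculations, while the `_cl` columns hold the cluster text for grouping or reporting.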

 
