Here I have a dataframe, created as follows:
---------------------------------------------------------------------------
data temp_;
input x y z;
cards;
1 2 10
2 5 10
3 6 10
7 1 2
run;
---------------------------------------------------------------------------
Now I want to classify each column using the same rules:
(1) variable < 4 then 'First Cluster'
(2) 4 <= variable < 7 then 'Second Cluster'
(3) 7 <= variable then 'Third Cluster'.
Here is my attempt:
---------------------------------------------------------------------------
data test_;
set temp_;
array level_ x -- z;
do over level_;
if level_ < 4 then level_ = 'First_Cluster';
else if 4 <= level_ < 7 then level_ = 'Second_Cluster';
else if 7 <= level_ then level_ = 'Third_Cluster';
else level_ = 'Other';
end;
run;
---------------------------------------------------------------------------
But every value in the output dataset is missing. It seems that an `array` of numeric variables can't be assigned string values. So can't I classify numeric variables using an `array`? Or is there another suggestion?
Thank you for all your help!
What is a "dataframe"?
What do you expect as result?
@japelin named the main problem: you can't change the type of a variable.
You could define a format and attach it to the variables. You won't see the original value, but only the text:
proc format;
value Cluster
LOW -< 4 = 'First Cluster'
4 -< 7 = 'Second Cluster'
7 - HIGH = 'Third Cluster'
;
run;
proc print data=temp_;
format x y z Cluster.;
run;
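If a data set with real character values is needed rather than a formatted display, one option is to apply the same format with the PUT function and store the result in new character variables. This is only a minimal sketch, assuming the `temp_` data set from the question and the `Cluster` format above; the names `want_`, `x_c`, `y_c` and `z_c` are made up for illustration.

```sas
/* Same format definition as above, repeated so the sketch runs on its own */
proc format;
  value Cluster
    LOW -< 4 = 'First Cluster'
    4   -< 7 = 'Second Cluster'
    7 - HIGH = 'Third Cluster'
  ;
run;

data want_;                  /* want_ is a hypothetical output name */
  set temp_;
  length x_c y_c z_c $20;    /* hypothetical new character variables */
  x_c = put(x, Cluster.);    /* x, y and z stay numeric and unchanged */
  y_c = put(y, Cluster.);
  z_c = put(z, Cluster.);
run;
```

The original numeric values stay available next to the text. Note that in a numeric format LOW does not cover missing values, so if missing values can occur they would need their own range in the format.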
I think this is because "level_" is a numeric array, but you are trying to assign a string to it in the do over statement, and it is not being handled correctly.
If you really want to use the same variable name, you need to assign it to a temporary variable and then rename it at the end.
data test_(rename=(a=x b=y c=z));
set temp_;
array level_ x -- z;
array level_C $20 a b c;
do over level_;
if level_ < 4 then level_c = 'First_Cluster';
else if 4 <= level_ < 7 then level_c = 'Second_Cluster';
else if 7 <= level_ then level_c = 'Third_Cluster';
else level_c = 'Other';
end;
drop x y z;
run;
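The same idea can also be written with explicitly subscripted arrays instead of the older DO OVER syntax. The following is a sketch under assumptions, not a tested replacement: `test2_`, `num_`, `chr_` and `i` are hypothetical names, and `temp_` is the data set from the question. It also checks for missing values first, because SAS treats a numeric missing value as smaller than any number, so a test like `level_ < 4` is true for missing values and the final 'Other' branch would otherwise never be reached.

```sas
data test2_(rename=(a=x b=y c=z));  /* test2_ is a hypothetical name */
  set temp_;
  array num_[*] x y z;              /* numeric input variables */
  array chr_[*] $20 a b c;          /* new character output variables */
  do i = 1 to dim(num_);
    if missing(num_[i]) then chr_[i] = 'Other';
    else if num_[i] < 4 then chr_[i] = 'First_Cluster';
    else if num_[i] < 7 then chr_[i] = 'Second_Cluster';
    else chr_[i] = 'Third_Cluster';
  end;
  drop x y z i;                     /* drop the numeric originals and the loop index */
run;
```

As in the step above, the numeric `x`, `y` and `z` are dropped and the character variables take over their names through the RENAME= data set option.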
@EvansHsieh wrote:
Here, I have a dataframe followed by
---------------------------------------------------------------------------
data temp_;
input x y z;
cards;
1 2 10
2 5 10
3 6 10
7 1 2
run;
---------------------------------------------------------------------------
And now i wanna classify each columns with same rules :
(1) variable < 4 then 'First Cluster'
(2) 4 <= variable < 7 then 'Second Cluster'
(3) 7<= variable then 'Third Cluster'.
Here I make an attempt,
---------------------------------------------------------------------------
data test_;
set temp_;
array level_ x -- z;
do over level_;
if level_ < 4 then level_ = 'First_Cluster';
else if 4 <= level_ < 7 then level_ = 'Second_Cluster';
else if 7 <= level_ then level_ = 'Third_Cluster';
else level_ = 'Other';
end;
run;
---------------------------------------------------------------------------
But what the output frame is all of null value. It seems like `array` can't be assign with string type. So can't I split numeric variable by `array`? or there is any suggestion?
Thank you for all your help !!
Did you read the LOG at all?
The second data step shown above:
406 data test_;
407 set temp_;
408 array level_ x -- z;
409 do over level_;
410 if level_ < 4 then level_ = 'First_Cluster';
411 else if 4 <= level_ < 7 then level_ = 'Second_Cluster';
412 else if 7 <= level_ then level_ = 'Third_Cluster';
413 else level_ = 'Other';
414 end;
415 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
410:24 411:34 412:30 413:10
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'Second_Cluster' , at line 411 column 43.
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'Second_Cluster' , at line 411 column 43.
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=3
NOTE: Invalid numeric data, 'Third_Cluster' , at line 412 column 39.
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
NOTE: Invalid numeric data, 'First_Cluster' , at line 410 column 33.
x=. y=. z=. _I_=4 _ERROR_=1 _N_=4
NOTE: There were 4 observations read from the data set WORK.TEMP_.
NOTE: The data set WORK.TEST_ has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Does the phrase "Invalid numeric data, 'First_Cluster'" not make sense?
Or the "Character values have been converted to numeric values at the places given by:" ?
The log tells you that you are attempting to convert character values to numeric values, and that the conversion is failing.
It is a good idea to show what you expect for output, and to say whether the result should be a data set for further manipulation or a report that people will read.