Dear all,
for example I have one data set like:
age hight
12.11 1.49
12.12 1.49
12.15 1.52
12 .21 1.63
12.2 1.50
...
13 .01 1.29
13 .02 1.70
...
15 .75 1.78
16 .42 1.60
16 .87 1.57
and one other which give me the borders for classification
from example
age short normal hight
12.1 1.40 1.50 1.70
12.2 1.41 1.51 1.74
..
and i want to combine this data set to create one table with age hight hight_category, for example if someone has hight equal to 1.31 and he is 12.11 years old he be categorized as short i.e
age hight category
12.11 1.31 short
thanks in advance !
So, what value of AGE from the first table that is 12.15 do you compare with in the second table? Is 12.15 to use the 12.1 or the 12.2 row? what about 12.14 or 12.16?
If the age were 12.1 and the hight (sic) in the data is exactly 1.40 is that "Short" or "Normal".
You need to describe the general RULES for any comparisons. When you have range of values you need to specify if the end point is included or not and the lower and upper bound. As shown I do not know what the rules for Hight are. And what if the value of Hight for age 12.1 is 1.8??
You really need to show a result that matches your input.
This
age hight category 12.11 1.31 short
was not in your first data example. So where did the 1.31 come from?
And what are all the spaces in this
15 .75 1.78 16 .42 1.60 16 .87 1.57
you now have 3 variables with one not defined /described.
Just how important is the second decimal place in age in the comparisons.?
Hallo again and I really apologize about my delay.
Thank you for your advice,
First of all, i give one specific example to give you to know what i really need.
I know the data statements and the procedures which i can implied for, my question is exist more efficient way than common because my decision bounds are really much. I was thinking, Ι take the advantage of the second table .As I have detailed bellow :
.For example on my first observation age:12.11,hight: 1.49 ,i want my algorithm take the value from the age and it looking for between which values from the second table is in. After from second table and with use of variable of hight it classified if the subject is in category of short normal or high i.e for us example (first obs) because age:12.1<12.11<12.2 and high:1.40 < 1.49 <1.50 the category is short. May it is useful to create one new category if hight <1.40 from example for ages between 12.1,12.2 as too short .The bounds of decision to that ages group(12.1-12.2) are
[0,1.40) too short,[1.40,1.50) short ,[1.50,1.70) normal and [170,+00) high
Hallo @Barney1998,
You could retrieve the relevant bounds from a hash object created from your second dataset. This requires that the lookup can be based on an exact match of the age value. In the example below this is achieved by rounding age values to one decimal and having a complete lookup table, i.e., with no gaps in the sequence 12.0, 12.1, ..., 17.9, 18.0. (You might need to fill gaps in your real data.) Height values are rounded to two decimals to make sure that borderline values are classified correctly.
/* Create sample data for demonstration */
data have;
call streaminit(2377);
do id=1 to 100;
age=rand('uniform',12,18);
height=round(0.067*age+rand('normal',0.6,0.07),0.01);
output;
end;
run;
data ranges;
do _n_=120 to 180;
age=_n_/10;
h1=round(0.067*age+0.46,0.01);
h2=round(h1+0.07,.01);
h3=round(h2+0.14,.01);
output;
end;
run;
/* Classify heights depending on age */
proc format;
value hcfmt
1='too short'
2='short'
3='normal'
4='tall';
run;
data want(drop=h1-h3);
set have;
if _n_=1 then do;
dcl hash h(dataset:'ranges');
h.definekey('age');
h.definedata('h1','h2','h3');
h.definedone();
if 0 then set ranges;
end;
if n(age, height)=2 then
if h.find(key:round(age,0.1))=0 then c=1+(height>=h1)+(height>=h2)+(height>=h3);
format c hcfmt.;
run;
These are the first four observations in dataset WANT:
Obs id age height c 1 1 13.8719 1.39 short 2 2 14.4157 1.53 normal 3 3 15.9520 1.74 tall 4 4 16.8918 1.58 too short
... and the corresponding observations from dataset RANGES:
Obs age h1 h2 h3 20 13.9 1.39 1.46 1.60 25 14.4 1.42 1.49 1.63 41 16.0 1.53 1.60 1.74 50 16.9 1.59 1.66 1.80
proc sort data=Table1;
by age;
run;
proc sort data=classifcation;
by age;
run;
data want;
merge table1 classification;
by age;
if height < short then HeightCategory= 'Extra Short';
else if short < =height < Normal then HeightCategory='Short';
else if Normal <=height < High then 'HeightCategory = 'Normal';
else if height >= High then 'HeightCategory = 'High';
*drop short Normal high;
run;
run;
You haven't quite provided enough information about how the categories are applied but above is an example. A join/merge is the simplest approach here.
Change the conditions to match your requirements.
I would like to apologize for delay.
First of all thank you for your advice.
I know this way(which one you have describe below) ,I am training to match the hight value from the first table ,with the hight bounds from the second.To make a long story short ,I want exactly what you do it, but my question is if exist more efficient procedure to implied in this problem (because i have the bounds from the second table).For example on my first observation age:12.11,hight: 1.49 ,i want my algorithm take the value from the age and it looking for between which values from the second table is in. After from second table and with use of variable of hight it classified if the subject is in category of short normal or high i.e for us example (first obs) because age:12.1<12.11<12.2 and high:1.40 < 1.49 <1.50 the category is short. May it is useful to create one new category if hight <1.40 from example for ages between 12.1,12.2 as too short .The bounds of decision to that ages group(12.1-12.2) are
[0,1.40) too short,[1.40,1.50) short ,[1.50,1.70) normal and [170,+00) high
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.