BookmarkSubscribeRSS Feed
Barney1998
Obsidian | Level 7

Dear all,

for example I have one data set like:

age hight

12.11   1.49

12.12   1.49

12.15   1.52

12 .21   1.63

12.2    1.50

...

13 .01   1.29

13 .02  1.70

...

15 .75   1.78

16  .42  1.60

16  .87  1.57

 

and one other which give me the borders for classification

from example 

age short normal hight

12.1  1.40  1.50  1.70

12.2   1.41  1.51  1.74

..

and i want to combine this data set to create one table with age hight hight_category, for example if someone has hight equal to 1.31 and he is 12.11 years old he be categorized as short i.e

age hight category

12.11 1.31 short

 

 

thanks in advance ! 

 

5 REPLIES 5
ballardw
Super User

So, what value of AGE from the first table that is 12.15 do you compare with in the second table? Is 12.15 to use the 12.1 or the 12.2 row? what about 12.14 or 12.16?

If the age were 12.1 and the hight (sic) in the data is exactly 1.40 is that "Short" or "Normal".

You need to describe the general RULES for any comparisons. When you have range of values you need to specify if the end point is included or not and the lower and upper bound. As shown I do not know what the rules for Hight are. And what if the value of Hight for age 12.1 is 1.8??

 

You really need to show a result that matches your input.

This

age hight category
12.11 1.31 short

was not in your first data example. So where did the 1.31 come from?

 

And what are all the spaces in this

15 .75   1.78

16  .42  1.60

16  .87  1.57

you now have 3 variables with one not defined /described.

 

Just how important is the second decimal place in age in the comparisons.?

Barney1998
Obsidian | Level 7

Hallo again and I really apologize about my delay.

Thank you for your advice,

First of all, i give one specific example to give you to know what i really need.

I know the data statements and the procedures which i can implied for, my question is exist more efficient way than common because my decision bounds are really much. I was thinking, Ι take the advantage of the second table .As I have detailed bellow :

.For example on my first observation age:12.11,hight: 1.49 ,i want my algorithm take the value from the age and it looking for between which values from the second table is in. After from second table and with use of variable of hight it classified if the subject is in category of short normal or high i.e for us example (first obs)  because age:12.1<12.11<12.2 and high:1.40 < 1.49 <1.50 the category is short. May it is useful to create one new category  if hight <1.40 from example for ages between 12.1,12.2 as too short .The bounds of decision to that ages group(12.1-12.2) are 

[0,1.40) too short,[1.40,1.50) short ,[1.50,1.70) normal and [170,+00) high

FreelanceReinh
Jade | Level 19

Hallo @Barney1998,

 

You could retrieve the relevant bounds from a hash object created from your second dataset. This requires that the lookup can be based on an exact match of the age value. In the example below this is achieved by rounding age values to one decimal and having a complete lookup table, i.e., with no gaps in the sequence 12.0, 12.1, ..., 17.9, 18.0. (You might need to fill gaps in your real data.) Height values are rounded to two decimals to make sure that borderline values are classified correctly.

/* Create sample data for demonstration */

data have;
call streaminit(2377);
do id=1 to 100;
  age=rand('uniform',12,18);
  height=round(0.067*age+rand('normal',0.6,0.07),0.01);
  output;
end;
run;

data ranges;
do _n_=120 to 180;
  age=_n_/10;
  h1=round(0.067*age+0.46,0.01);
  h2=round(h1+0.07,.01);
  h3=round(h2+0.14,.01);
  output;
end;
run;

/* Classify heights depending on age */

proc format;
value hcfmt
1='too short'
2='short'
3='normal'
4='tall';
run;

data want(drop=h1-h3);
set have;
if _n_=1 then do;
  dcl hash h(dataset:'ranges');
  h.definekey('age');
  h.definedata('h1','h2','h3');
  h.definedone();
  if 0 then set ranges;
end;
if n(age, height)=2 then
if h.find(key:round(age,0.1))=0 then c=1+(height>=h1)+(height>=h2)+(height>=h3);
format c hcfmt.;
run;

These are the first four observations in dataset WANT:

Obs    id      age      height    c

  1     1    13.8719     1.39     short
  2     2    14.4157     1.53     normal
  3     3    15.9520     1.74     tall
  4     4    16.8918     1.58     too short

... and the corresponding observations from dataset RANGES:

Obs     age       h1       h2       h3

 20    13.9     1.39     1.46     1.60
 25    14.4     1.42     1.49     1.63
 41    16.0     1.53     1.60     1.74
 50    16.9     1.59     1.66     1.80
Reeza
Super User
proc sort data=Table1;
by age;
run;

proc sort data=classifcation;
by age;
run;

data want;
merge table1 classification;
by age;

if height < short then HeightCategory= 'Extra Short';
else if  short < =height < Normal then HeightCategory='Short';
else if Normal <=height < High then 'HeightCategory = 'Normal';
else if height >= High then 'HeightCategory = 'High';

*drop short Normal high;
run;
run;

You haven't quite provided enough information about how the categories are applied but above is an example. A join/merge is the simplest approach here. 

Change the conditions to match your requirements.

Barney1998
Obsidian | Level 7

I would like to apologize for delay.

First of all thank you for your advice.

I know this way(which one you have describe below) ,I am training to match the hight value from the first table ,with the hight bounds from the second.To make a long story short ,I want exactly what you do it, but my question is if exist more efficient procedure to implied in this problem (because i have the bounds from the second table).For example on my first observation age:12.11,hight: 1.49 ,i want my algorithm take the value from the age and it looking for between which values from the second table is in. After from second table and with use of variable of hight it classified if the subject is in category of short normal or high i.e for us example (first obs)  because age:12.1<12.11<12.2 and high:1.40 < 1.49 <1.50 the category is short. May it is useful to create one new category  if hight <1.40 from example for ages between 12.1,12.2 as too short .The bounds of decision to that ages group(12.1-12.2) are 

[0,1.40) too short,[1.40,1.50) short ,[1.50,1.70) normal and [170,+00) high

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 378 views
  • 0 likes
  • 4 in conversation