BookmarkSubscribeRSS Feed
Div_sas
Calcite | Level 5

Hello all,

 

I am trying to calculate percentiles using two datasets, one of which is a standard dataset and the other dataset has systolic and diastolic bp values for each subject .I want to check each observation in the subject dataset against the standard dataset to find a match for the percentile:

 

Standard dataset:

agepercentilevalht
1n14550.5
1n25550.5
1n36550.5
1n47550.5
1n58550.5
1n14955.8
1n25455.8
1n36655.8
1n47555.8
1n58855.8

subject dataset example:

idhtvalage
xyz52501
abc56711

I want :

idhtvalagepercentile
xyz52501n1
abc56711

n3

 

Because the height for subject xyx is 52 which is less than 55.8 from the standard table it would fall under the 50.5 category as I need to consider the lower bound values and the val for that subject is 50 which fall under 45 as it is less than 55  it would be categorized under n1. The actual data consists of many ages with varying heights. Any help would be appreciated. Thank you.

3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep!!

 

First up, the standard dataset, if you have an upper and lower bound, then it makes sense to keep these in one row:

age percentile val  lower_ht upper_ht

1     n1             ?    50.5        55.8

1     n2    ...

That is a sensible model and will then allow you to merge onto your data based on age=age and lower_ht <= ht <= upper_ht.

The problem lies in the data modelling aspect of the "standard dataset", but you can sort this by sorting it by age, percentile, ht, then datastep first obs=lower, second=higher.

 

Div_sas
Calcite | Level 5

sorry about that. Here's the data in datastep:

 

data std;
input age percentile $ val ht;
datalines;
1 n1 45 50.5 
1 n2 55 50.5 
1 n3 65 50.5 
1 n4 75 50.5 
1 n5 85 50.5 
1 n1 49 55.8 
1 n2 54 55.8 
1 n3 66 55.8 
1 n4 75 55.8 
1 n5 88 55.8 
;
run;

data subj;
input id $ ht val age;
datalines;
xyz 52 50 1 
abc 56 71 1 
;
run;

I will need upper and lower bound values for height  and val as well. Thank you!

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, it doesn't make sense, as all the ranges are the same? Maybe it is the val part I am missing?  Anyways some code towards what you want:

data std;
  input age percentile $ val ht;
datalines;
1 n1 45 50.5 
1 n2 55 50.5 
1 n3 65 50.5 
1 n4 75 50.5 
1 n5 85 50.5 
1 n1 49 55.8 
1 n2 54 55.8 
1 n3 66 55.8 
1 n4 75 55.8 
1 n5 88 55.8 
;
run;

proc sort data=std;
  by age percentile ht;
run;

data std_upd (keep=age percentile lower higher);
  set std;
  by age percentile;
  retain lower higher;
  if first.percentile then lower=ht;
  if last.percentile then do;
    higher=ht;
    output;
  end;
run;
  
data subj;
  input id $ ht val age;
datalines;
xyz 52 50 1 
abc 56 71 1 
;
run;

proc sql;
  create table want as 
  select a.*,
         b.percentile,
         b.lower,
         b.higher
  from   subj a
  left join std_upd b
  on     a.age=b.age
  and    b.lower <= a.ht <= b.higher;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1064 views
  • 0 likes
  • 2 in conversation