Hello all,
I am trying to calculate percentiles using two datasets, one of which is a standard dataset and the other dataset has systolic and diastolic bp values for each subject .I want to check each observation in the subject dataset against the standard dataset to find a match for the percentile:
Standard dataset:
age | percentile | val | ht |
1 | n1 | 45 | 50.5 |
1 | n2 | 55 | 50.5 |
1 | n3 | 65 | 50.5 |
1 | n4 | 75 | 50.5 |
1 | n5 | 85 | 50.5 |
1 | n1 | 49 | 55.8 |
1 | n2 | 54 | 55.8 |
1 | n3 | 66 | 55.8 |
1 | n4 | 75 | 55.8 |
1 | n5 | 88 | 55.8 |
subject dataset example:
id | ht | val | age |
xyz | 52 | 50 | 1 |
abc | 56 | 71 | 1 |
I want :
id | ht | val | age | percentile |
xyz | 52 | 50 | 1 | n1 |
abc | 56 | 71 | 1 | n3 |
Because the height for subject xyx is 52 which is less than 55.8 from the standard table it would fall under the 50.5 category as I need to consider the lower bound values and the val for that subject is 50 which fall under 45 as it is less than 55 it would be categorized under n1. The actual data consists of many ages with varying heights. Any help would be appreciated. Thank you.
Post test data in the form of a datastep!!
First up, the standard dataset, if you have an upper and lower bound, then it makes sense to keep these in one row:
age percentile val lower_ht upper_ht
1 n1 ? 50.5 55.8
1 n2 ...
That is a sensible model and will then allow you to merge onto your data based on age=age and lower_ht <= ht <= upper_ht.
The problem lies in the data modelling aspect of the "standard dataset", but you can sort this by sorting it by age, percentile, ht, then datastep first obs=lower, second=higher.
sorry about that. Here's the data in datastep:
data std;
input age percentile $ val ht;
datalines;
1 n1 45 50.5
1 n2 55 50.5
1 n3 65 50.5
1 n4 75 50.5
1 n5 85 50.5
1 n1 49 55.8
1 n2 54 55.8
1 n3 66 55.8
1 n4 75 55.8
1 n5 88 55.8
;
run;
data subj;
input id $ ht val age;
datalines;
xyz 52 50 1
abc 56 71 1
;
run;
I will need upper and lower bound values for height and val as well. Thank you!
Well, it doesn't make sense, as all the ranges are the same? Maybe it is the val part I am missing? Anyways some code towards what you want:
data std; input age percentile $ val ht; datalines; 1 n1 45 50.5 1 n2 55 50.5 1 n3 65 50.5 1 n4 75 50.5 1 n5 85 50.5 1 n1 49 55.8 1 n2 54 55.8 1 n3 66 55.8 1 n4 75 55.8 1 n5 88 55.8 ; run; proc sort data=std; by age percentile ht; run; data std_upd (keep=age percentile lower higher); set std; by age percentile; retain lower higher; if first.percentile then lower=ht; if last.percentile then do; higher=ht; output; end; run; data subj; input id $ ht val age; datalines; xyz 52 50 1 abc 56 71 1 ; run; proc sql; create table want as select a.*, b.percentile, b.lower, b.higher from subj a left join std_upd b on a.age=b.age and b.lower <= a.ht <= b.higher; quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.