Suppose I have such data as follows
Name Weight Age ID#
John 90 18 D123
Jack 89 18 D223
Jay 91 18 39DD
Jone 91 18 38DD
Brian 88 19 B123
Bob 87 19 B223
Bean 92 19 23BB
Black 92 19 24BB
So the ID# variable alphanumeric, but start with either a number or a letter.
The basic thought is
If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.
For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).
So the condition is
Age same, ID# start with a letter
then
Subtotal their weight,
Divide it by total weight with same Age
Can SAS do such work as recognize if a value starts with number or letter?
Thank you all very much.
I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.
Thanks again.
That's a really strange problem.
Yes, SAS can identify if a string starts with a letter or number. The simplest way, for a beginner, would be to extract the first character using either CHAR() or SUBSTR() function and then use ANYALPHA() or ANYDIGIT() functions to test if the variable is a number or letter.
data want;
set have;
length first_char_cat $12;
first_char = substr(id, 1, 1);
if anydigit(first_char) then first_char_cat = 'Letter';
else if anyalpha(first_char) then first_char_cat = 'Number';
else first_char_cat = 'ERROR';
run;
To do your summaries it may be worth creating some summary statistics and joining them back into the table.
You can use PROC MEANS to calculate summary statistics for your specific groups and then merge them back in with the data.
It sounds like you have two groups, one by Age and one by Age and the first character?
Here are some examples here on how to create summary statistics and add them to your data set. Once you have the summary statistics in the calculation of the weights is a trivial exercise.
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
@yanshuai wrote:
Suppose I have such data as follows
Name Weight Age ID#
John 90 18 D123
Jack 89 18 D223
Jay 91 18 39DD
Jone 91 18 38DD
Brian 88 19 B123
Bob 87 19 B223
Bean 92 19 23BB
Black 92 19 24BB
So the ID# variable alphanumeric, but start with either a number or a letter.
The basic thought is
If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.
For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).
So the condition is
Age same, ID# start with a letter
then
Subtotal their weight,
Divide it by total weight with same Age
Can SAS do such work as recognize if a value starts with number or letter?
Thank you all very much.
I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.
Thanks again.
That's a really strange problem.
Yes, SAS can identify if a string starts with a letter or number. The simplest way, for a beginner, would be to extract the first character using either CHAR() or SUBSTR() function and then use ANYALPHA() or ANYDIGIT() functions to test if the variable is a number or letter.
data want;
set have;
length first_char_cat $12;
first_char = substr(id, 1, 1);
if anydigit(first_char) then first_char_cat = 'Letter';
else if anyalpha(first_char) then first_char_cat = 'Number';
else first_char_cat = 'ERROR';
run;
To do your summaries it may be worth creating some summary statistics and joining them back into the table.
You can use PROC MEANS to calculate summary statistics for your specific groups and then merge them back in with the data.
It sounds like you have two groups, one by Age and one by Age and the first character?
Here are some examples here on how to create summary statistics and add them to your data set. Once you have the summary statistics in the calculation of the weights is a trivial exercise.
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
@yanshuai wrote:
Suppose I have such data as follows
Name Weight Age ID#
John 90 18 D123
Jack 89 18 D223
Jay 91 18 39DD
Jone 91 18 38DD
Brian 88 19 B123
Bob 87 19 B223
Bean 92 19 23BB
Black 92 19 24BB
So the ID# variable alphanumeric, but start with either a number or a letter.
The basic thought is
If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.
For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).
So the condition is
Age same, ID# start with a letter
then
Subtotal their weight,
Divide it by total weight with same Age
Can SAS do such work as recognize if a value starts with number or letter?
Thank you all very much.
I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.
Thanks again.
Thank you so much Reeza,
So there will a new variable first_char_cat telling me if ID# start with a letter or number.
Very straightforward, thanks.
Then how can I subtotal all the Weight observation value that first_char_cat
is Letter or Number?
I am thinking this way,
what I need
= (subtotal of the Weight if observation has same age and ID# start with letter) / (total of the Weight if observation only has same age)
I can figure our how to calculate the denominator, simply SORT BY then SUM will be done.
But how to SUM a variable value under two different conditions?
Thank you very much, Reeza.
Did you check and run the examples I included in the link?
It fully walks through calculating summary statistics by a group and adding it back to a data set.
If you have, which portion isn't making sense to you?
Thank you Reeza!
I understand your code intuitively!
Next step for me will be doing the same thing using macro....as I have a thousand data files....but the variable names are the same mostly.
Do you have a hint on that?
Thanks a lot anyway.
@yanshuai wrote:
Thank you Reeza!
I understand your code intuitively!
Next step for me will be doing the same thing using macro....as I have a thousand data files....but the variable names are the same mostly.
Do you have a hint on that?
Thanks a lot anyway.
Don't. Append them together and process them all at once. Add a variable to the output data set that identifies which observations come from which input data set.
Or if you really want a macro - look at CALL EXECUTE() documentation on how to both build a simple macro and how to call it in a data driven methodology. There's also examples in my GitHub repo.
SAS can calculate almost anything, even meaningless stuff. SQL is one way to do this:
proc sql;
title "Just the summary";
select
age,
sum(anyalpha(first(id))) as nbLetterIds,
count(id) as nbIds,
sum(weight*anyalpha(first(id))) / sum(weight) as letterIdWeightFraction
from have
group by age;
title "All the data";
select
*,
sum(weight*anyalpha(first(id))) / sum(weight) as letterIdWeightFraction
from have
group by age;
quit;
Thank you PG!
I use your code and it gives me what I want.
But what's the meaning of "All the Data" table? nothing shows up.
Anyways, it works.
"All the data" query uses remerging to compile a table identical to have but with an extra column for the summary info.
Thank you!
Do you know how to do the same thing with macro command?
Eventually I have to learn macro...
Thanks.
Hello PG,
I come again. Really need your help.
I am using your code to capture what I want. But I wonder how to run the code for each dataset in one library (there are 1000 individual dataset, so I cannot do this manually).
I am looking at macro documentation, but cannot figure out by myself.
Thank you.
SQL doesn't operate on dataset lists. Post as a new topic. I'm sure someone will help out.
data have;
input Name $ Weight Age ID $;
datalines;
John 90 18 D123
Jack 89 18 D223
Jay 91 18 39DD
Jone 91 18 38DD
Brian 88 19 B123
Bob 87 19 B223
Bean 92 19 23BB
Black 92 19 24BB
;
data group;
set have;
group=notdigit(id);
run;
data want;
grptot=0;
do until(last.age);
set group;
by age group;
grptot+weight;
end;
do until(last.age);
grpsubtot=0;
do until(last.group);
set group;
by age group;
grpsubtot+weight;
if last.group then do; want=grpsubtot/grptot;output;end;
end;
end;
run;
Your code is very straightforward intuitively. I got its logic!
Thank you!
Do you have a idea how to do this using macro?
Thank you!
This problem doesn't lend itself to macros.
Macros are useful for:
1. Generalizing a process that you will repeat with different conditions
2. Looping when By processing isn't going to work for some reason.
I would leave macro's to your later learning, learn arrays, functions such as substr() and scan() and other things first.
If you want a good macro tutorial UCLA has a good one.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.