BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yanshuai
Quartz | Level 8

Suppose I have such data as follows

 

Name    Weight     Age      ID#

John         90         18        D123

Jack         89         18        D223

Jay           91         18        39DD

Jone         91         18        38DD

Brian        88         19        B123

Bob          87         19       B223

Bean        92         19       23BB

Black        92         19       24BB

 

So the ID# variable alphanumeric, but start with either a number or a letter.

The basic thought is

If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.

 

For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).

 

So the condition is

Age same, ID# start with a letter

then

Subtotal their weight,

Divide it by total weight with same Age

 

Can SAS do such work as recognize if a value starts with number or letter?

 

Thank you all very much.

I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.

Thanks again.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

That's a really strange problem. 

 

Yes, SAS can identify if a string starts with a letter or number. The simplest way, for a beginner, would be to extract the first character using either CHAR() or SUBSTR() function and then use ANYALPHA() or ANYDIGIT() functions to test if the variable is a number or letter. 

 

data want;

set have;

length first_char_cat $12;

first_char = substr(id, 1, 1);

if anydigit(first_char) then first_char_cat = 'Letter';
else if anyalpha(first_char) then first_char_cat = 'Number';
else first_char_cat = 'ERROR';

run;

To do your summaries it may be worth creating some summary statistics and joining them back into the table. 

You can use PROC MEANS to calculate summary statistics for your specific groups and then merge them back in with the data. 

It sounds like you have two groups, one by Age and one by Age and the first character? 

 

Here are some examples here on how to create summary statistics and add them to your data set. Once you have the summary statistics in the calculation of the weights is a trivial exercise. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 

 


@yanshuai wrote:

Suppose I have such data as follows

 

Name    Weight     Age      ID#

John         90         18        D123

Jack         89         18        D223

Jay           91         18        39DD

Jone         91         18        38DD

Brian        88         19        B123

Bob          87         19       B223

Bean        92         19       23BB

Black        92         19       24BB

 

So the ID# variable alphanumeric, but start with either a number or a letter.

The basic thought is

If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.

 

For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).

 

So the condition is

Age same, ID# start with a letter

then

Subtotal their weight,

Divide it by total weight with same Age

 

Can SAS do such work as recognize if a value starts with number or letter?

 

Thank you all very much.

I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.

Thanks again.




 

View solution in original post

20 REPLIES 20
Reeza
Super User

That's a really strange problem. 

 

Yes, SAS can identify if a string starts with a letter or number. The simplest way, for a beginner, would be to extract the first character using either CHAR() or SUBSTR() function and then use ANYALPHA() or ANYDIGIT() functions to test if the variable is a number or letter. 

 

data want;

set have;

length first_char_cat $12;

first_char = substr(id, 1, 1);

if anydigit(first_char) then first_char_cat = 'Letter';
else if anyalpha(first_char) then first_char_cat = 'Number';
else first_char_cat = 'ERROR';

run;

To do your summaries it may be worth creating some summary statistics and joining them back into the table. 

You can use PROC MEANS to calculate summary statistics for your specific groups and then merge them back in with the data. 

It sounds like you have two groups, one by Age and one by Age and the first character? 

 

Here are some examples here on how to create summary statistics and add them to your data set. Once you have the summary statistics in the calculation of the weights is a trivial exercise. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 

 


@yanshuai wrote:

Suppose I have such data as follows

 

Name    Weight     Age      ID#

John         90         18        D123

Jack         89         18        D223

Jay           91         18        39DD

Jone         91         18        38DD

Brian        88         19        B123

Bob          87         19       B223

Bean        92         19       23BB

Black        92         19       24BB

 

So the ID# variable alphanumeric, but start with either a number or a letter.

The basic thought is

If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.

 

For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).

 

So the condition is

Age same, ID# start with a letter

then

Subtotal their weight,

Divide it by total weight with same Age

 

Can SAS do such work as recognize if a value starts with number or letter?

 

Thank you all very much.

I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.

Thanks again.




 

yanshuai
Quartz | Level 8

Thank you so much Reeza,

 

So there will a new variable first_char_cat telling me if ID# start with a letter or number.

Very straightforward, thanks.

 

Then how can I subtotal all the Weight observation value that first_char_cat is Letter or Number?

 

I am thinking this way,

 

what I need

= (subtotal of the Weight if observation has same age and ID# start with letter) / (total of the Weight if observation only has same age)

 

I can figure our how to calculate the denominator, simply SORT BY then SUM will be done.

 

But how to SUM a variable value under two different conditions?

 

Thank you very much, Reeza.

Reeza
Super User

Did you check and run the examples I included in the link? 

It fully walks through calculating summary statistics by a group and adding it back to a data set. 

 

If you have, which portion isn't making sense to you?

yanshuai
Quartz | Level 8

Thank you Reeza!

 

I understand your code intuitively!

 

Next step for me will be doing the same thing using macro....as I have a thousand data files....but the variable names are the same mostly.

 

Do you have a hint on that?

 

Thanks a lot anyway.

Reeza
Super User

@yanshuai wrote:

Thank you Reeza!

 

I understand your code intuitively!

 

Next step for me will be doing the same thing using macro....as I have a thousand data files....but the variable names are the same mostly.

 

Do you have a hint on that?

 

Thanks a lot anyway.


Don't. Append them together and process them all at once. Add a variable to the output data set that identifies which observations come from which input data set.  

 

Or if you really want a macro - look at CALL EXECUTE() documentation on how to both build a simple macro and how to call it in a data driven methodology. There's also examples in my GitHub repo.

 

 

PGStats
Opal | Level 21

SAS can calculate almost anything, even meaningless stuff. SQL is one way to do this:

 

proc sql;
title "Just the summary";
select 
    age,
    sum(anyalpha(first(id))) as nbLetterIds,
    count(id) as nbIds,
    sum(weight*anyalpha(first(id))) / sum(weight) as letterIdWeightFraction
from have
group by age;
title "All the data";
select 
    *,
    sum(weight*anyalpha(first(id))) / sum(weight) as letterIdWeightFraction
from have
group by age;
quit;

 

PG
yanshuai
Quartz | Level 8

Thank you PG!

 

I use your code and it gives me what I want.

 

But what's the meaning of "All the Data" table? nothing shows up.

 

Anyways, it works.

PGStats
Opal | Level 21

"All the data" query uses remerging to compile a table identical to have but with an extra column for the summary info.

PG
yanshuai
Quartz | Level 8

Thank you!

 

Do you know how to do the same thing with macro command?

 

Eventually I have to learn macro...

 

Thanks.

yanshuai
Quartz | Level 8

Hello PG,

 

I come again. Really need your help.

 

I am using your code to capture what I want. But I wonder how to run the code for each dataset in one library (there are 1000 individual dataset, so I cannot do this manually).

 

I am looking at macro documentation, but cannot figure out by myself.

 

Thank you.

PGStats
Opal | Level 21

SQL doesn't operate on dataset lists. Post as a new topic. I'm sure someone will help out.

PG
novinosrin
Tourmaline | Level 20
data have;
input Name  $  Weight     Age      ID $;
datalines;
John         90         18        D123
Jack         89         18        D223
Jay           91         18        39DD
Jone         91         18        38DD
Brian        88         19        B123
Bob          87         19       B223
Bean        92         19       23BB
Black        92         19       24BB
;

data group;
set have;
group=notdigit(id);
run;

data want;
grptot=0;
do until(last.age);
set group;
by age group;
grptot+weight;
end;
do until(last.age);
grpsubtot=0;
do until(last.group);
set group;
by age group;
grpsubtot+weight;
if last.group then do; want=grpsubtot/grptot;output;end;
end;
end;
run;


yanshuai
Quartz | Level 8

Your code is very straightforward intuitively. I got its logic!

Thank you!

 

Do you have a idea how to do this using macro?

 

Thank you!

Reeza
Super User

This problem doesn't lend itself to macros. 

Macros are useful for:

1. Generalizing a process that you will repeat with different conditions

2. Looping when By processing isn't going to work for some reason.

 

I would leave macro's to your later learning, learn arrays, functions such as substr() and scan() and other things first. 

 

If you want a good macro tutorial UCLA has a good one. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1009 views
  • 2 likes
  • 4 in conversation