DATA Step, Macro, Functions and more

How to operate variables according to certain condition of otherr variables (number/letter)?

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

How to operate variables according to certain condition of otherr variables (number/letter)?

Suppose I have such data as follows

 

Name    Weight     Age      ID#

John         90         18        D123

Jack         89         18        D223

Jay           91         18        39DD

Jone         91         18        38DD

Brian        88         19        B123

Bob          87         19       B223

Bean        92         19       23BB

Black        92         19       24BB

 

So the ID# variable alphanumeric, but start with either a number or a letter.

The basic thought is

If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.

 

For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).

 

So the condition is

Age same, ID# start with a letter

then

Subtotal their weight,

Divide it by total weight with same Age

 

Can SAS do such work as recognize if a value starts with number or letter?

 

Thank you all very much.

I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.

Thanks again.


Accepted Solutions
Solution
‎01-28-2018 03:32 PM
Super User
Posts: 24,027

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

That's a really strange problem. 

 

Yes, SAS can identify if a string starts with a letter or number. The simplest way, for a beginner, would be to extract the first character using either CHAR() or SUBSTR() function and then use ANYALPHA() or ANYDIGIT() functions to test if the variable is a number or letter. 

 

data want;

set have;

length first_char_cat $12;

first_char = substr(id, 1, 1);

if anydigit(first_char) then first_char_cat = 'Letter';
else if anyalpha(first_char) then first_char_cat = 'Number';
else first_char_cat = 'ERROR';

run;

To do your summaries it may be worth creating some summary statistics and joining them back into the table. 

You can use PROC MEANS to calculate summary statistics for your specific groups and then merge them back in with the data. 

It sounds like you have two groups, one by Age and one by Age and the first character? 

 

Here are some examples here on how to create summary statistics and add them to your data set. Once you have the summary statistics in the calculation of the weights is a trivial exercise. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 

 


yanshuai wrote:

Suppose I have such data as follows

 

Name    Weight     Age      ID#

John         90         18        D123

Jack         89         18        D223

Jay           91         18        39DD

Jone         91         18        38DD

Brian        88         19        B123

Bob          87         19       B223

Bean        92         19       23BB

Black        92         19       24BB

 

So the ID# variable alphanumeric, but start with either a number or a letter.

The basic thought is

If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.

 

For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).

 

So the condition is

Age same, ID# start with a letter

then

Subtotal their weight,

Divide it by total weight with same Age

 

Can SAS do such work as recognize if a value starts with number or letter?

 

Thank you all very much.

I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.

Thanks again.




 

View solution in original post


All Replies
Solution
‎01-28-2018 03:32 PM
Super User
Posts: 24,027

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

That's a really strange problem. 

 

Yes, SAS can identify if a string starts with a letter or number. The simplest way, for a beginner, would be to extract the first character using either CHAR() or SUBSTR() function and then use ANYALPHA() or ANYDIGIT() functions to test if the variable is a number or letter. 

 

data want;

set have;

length first_char_cat $12;

first_char = substr(id, 1, 1);

if anydigit(first_char) then first_char_cat = 'Letter';
else if anyalpha(first_char) then first_char_cat = 'Number';
else first_char_cat = 'ERROR';

run;

To do your summaries it may be worth creating some summary statistics and joining them back into the table. 

You can use PROC MEANS to calculate summary statistics for your specific groups and then merge them back in with the data. 

It sounds like you have two groups, one by Age and one by Age and the first character? 

 

Here are some examples here on how to create summary statistics and add them to your data set. Once you have the summary statistics in the calculation of the weights is a trivial exercise. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 

 


yanshuai wrote:

Suppose I have such data as follows

 

Name    Weight     Age      ID#

John         90         18        D123

Jack         89         18        D223

Jay           91         18        39DD

Jone         91         18        38DD

Brian        88         19        B123

Bob          87         19       B223

Bean        92         19       23BB

Black        92         19       24BB

 

So the ID# variable alphanumeric, but start with either a number or a letter.

The basic thought is

If people with same age, then choose people whose ID# starts with a alphabet letter, and subtotal their weight, and divide the subtotal by total weight of the people with same age.

 

For example, I would like to pick John and Jack (as they have same age and ID# starts with a letter), and subtotal their weight, and divide it by the overall weight of John Jack Jay Jone (they have the same age).

 

So the condition is

Age same, ID# start with a letter

then

Subtotal their weight,

Divide it by total weight with same Age

 

Can SAS do such work as recognize if a value starts with number or letter?

 

Thank you all very much.

I am just a beginner and humbly would like to learn from you. It is really a headache to jump to this level.

Thanks again.




 

Contributor
Posts: 50

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Thank you so much Reeza,

 

So there will a new variable first_char_cat telling me if ID# start with a letter or number.

Very straightforward, thanks.

 

Then how can I subtotal all the Weight observation value that first_char_cat is Letter or Number?

 

I am thinking this way,

 

what I need

= (subtotal of the Weight if observation has same age and ID# start with letter) / (total of the Weight if observation only has same age)

 

I can figure our how to calculate the denominator, simply SORT BY then SUM will be done.

 

But how to SUM a variable value under two different conditions?

 

Thank you very much, Reeza.

Super User
Posts: 24,027

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Did you check and run the examples I included in the link? 

It fully walks through calculating summary statistics by a group and adding it back to a data set. 

 

If you have, which portion isn't making sense to you?

Contributor
Posts: 50

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Thank you Reeza!

 

I understand your code intuitively!

 

Next step for me will be doing the same thing using macro....as I have a thousand data files....but the variable names are the same mostly.

 

Do you have a hint on that?

 

Thanks a lot anyway.

Super User
Posts: 24,027

Re: How to operate variables according to certain condition of otherr variables (number/letter)?


yanshuai wrote:

Thank you Reeza!

 

I understand your code intuitively!

 

Next step for me will be doing the same thing using macro....as I have a thousand data files....but the variable names are the same mostly.

 

Do you have a hint on that?

 

Thanks a lot anyway.


Don't. Append them together and process them all at once. Add a variable to the output data set that identifies which observations come from which input data set.  

 

Or if you really want a macro - look at CALL EXECUTE() documentation on how to both build a simple macro and how to call it in a data driven methodology. There's also examples in my GitHub repo.

 

 

Esteemed Advisor
Posts: 5,627

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

[ Edited ]

SAS can calculate almost anything, even meaningless stuff. SQL is one way to do this:

 

proc sql;
title "Just the summary";
select 
    age,
    sum(anyalpha(first(id))) as nbLetterIds,
    count(id) as nbIds,
    sum(weight*anyalpha(first(id))) / sum(weight) as letterIdWeightFraction
from have
group by age;
title "All the data";
select 
    *,
    sum(weight*anyalpha(first(id))) / sum(weight) as letterIdWeightFraction
from have
group by age;
quit;

 

PG
Contributor
Posts: 50

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Thank you PG!

 

I use your code and it gives me what I want.

 

But what's the meaning of "All the Data" table? nothing shows up.

 

Anyways, it works.

Esteemed Advisor
Posts: 5,627

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

"All the data" query uses remerging to compile a table identical to have but with an extra column for the summary info.

PG
Contributor
Posts: 50

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Thank you!

 

Do you know how to do the same thing with macro command?

 

Eventually I have to learn macro...

 

Thanks.

Contributor
Posts: 50

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Hello PG,

 

I come again. Really need your help.

 

I am using your code to capture what I want. But I wonder how to run the code for each dataset in one library (there are 1000 individual dataset, so I cannot do this manually).

 

I am looking at macro documentation, but cannot figure out by myself.

 

Thank you.

Esteemed Advisor
Posts: 5,627

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

SQL doesn't operate on dataset lists. Post as a new topic. I'm sure someone will help out.

PG
Super User
Posts: 2,075

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

[ Edited ]
data have;
input Name  $  Weight     Age      ID $;
datalines;
John         90         18        D123
Jack         89         18        D223
Jay           91         18        39DD
Jone         91         18        38DD
Brian        88         19        B123
Bob          87         19       B223
Bean        92         19       23BB
Black        92         19       24BB
;

data group;
set have;
group=notdigit(id);
run;

data want;
grptot=0;
do until(last.age);
set group;
by age group;
grptot+weight;
end;
do until(last.age);
grpsubtot=0;
do until(last.group);
set group;
by age group;
grpsubtot+weight;
if last.group then do; want=grpsubtot/grptot;output;end;
end;
end;
run;


Contributor
Posts: 50

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

Posted in reply to novinosrin

Your code is very straightforward intuitively. I got its logic!

Thank you!

 

Do you have a idea how to do this using macro?

 

Thank you!

Super User
Posts: 24,027

Re: How to operate variables according to certain condition of otherr variables (number/letter)?

This problem doesn't lend itself to macros. 

Macros are useful for:

1. Generalizing a process that you will repeat with different conditions

2. Looping when By processing isn't going to work for some reason.

 

I would leave macro's to your later learning, learn arrays, functions such as substr() and scan() and other things first. 

 

If you want a good macro tutorial UCLA has a good one. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 250 views
  • 2 likes
  • 4 in conversation