I have a data that has activities each person performs during a season -
Name | Season | Activity |
A | 1 | x |
A | 1 | y |
A | 1 | z |
A | 1 | t |
A | 2 | y |
A | 2 | u |
A | 3 | x |
B | 1 | u |
B | 1 | v |
B | 2 | r |
B | 2 | x |
B | 2 | y |
B | 3 | k |
B | 3 | j |
If a person has activity x & y in season 1 then I want all his rows for the first season to have a flag yes-
Name | Season | Disease | Flag |
A | 1 | x | Yes |
A | 1 | y | Yes |
A | 1 | z | Yes |
A | 1 | t | Yes |
A | 2 | y | No |
A | 2 | u | No |
A | 3 | x | No |
similarly I have different disease combination for different seasons and activity.
I am unable to write an efficient code for this.
The code I have come up with is --
Create table activity_x as
Select a.*
From table1 as a join (Select distinct person, season from table1 where activity= 'x') as b
on a.person = b.person and a.season= b.season
order by person, season;
Quit;
This would give me a data-set with all information for person and seasons where atleast one activity in the season is x.
Similarly I would repeat for activity y and get a dataset.
Then find inner join of the two.
But issue is that the activity code combinations are very complex
Any one of this list | and | Any one of this list |
a | r | |
b+c | m | |
d+c | n | |
s+c | o |
Is there any better way to do this?
Hello,
data have;
input name $ season $ activity $;
cards;
A 1 x
A 1 y
A 1 z
A 1 t
A 2 y
A 2 u
A 3 x
B 1 u
B 1 v
B 2 r
B 2 x
B 2 y
B 3 k
B 3 j
;
run;
proc sql noprint;
CREATE TABLE want AS
SELECT *, CASE WHEN (sum(activity="x") AND sum(activity="y")) THEN "YES" ELSE "NO" END AS Flag format=$3.
FROM have
GROUP BY name, season
;
quit;
Hi Could you please explain how sum function is working for categorical variables
(activity="x") implicitely defines a variable which is 1 when the condition is met and 0 otherwise.
The sum function will compute the sum for this implicit variable for all observations belonging
to the group specified by the GROUP BY clause, that is the sum per person/season.
So if sum(activity="x")>0, it means that at least one observation within the person/season group has
the variable activity equals to x.
Thanks for the explanation.
But the issue is that I have multiple activity columns ( activity 1- activity 5) and within that I have about 30 combinations of activities for which I have to make 8-10 flag variables. Is there any other way to do this so that I dont have to write an extremely long code?
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.