I have a data that has activities each person performs during a season -
Name | Season | Activity |
A | 1 | x |
A | 1 | y |
A | 1 | z |
A | 1 | t |
A | 2 | y |
A | 2 | u |
A | 3 | x |
B | 1 | u |
B | 1 | v |
B | 2 | r |
B | 2 | x |
B | 2 | y |
B | 3 | k |
B | 3 | j |
If a person has activity x & y in season 1 then I want all his rows for the first season to have a flag yes-
Name | Season | Disease | Flag |
A | 1 | x | Yes |
A | 1 | y | Yes |
A | 1 | z | Yes |
A | 1 | t | Yes |
A | 2 | y | No |
A | 2 | u | No |
A | 3 | x | No |
similarly I have different disease combination for different seasons and activity.
I am unable to write an efficient code for this.
The code I have come up with is --
Create table activity_x as
Select a.*
From table1 as a join (Select distinct person, season from table1 where activity= 'x') as b
on a.person = b.person and a.season= b.season
order by person, season;
Quit;
This would give me a data-set with all information for person and seasons where atleast one activity in the season is x.
Similarly I would repeat for activity y and get a dataset.
Then find inner join of the two.
But issue is that the activity code combinations are very complex
Any one of this list | and | Any one of this list |
a | r | |
b+c | m | |
d+c | n | |
s+c | o |
Is there any better way to do this?
Hello,
data have;
input name $ season $ activity $;
cards;
A 1 x
A 1 y
A 1 z
A 1 t
A 2 y
A 2 u
A 3 x
B 1 u
B 1 v
B 2 r
B 2 x
B 2 y
B 3 k
B 3 j
;
run;
proc sql noprint;
CREATE TABLE want AS
SELECT *, CASE WHEN (sum(activity="x") AND sum(activity="y")) THEN "YES" ELSE "NO" END AS Flag format=$3.
FROM have
GROUP BY name, season
;
quit;
Hi Could you please explain how sum function is working for categorical variables
(activity="x") implicitely defines a variable which is 1 when the condition is met and 0 otherwise.
The sum function will compute the sum for this implicit variable for all observations belonging
to the group specified by the GROUP BY clause, that is the sum per person/season.
So if sum(activity="x")>0, it means that at least one observation within the person/season group has
the variable activity equals to x.
Thanks for the explanation.
But the issue is that I have multiple activity columns ( activity 1- activity 5) and within that I have about 30 combinations of activities for which I have to make 8-10 flag variables. Is there any other way to do this so that I dont have to write an extremely long code?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.