Hi,
I am dealing with data where one person can have multiple different observations (same ID, different observations). I have run an aggregation code for an event of interest so that if during any of the observations that event has occurred then it will show up as 1 otherwise 0. This shrinks the observations to the number of people by collapsing the IDs. so let's say 10 observations have collapsed into 5, as there are 10 people with same IDs who showed up twice on different days hence 10 observations. The problem is that once I run the code the data has already collapsed by IDs, and when I run the next code for another event of interest it starts off the collapsed data. How do I stop this from happening? How do I tell sas to look at original data when looking for another event of interest and not continue from the data collapsed by IDs?
I am a newbie, would appreciate any help, thanks
data asksas;
set asksas;
by ID;
if first.ID then Sumvar=0;
Sumvar + var;
if last.ID;
drop var;
run;
*checking aggregation code;
proc freq data=asksas;
tables sumvar;
run;
*drop sumvar;
data asksas (drop = Sumvar);
set asksas;
if Sumavar > 0 then var= 1 ;
else var = 0 ;
run ;
proc freq data=asksas;
tables var;
run;
@GreenTree1 wrote:
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.
Good. Then mark your report of the resolution as the solution to this topic.
Thank you for your reply, here is what my data looks like
ID date/oct redcloths . whiteclothes anydayred . anydaywhite
1 . 19 . 1 0
1 . 18 0 1
2 . 15 . 1 1
2 . 13 . 0 . 0
let's say I am looking at ID 1 wearing red clothes on any date then ID1 will be assigned 1 and so forth. When I do this the data gets collapsed by IDs, in this case, goes from 4 to 2 observation, since ID1 and 2 both wore red on atleats of the two days. following this if I am trying to look for IDs who wore white clothes on any date, it starts off from the collapsed data ie 2 observation whereas I want it to start off from the original 4 observations since wearing red and white clothes are independent events
sorry I meant
since ID1 and 2 both wore red on at least one of the two days
Hi @GreenTree1
Thank you for the clarification, very helpful.
Just to be sure to well-understand your need, what would the output data look like?
orginal
ID date/oct redcloths . whiteclothes anydayred . anydaywhite
1 . 19 . 1 0
1 . 18 0 1
2 . 15 . 1 1
2 . 13 . 0 . 0
collapsed
ID date/oct redcloths . whiteclothes anydayred . anydaywhite
1 . 19 . 1 0 1
2 . 15 . 1 1 1
Now when I try to run the code for "anydaywhite" it starts from the collapsed data of anydayred instead of the original data
Hi @GreenTree1
As variables redclothes and whiteclothes take the value 0 or 1:
- if the max value = 0, then it means that the person has never wore red or white
- if it is equal to 1, then it means that the person wore red or white at least once.
You can use the following code to retrieve this :
proc sql;
select ID,
max(redcloths) as anydayred,
max(whiteclothes) as anydaywhite
from asksas
group by ID;
quit;
Then you can merge it with other columns such as date_oct but I am not sure it makes sense.
Did that answer you question?
so what my code ultimately does is codes 1(redclothes) and 0 (no red clothes) for red clothes worn on any day in the column of redclothes for that specific IDs and removes the observations for that same ID on which red clothes were not worn. Hence decreasing the observations. Once I proceed with the same steps for assiging 1/0 for whiteclothes, sas starts from the subset of data that has been collapsed by IDs for redclothes . for the next step I want it to start from the original data and look for IDs for whiteclothes worn on any day
orginal
ID date/oct redcloths . whiteclothes
1 . 19 . 1 0
1 . 18 0 1
2 . 15 . 1 1
2 . 13 . 0 . 0
this is where i run the following code and its an intermediate step so the sample below may not be the correct representation.
data asksas;
set asksas;
by ID;
if first.ID then anydayred=0;
anydayred + redclothes;
if last.ID;
drop redclothes;
run;
collapsed
ID date/oct . whiteclothes anydayred
1 . 19 . 0 1
2 . 15 . 1 1
finally, if anydayred then redclothes is 1 and I get rid of the anydayred column because ultimately I am just looking at redclothes on any visit/observation.
and run this code
*drop anydayred;
data asksas (drop = anydayred);
set asksas;
if anydayred > 0 then redclothes= 1 ;
else redclothes = 0 ;
run ;
ID date/oct redcloths . whiteclothes
1 . 19 . 1 0
2 . 15 . 1 1
Now when I attempt to do the same steps for whiteclothes, it starts off from the 2 observations above, whereas I want to start from the original data.
Hope this has clarified my question. This is a complicated concept and I am not sure if I am explaining it well.
@GreenTree1 wrote:
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.
Good. Then mark your report of the resolution as the solution to this topic.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.