Solved: stop aggregation code

GreenTree1 · Posted 01-02-2020 12:16 PM

Hi,

I am dealing with data where one person can have multiple different observations (same ID, different observations). I have run an aggregation code for an event of interest so that if during any of the observations that event has occurred then it will show up as 1 otherwise 0. This shrinks the observations to the number of people by collapsing the IDs. so let's say 10 observations have collapsed into 5, as there are 10 people with same IDs who showed up twice on different days hence 10 observations. The problem is that once I run the code the data has already collapsed by IDs, and when I run the next code for another event of interest it starts off the collapsed data. How do I stop this from happening? How do I tell sas to look at original data when looking for another event of interest and not continue from the data collapsed by IDs?

I am a newbie, would appreciate any help, thanks

data asksas;
set asksas;

by ID;

if first.ID then Sumvar=0;

Sumvar + var;

if last.ID;

drop var;

run;

*checking aggregation code;

proc freq data=asksas;
tables sumvar;
run;

*drop sumvar;
data asksas (drop = Sumvar);
set asksas;

if Sumavar > 0 then var= 1 ;
else var = 0 ;

run ;

proc freq data=asksas;
tables var;
run;

mkeintz · Posted 01-02-2020 04:57 PM

@GreenTree1 wrote:
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.

Good. Then mark your report of the resolution as the solution to this topic.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ed_sas_member · Posted 01-02-2020 12:18 PM

Hi @GreenTree1

Could you please provide some sample data?

Thank you!

GreenTree1 · Posted 01-02-2020 12:29 PM

Thank you for your reply, here is what my data looks like

ID date/oct redcloths . whiteclothes anydayred . anydaywhite

1 . 19 . 1 0

1 . 18 0 1

2 . 15 . 1 1

2 . 13 . 0 . 0

let's say I am looking at ID 1 wearing red clothes on any date then ID1 will be assigned 1 and so forth. When I do this the data gets collapsed by IDs, in this case, goes from 4 to 2 observation, since ID1 and 2 both wore red on atleats of the two days. following this if I am trying to look for IDs who wore white clothes on any date, it starts off from the collapsed data ie 2 observation whereas I want it to start off from the original 4 observations since wearing red and white clothes are independent events

GreenTree1 · Posted 01-02-2020 12:30 PM

sorry I meant

since ID1 and 2 both wore red on at least one of the two days

ed_sas_member · Posted 01-02-2020 12:48 PM

Hi @GreenTree1

Thank you for the clarification, very helpful.

Just to be sure to well-understand your need, what would the output data look like?

GreenTree1 · Posted 01-02-2020 12:54 PM

orginal

ID date/oct redcloths . whiteclothes anydayred . anydaywhite

1 . 19 . 1 0

1 . 18 0 1

2 . 15 . 1 1

2 . 13 . 0 . 0

collapsed

ID date/oct redcloths . whiteclothes anydayred . anydaywhite

1 . 19 . 1 0 1

2 . 15 . 1 1 1

Now when I try to run the code for "anydaywhite" it starts from the collapsed data of anydayred instead of the original data

ed_sas_member · Posted 01-02-2020 01:02 PM

Hi @GreenTree1

As variables redclothes and whiteclothes take the value 0 or 1:

- if the max value = 0, then it means that the person has never wore red or white

- if it is equal to 1, then it means that the person wore red or white at least once.

You can use the following code to retrieve this :

proc sql;
	select ID,
		   max(redcloths) as anydayred,
		   max(whiteclothes) as anydaywhite
	from asksas
	group by ID;
quit;

Then you can merge it with other columns such as date_oct but I am not sure it makes sense.

Did that answer you question?

GreenTree1 · Posted 01-02-2020 04:03 PM

so what my code ultimately does is codes 1(redclothes) and 0 (no red clothes) for red clothes worn on any day in the column of redclothes for that specific IDs and removes the observations for that same ID on which red clothes were not worn. Hence decreasing the observations. Once I proceed with the same steps for assiging 1/0 for whiteclothes, sas starts from the subset of data that has been collapsed by IDs for redclothes . for the next step I want it to start from the original data and look for IDs for whiteclothes worn on any day

orginal

ID date/oct redcloths . whiteclothes

1 . 19 . 1 0

1 . 18 0 1

2 . 15 . 1 1

2 . 13 . 0 . 0

this is where i run the following code and its an intermediate step so the sample below may not be the correct representation.

data asksas;
set asksas;

by ID;

if first.ID then anydayred=0;

anydayred + redclothes;

if last.ID;

drop redclothes;

run;

collapsed

ID date/oct . whiteclothes anydayred

1 . 19 . 0 1

2 . 15 . 1 1

finally, if anydayred then redclothes is 1 and I get rid of the anydayred column because ultimately I am just looking at redclothes on any visit/observation.

and run this code

*drop anydayred;
data asksas (drop = anydayred);
set asksas;

if anydayred > 0 then redclothes= 1 ;
else redclothes = 0 ;

run ;

ID date/oct redcloths . whiteclothes

1 . 19 . 1 0

2 . 15 . 1 1

Now when I attempt to do the same steps for whiteclothes, it starts off from the 2 observations above, whereas I want to start from the original data.

Hope this has clarified my question. This is a complicated concept and I am not sure if I am explaining it well.

GreenTree1 · Posted 01-02-2020 04:29 PM

I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.

mkeintz · Posted 01-02-2020 04:57 PM

@GreenTree1 wrote:
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.

Good. Then mark your report of the resolution as the solution to this topic.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

Re: stop aggregation code

SAS Innovate 2025: Call for Content