BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GreenTree1
Obsidian | Level 7

Hi,

 

I am dealing with data where one person can have multiple different observations (same ID, different observations). I have run an aggregation code for an event of interest so that if during any of the observations that event has occurred then it will show up as 1 otherwise 0. This shrinks the observations to the number of people by collapsing the IDs. so let's say 10 observations have collapsed into 5, as there are 10 people with same IDs who showed up twice on different days hence 10 observations. The problem is that once I run the code the data has already collapsed by IDs, and when I run the next code for another event of interest it starts off the collapsed data. How do I stop this from happening? How do I tell sas to look at original data when looking for another event of interest and not continue from the data collapsed by IDs? 

 

I am a newbie, would appreciate any help, thanks

 

data asksas;
set asksas;

by ID;

if first.ID then Sumvar=0;

Sumvar + var;

if last.ID;

drop var;

run;

*checking aggregation code;

proc freq data=asksas;
tables sumvar;
run;

*drop sumvar;
data asksas (drop = Sumvar);
set asksas;

if Sumavar > 0 then var= 1 ;
else var = 0 ;

run ;

proc freq data=asksas;
tables var;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@GreenTree1 wrote:
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.

 

 

Good.  Then mark your report of the resolution as the solution to this topic.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
ed_sas_member
Meteorite | Level 14

Hi @GreenTree1 

 

Could you please provide some sample data?

Thank you!

GreenTree1
Obsidian | Level 7

 

Thank you for your reply, here is what my data looks like

 

ID  date/oct   redcloths . whiteclothes    anydayred .  anydaywhite

1 .  19 .          1                   0

1 .  18             0                   1

2 .   15 .          1                   1

2 .   13 .          0 .                 0

 

let's say I am looking at ID 1 wearing red clothes on any date then ID1 will be assigned 1 and so forth. When I do this the data gets collapsed by IDs, in this case, goes from 4 to 2 observation, since ID1 and 2 both wore red on atleats of the two days. following this if I am trying to look for IDs who wore white clothes on any date, it starts off from the collapsed data ie 2 observation whereas I want it to start off from  the original 4 observations since wearing red and white clothes are independent events

GreenTree1
Obsidian | Level 7

sorry I meant 

 

since ID1 and 2 both wore red on at least one of the two days

ed_sas_member
Meteorite | Level 14

Hi @GreenTree1 

Thank you for the clarification, very helpful.

Just to be sure to well-understand your need, what would the output data look like?

GreenTree1
Obsidian | Level 7

 

orginal

ID  date/oct   redcloths . whiteclothes    anydayred .  anydaywhite

1 .  19 .          1                   0

1 .  18             0                   1

2 .   15 .          1                   1

2 .   13 .          0 .                 0

 

collapsed

ID  date/oct   redcloths . whiteclothes    anydayred .  anydaywhite

1 .  19 .          1                   0                        1

 

2 .   15 .          1                   1                        1 

 

Now when I try to run the code for "anydaywhite" it starts from the collapsed data of anydayred instead of the original data

 

 

 

 

ed_sas_member
Meteorite | Level 14

Hi @GreenTree1 

 

As variables redclothes and whiteclothes take the value 0 or 1:

- if the max value = 0, then it means that the person has never wore red or white

- if it is equal to 1, then it means that the person wore red or white at least once.

You can use the following code to retrieve this :

 

proc sql;
	select ID,
		   max(redcloths) as anydayred,
		   max(whiteclothes) as anydaywhite
	from asksas
	group by ID;
quit;

Then you can merge it with other columns such as date_oct but I am not sure it makes sense.

 

Did that answer you question?

GreenTree1
Obsidian | Level 7

so what my code ultimately does is codes 1(redclothes) and 0 (no red clothes) for red clothes worn on any day in the column of redclothes for that specific IDs and removes the observations for that same ID on which red clothes were not worn. Hence decreasing the observations. Once I proceed with the same steps for assiging 1/0 for whiteclothes, sas starts from the subset of data that has been collapsed by IDs for redclothes . for the next step I want it to start from the original data and look for IDs for whiteclothes  worn on any day

 

orginal

ID  date/oct   redcloths . whiteclothes    

1 .  19 .          1                   0

1 .  18             0                   1

2 .   15 .          1                   1

2 .   13 .          0 .                 0

 

this is where i run  the following code and its an intermediate step so the sample below may not be the correct representation.

data asksas;
set asksas;

by ID;

if first.ID then anydayred=0;

anydayred + redclothes;

if last.ID;

drop redclothes;

run;

collapsed

ID  date/oct    . whiteclothes   anydayred

1 .  19 .                            0                        1

2 .   15 .                            1                        1 

 

 

finally, if anydayred then redclothes is 1 and I get rid of the anydayred column because ultimately I am just looking at redclothes on any visit/observation.

and run this code

*drop anydayred;
data asksas (drop = anydayred);
set asksas;

if anydayred > 0 then redclothes= 1 ;
else redclothes = 0 ;

run ;

 

ID  date/oct   redcloths . whiteclothes   

1 .  19 .          1                   0                       

 

2 .   15 .          1                   1                       

 

Now when I attempt to do the same steps for whiteclothes, it starts off from the 2 observations above, whereas I want to start from the original data.

 

Hope this has clarified my question. This is a complicated concept and I am not sure if I am explaining it well.

 

 

GreenTree1
Obsidian | Level 7
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.
mkeintz
PROC Star

@GreenTree1 wrote:
I think I have figured it out. I was using the data statement with set in such a way that it kept working on the modified data. I just changed it the set.orginial data and it worked.

 

 

Good.  Then mark your report of the resolution as the solution to this topic.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 690 views
  • 0 likes
  • 3 in conversation