I am a beginning SAS user that needs help writing a SAS data step code. Above is a small sample of the data I am working with where each observation represents an individual. I would like to write a code that keeps observations only for those individuals that have the same household ID # as well as the same sex as another individual in the data set (For example I would want to keep observations 1 and 2 above). Is this possible? If so, could someone please help me out with how you would go about writing the code to do this. Thanks in advance!
Yes there is a way and with SAS there are multiple ways to do things. Below is a method, not neccessarily the best but for a new to SAS user this will show you some of the data processing power of SAS.
Also read the documentation again and again and again. When I code in SAS I always have the documentation page up in my browser. http://support.sas.com/onlinedoc/913/docMainpage.jsp
I would check out in the documentation
Base SAS -> SAS Language Reference: Concepts -> DATA Step Concepts
by Household_ID Sex;
if first.sex ne last.sex;
* for extra fun...;
by Household_ID sex;
/* first. = 1 for the first occurrence of the value
last. = 1 for the last occurrence of the value
so in your example: obs1 and 2 are duplicates
for obs=1, Houshold_ID =0001 sex=1, first.household_id=1 and last.household_id=0
first.sex =1 and last.sex=0
for obs=2, Houshold_ID =0001 sex=1, first.household_id=0 and last.household_id=1
first.sex =0 and last.sex=1
for obs=5, Houshold_ID =0003 sex=1, first.household_id=1 and last.household_id=1
first.sex =1 and last.sex=1
since there is only 1 record for household_id 0003 the first and last occurence is the same record hence first.household_id=last.household_id
First, welcome to the world of SAS.
My favorite saying about SAS is
"SAS is the greatest data processing language on the planet".
Now, from a different perspective to Darrylovia's, and my favorite one.
SAS is different from other programming languages. It is data centric, not process and control centric. What that means is that the mindset should look at "I have some data and I want to transform it ..." or "I need to count and sum and ..." SAS normally handles the issues of opening, closing and looping for you.
Now your problem is actually more advanced than basic.
Level 1 SAS problems, and most data manipulations, involve doing things within a single record or observation, including selecting out records/observations through some conditional requirement.
Level 2 SAS problems want to sum a column or count things, thus spanning multiple observations, but simply. These generally require a "RETAIN" statement to accomplish. These problems usually introduce the user/programmer to many of SAS's proc's that already accomplish what you may want to do.
Level 3 SAS problems involve comparing data from prior observations with the current observation. I call this level 3 because it's not the easiest thing to do, and is easy to get lost.
Darrylovia's solution is very clever and not obvious, especially for a novice.
Standard is the initial sort to group the data into a workable order. This is a standard thing for SAS programming/analysis.
The use of first. and last. is bread and butter for level 2 programming.
But his complete understanding of what last. and first. means and a little creative stretch allowed him to apply them in his solution for the level 3 problem. Someone else may make use of RETAINed values (a less sophisticated solution). Another person may be "clever" with using the LAG function. Another method may simply sort the data with the NODUP option into a "unique_data" dataset and then use proc sql to take the difference between the original and unique_data to produce a dataset similar, but not identical to your request; but, it may work well for the required need.