BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Obs Household ID Sex
1 0001 1
2 0001 1
3 0002 1
4 0002 2
5 0003 1
6 0004 1

I am a beginning SAS user that needs help writing a SAS data step code. Above is a small sample of the data I am working with where each observation represents an individual. I would like to write a code that keeps observations only for those individuals that have the same household ID # as well as the same sex as another individual in the data set (For example I would want to keep observations 1 and 2 above). Is this possible? If so, could someone please help me out with how you would go about writing the code to do this. Thanks in advance!
3 REPLIES 3
darrylovia
Quartz | Level 8
DaveC,

Yes there is a way and with SAS there are multiple ways to do things. Below is a method, not neccessarily the best but for a new to SAS user this will show you some of the data processing power of SAS.

Also read the documentation again and again and again. When I code in SAS I always have the documentation page up in my browser.
http://support.sas.com/onlinedoc/913/docMainpage.jsp
I would check out in the documentation
Base SAS -> SAS Language Reference: Concepts -> DATA Step Concepts


Now for your solution.

data One;
input Household_ID $ Sex;
datalines;
0001 1
0001 1
0002 1
0002 2
0003 1
0004 1
;
run;

proc sort data=One;
by Household_ID Sex;
run;

data Two;
set One;
by Household_ID Sex;
if first.sex ne last.sex;
run;

* for extra fun...;
data Behind_the_scenes;
set one;
by Household_ID sex;

/* first. = 1 for the first occurrence of the value
last. = 1 for the last occurrence of the value
so in your example: obs1 and 2 are duplicates
Houshold_ID =0001
for obs=1, Houshold_ID =0001 sex=1, first.household_id=1 and last.household_id=0
first.sex =1 and last.sex=0
for obs=2, Houshold_ID =0001 sex=1, first.household_id=0 and last.household_id=1
first.sex =0 and last.sex=1

..
Notices:
for obs=5, Houshold_ID =0003 sex=1, first.household_id=1 and last.household_id=1
first.sex =1 and last.sex=1
since there is only 1 record for household_id 0003 the first and last occurence is the same record hence first.household_id=last.household_id

*/
first_Household_ID=first.Household_ID;
last_Household_ID=last.Household_ID;
first_sex=first.sex;
last_sex=last.sex;

run;
deleted_user
Not applicable
First, welcome to the world of SAS.
My favorite saying about SAS is
"SAS is the greatest data processing language on the planet".

Now, from a different perspective to Darrylovia's, and my favorite one.

SAS is different from other programming languages. It is data centric, not process and control centric. What that means is that the mindset should look at "I have some data and I want to transform it ..." or "I need to count and sum and ..." SAS normally handles the issues of opening, closing and looping for you.

Now your problem is actually more advanced than basic.
Level 1 SAS problems, and most data manipulations, involve doing things within a single record or observation, including selecting out records/observations through some conditional requirement.
Level 2 SAS problems want to sum a column or count things, thus spanning multiple observations, but simply. These generally require a "RETAIN" statement to accomplish. These problems usually introduce the user/programmer to many of SAS's proc's that already accomplish what you may want to do.
Level 3 SAS problems involve comparing data from prior observations with the current observation. I call this level 3 because it's not the easiest thing to do, and is easy to get lost.

Darrylovia's solution is very clever and not obvious, especially for a novice.

Standard is the initial sort to group the data into a workable order. This is a standard thing for SAS programming/analysis.

The use of first. and last. is bread and butter for level 2 programming.

But his complete understanding of what last. and first. means and a little creative stretch allowed him to apply them in his solution for the level 3 problem. Someone else may make use of RETAINed values (a less sophisticated solution). Another person may be "clever" with using the LAG function. Another method may simply sort the data with the NODUP option into a "unique_data" dataset and then use proc sql to take the difference between the original and unique_data to produce a dataset similar, but not identical to your request; but, it may work well for the required need.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1371 views
  • 0 likes
  • 2 in conversation