Removing observations based on other observations - SAS University Edition

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Removing observations based on other observations - SAS University Edition

[ Edited ]

I'm working with the IPUMS-USA data to try and get numbers for homelessness (specifically, doubled-up households). For simplicity's sake, there are two "columns" I'm concerned with - Serial and Poverty. (There's more, but if I can get this answered I can extend the answer to whatever else I need.)

 

For example, let's take a basic setup: (note: the real data set has over 3 million observations - I can't sanely pick through all of them manually - thanks for the catch, draycut! i forgot to include this relatively important note)

 

 

Serial    P
111       81
111       200
111       200
111       15 
111       15
112       99
112       99
112       24

 

 

From this data, I'd want to include anyone with the serial (S) 112, since everyone has an income below the poverty line. I don't, however, want to include -anyone- with the S of 111, since they're one "household" wihose primary members are above the poverty level. I figure I can add more conditionals as needed, but this is the basic framework of what I need - how can I make sure I can remove everyone with a specific Serial value if  their poverty value goes over what I'm looking for?


Accepted Solutions
Solution
‎11-23-2016 10:55 AM
Grand Advisor
Posts: 17,288

Re: Removing observations based on other observations - SAS University Edition

 Find the ID's that you want to filter out.

 

 

proc sql;

create table remove as

select distinct serial 

where P<50; *You need to build this rule;

 

create table filtered as

select *

from have

where serial not in (

     Select serial from remove);

quit;

View solution in original post


All Replies
Super Contributor
Posts: 498

Re: Removing observations based on other observations - SAS University Edition

Something like this? 

 

I have assumed a poverty value of 50 Smiley Happy

 

data have;
input Serial P;
datalines;
111       81
111       200
111       200
111       15 
111       15
112       99
112       99
112       24
;

data want;
   set have;
   where Serial = 112 and P > 50;
run;
Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

Except I'm going to be wanting to do this over a data set of about 3 million, and I don't know which serials I'll actually want to keep, yes that's the basic gist of what I want to do. I'm thinking I basically need to go over the data once and put serials I need to remove into an array, then go over the data again and remove all the values I caught originally...?

Super Contributor
Posts: 498

Re: Removing observations based on other observations - SAS University Edition

Well, is there something that uniquely identifies what serials you want to keep / leave out? Smiley Happy

Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

Poverty values above 125, various other identifiers. The problem with the suggested solutions is, even if I do a filter on the sample dataset I provided for poverty < 125, it leaves individuals of serial 111 in that I don't want to keep in. Again, manually going through and figuring out which households had a value above this so I can manually tell SAS to remove them isn't practical, given the >3 million observations.

Super Contributor
Posts: 263

Re: Removing observations based on other observations - SAS University Edition

data want1 want2;
   set have;
   IF Serial = 112 and P > 50 then output want1;
else output want2; run;

 

Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

See above replies - basically if the data set was as small as the sample I posted, this method would work, but I'm dealing with over 3 million observations and can't manually go through and determine which serial numbers to remove, nor can I just remove by poverty because it could keep individuals in the same serial number that have a lower poverty value.

Super Contributor
Posts: 263

Re: Removing observations based on other observations - SAS University Edition

You said in an earlier post you don't know which serials  you'll actually want to keep. How will you determine which observations you want to keep?

Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

If anyone in the "family" (serial group) has a poverty value over 125, I need to remove that serial group. 

If anyone has certain other values (IPUMS has sections in the data indicating how someone is related to the family, if the housing is group quarters, if they're over or under certain ages, etc) I can remove them on an individual basis, so the solution of just going through and auto-removing those works.

It doesn't work when I need to remove an entire set of observations when even one meets a certain qualification.

Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

Here's an example of the actual code I'm using for the base program. Sorry I didn't post this until now.

 

/*This is the main ACS Program that gives the total number of people
living doubled up according to the ACS*/

data acs_2015.ipumsdata;
set acs_2015.md_2015;
where city=1190 and GQ<3 and poverty<126;
run;     /*Tells SAS to include only Chicago observations and exclude Group Quarters*/


data work.ipumsdata2;    /*Tells SAS to exclude household heads and spouses*/
set acs_2015.ipumsdata;
where relate>2;
run;

data work.ipumsdata2; /*Excludes foster children, roommates/housemates, unmarried partners, and roomers/boarders*/
set work.ipumsdata2;
if related=1241 or related=1242 or related=1114 or related=1115 then delete;
run;

data work.ipumsdata2; /*Excludes any relatives over 65*/
set work.ipumsdata2;
if relate<11 and age>65 then delete;
run;

data work.ipumsdata2; /*Excludes children of householder 24 or younger*/
set work.ipumsdata2;
if relate=3 and age<25 then delete;
run;

data work.ipumsdata2; /*Excludes children of householder who don't have their own children*/
set work.ipumsdata2;
if relate=3 and nchild=0 then delete;
run;

proc sql;
create table work.query AS
select city, poverty, perwt, relate, related, serial, age, nchild, school from work.ipumsdata2;
run;

proc print data=work.query;
sum perwt;
run;
Super Contributor
Posts: 263

Re: Removing observations based on other observations - SAS University Edition

You could consolidate your statements like so. You'll have fewer data steps to work with.

 

/*This is the main ACS Program that gives the total number of people
living doubled up according to the ACS*/

data acs_2015.ipumsdata;
set acs_2015.md_2015;
where city=1190 and GQ<3 and poverty<126;
run; /*Tells SAS to include only Chicago observations and exclude Group Quarters*/
data work.ipumsdata2;
set acs_2015.ipumsdata;
/*Tells SAS to exclude household heads and spouses*/
if relate>2;
/*Excludes foster children, roommates/housemates, unmarried partners, and roomers/boarders*/
if related in (1241,1242,1114,1115) then delete;
/*Excludes children of householder 24 or younger*/
if relate=3 and age<25 then delete;
/*Excludes children of householder who don't have their own children*/
if relate=3 and nchild=0 then delete;
run;

Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

I'll keep this in mind, but it doesn't answer the initial question.

Solution
‎11-23-2016 10:55 AM
Grand Advisor
Posts: 17,288

Re: Removing observations based on other observations - SAS University Edition

 Find the ID's that you want to filter out.

 

 

proc sql;

create table remove as

select distinct serial 

where P<50; *You need to build this rule;

 

create table filtered as

select *

from have

where serial not in (

     Select serial from remove);

quit;

Grand Advisor
Posts: 17,288

Re: Removing observations based on other observations - SAS University Edition

Just a quick FYI you may run into limitations with SAS UE in terms of dealing with 3 million records.

Occasional Contributor
Posts: 8

Re: Removing observations based on other observations - SAS University Edition

What sort of limitations?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 21 replies
  • 435 views
  • 0 likes
  • 8 in conversation