krisk
Calcite | Level 5

Hello SAS experts, 

 

I'm using a very large dataset of births that I want to separate into rural and urban zip codes. There are about 500 zip codes in total, all stored in my data set under the variable Res_Zip. I believe I need to make two new variables, i.e. rural zip and urban zip, and tell SAS which codes go where. However, I also need to apply some exclusion criteria; for example, I want to exclude mothers with gestational diabetes and multiple births. I'm not sure where to start.

 

Thanks, 

Kris

3 REPLIES
japelin
Rhodochrosite | Level 12

First of all, it seems the specifications need to be clarified and written down before you start programming in SAS.
1) What are the rules for separating Res_Zip into rural zip and urban zip?
Are zips identified by number of characters, by a separator, by numeric range, etc.?


2) Clarify the exclusion criteria.
How many criteria are there, which variables do they use, and how is the data in those variables stored?

 

3) What should be done with the excluded data?
Should it be deleted or kept?

 

Once these are determined, you can proceed to the programming stage.

 

If you don't know the specific programming methods, such as which functions to use, you can ask us again.
Don't forget to attach some data (a sample or part of the real data).

 

We can't tell whether you can program but haven't pinned down your specifications yet, or whether you are unsure how to proceed at all and are at a loss in front of the data.

 

krisk
Calcite | Level 5

I'm obviously struggling here. I took an intro SAS class years ago, and now I'm working with a very large data set (900K) of birth certificate data with over 200 variables. I want to make cross-frequency tables with the exposure being rural vs urban zip code and the outcome being birth weight <3000 g vs >3000 g. To do that, I would need to make new variables for rural zips vs urban zips (not sure how to do this). I also want to exclude multiple births (plurality >1) as well as preterm babies (EstGest <= 37 weeks).

 

1. I compiled a list of rural vs. urban zip codes and need to designate which zip belongs to which category. The raw data shows them in a column as 5-digit codes. They are character variables with a length of 5.

 

2. Exclusion of multiple births (plurality >1) as well as preterm babies (EstGest <= 37 weeks). Character variable, length of 5.

 

3. I don't want the multiple births or preterm records in my current analysis, but I'm wary of deleting them.

 

I can't run the data set because it's so large that it crashes SAS every time.

 

To sum up, I am at a loss in front of the data. 

ballardw
Super User

You probably only want one variable that has values to indicate "Rural" or "Urban".
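
As a rough sketch of the single-variable approach, assuming the compiled rural/urban list can be loaded as a lookup table with one row per zip: the lookup is turned into a character format, and one new variable is created from Res_Zip. The data set names (zip_lookup, births_demo), the format name ($ziptype), the new variable name (zip_type) and the zip values are all made up for illustration; only Res_Zip comes from the question.

```sas
/* Hypothetical lookup: one row per zip with its category.      */
/* In practice this would be read from the compiled zip list.   */
data zip_lookup;
    length Res_Zip $5 zip_type $5;
    input Res_Zip $ zip_type $;
    datalines;
99901 Rural
99902 Urban
;
run;

/* Build a control data set so PROC FORMAT can create $ziptype. */
data zip_fmt;
    set zip_lookup end=last;
    length start $5 label $7 hlo $1;
    retain fmtname '$ziptype' type 'C';
    start = Res_Zip;
    label = zip_type;
    output;
    if last then do;
        hlo   = 'O';        /* "other": any zip not in the list */
        start = ' ';
        label = 'Unknown';
        output;
    end;
    keep fmtname type start label hlo;
run;

proc format cntlin=zip_fmt;
run;

/* Made-up stand-in for the birth data. */
data births_demo;
    length Res_Zip $5;
    input Res_Zip $;
    datalines;
99901
99902
99903
;
run;

/* One variable holding Rural / Urban / Unknown. */
data births_classified;
    set births_demo;
    length zip_type $7;
    zip_type = put(Res_Zip, $ziptype.);
run;

proc freq data=births_classified;
    tables zip_type;
run;
```

The "Unknown" category is an added safeguard, not something from the original question; it makes zips missing from the list visible instead of silently dropping them into one of the two groups.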

 

The definition criteria are critical.

And just to throw a spanner into the works, what time period does your data represent? If the criteria ever involve something like population density then that changes over time and you need to consider how that changing time element is interpreted. This is more likely to be important if your data spans three or more years.
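
For the exclusions described earlier in the thread (plurality >1, EstGest <= 37 weeks), one option that avoids deleting anything is to flag the records and filter with a WHERE statement at analysis time. The sketch below is only an illustration under several assumptions: Plurality and EstGest are stored as character (as the question suggests), the birth weight variable is called BirthWeight (a hypothetical name), a rural/urban variable named zip_type already exists, and a weight of exactly 3000 g falls in the higher group (the question does not say).

```sas
/* Tiny made-up sample standing in for the real birth file. */
data births_sample;
    length Res_Zip $5 Plurality $5 EstGest $5;
    input Res_Zip $ Plurality $ EstGest $ BirthWeight zip_type $;
    datalines;
99901 1 39 3400 Rural
99901 2 38 2500 Rural
99902 1 36 2400 Urban
99902 1 40 2900 Urban
;
run;

data births_flagged;
    set births_sample;
    /* input() converts the character values to numbers.          */
    /* Drop it if the variables are already numeric.              */
    /* Note: a missing EstGest compares as less than 37 in SAS,   */
    /* so such records would also be flagged as excluded.         */
    excluded = (input(Plurality, 8.) > 1) or (input(EstGest, 8.) <= 37);

    /* 1 = under 3000 g, 0 = 3000 g or more, missing if unknown.  */
    if not missing(BirthWeight) then low_bw = (BirthWeight < 3000);
run;

/* The flagged records stay in the data set; WHERE only skips     */
/* them for this table.                                           */
proc freq data=births_flagged;
    where excluded = 0;
    tables zip_type * low_bw;
run;
```

Because the full file has over 200 variables, adding a KEEP= data set option on the SET statement to read only the handful of variables needed may also reduce the load, though whether that resolves the crashes depends on what is causing them.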
