I have a dataset where observations are identified by two factors: geographic area, and grid. Geographic area is represented by three variables: 1. A 3-letter code for that geographic area; 2. A 4-digit numerical ID for that geographic area; and 3. The full name of the geographic area (i.e. “Summer County”). The grid is represented by a 4 or 5-digit number. The grid number is really the crucial ID variable, with all other information (case count, relative risk, etc) being linked to the grid number. Most of the time, a grid falls cleanly within a geographic area. However, occasionally, the grid overlaps two adjacent geographic areas. When this occurs, in the current dataset, the information (case count, etc) simply repeats, so that there are two identical rows, differing only in that one row specifies one geographic area, and the other row specifies the other geographic area. For example, if grid #54321 falls into geographic area ABC *and* geographic area XYZ, while grid #11111 fits in only 1 geographic area, it looks like: Grid # Geographic Code Geographic ID Geographic Name Case count RR etc 54321 ABC 1000 Summer County 3 1.5 54321 XYZ 1001 Winter County 3 1.5 11111 DEF 1002 Fall County 2 1 Note that the critical information, such as case count and RR, is the same for a particular grid, even if that grid covers two geographic areas. I created this file by merging a SAS dataset listing the case count, RR, etc per grid number (after using SAS to calculate all that) with a file specifying the geographic code(s), ID(s), and name(s) that correspond with each grid number. The problem is that it is confusing for the people who will receive this data to recognize that a grid includes two different geographic areas if the information is on two rows. What we would like to do is combine the information into one row, for example: Grid # Geographic Code Geographic ID Geographic Name Case count RR etc 54321 ABC/XYZ 1000/1001 Summer County/Winter County 3 1.5 11111 DEF 1002 Fall County 2 1 Does anyone have any idea on how I would go about accomplishing this? If needed, I could probably afford to drop the geographic ID variable and the geographic name variables, and somehow reintroduce those later – it is most imperative that the geographic codes get combined. Currently, my only thought is to try to find out of all the grids (there are approximately 2400), which ones fall into two geographic areas, create a master file in Excel that lists these with a combined geographic name, geographic ID, and geographic code, and then merge that with the file containing RR, case count, etc per grid number, instead of using the original file, but I am wondering if there is some way to do it all within SAS instead. Please note (in case it is relevant) that this is a very simplified version - in reality, most geographic areas have multiple grids assigned to them. Thank you very much for any help – I sincerely appreciate it.
... View more