About kmaths

kmaths · 03-17-2015

Thank you, that works perfectly. Much appreciated!

kmaths · 03-17-2015

I have a dataset with dates in one column. I have been using functions in Excel to extract the year and the month/day as two separate columns from the full date prior to importing the dataset into SAS, so the dataset looks like this once imported into SAS:

Full_Date   Year   Date    Other Information
01JAN2012   2012   01/01   XYZ
03JUL2014   2014   07/03   ABC
26FEB2015   2015   02/26   DEF

We would like to instead be able to extract the year and date information and create these variables from the full_date directly in SAS after importing the file containing just the full date (and other info), without needing to first do the manipulation in Excel. So we would like to import the following, and make it look like the above dataset, all in SAS:

Full_Date   Other Information
01JAN2012   XYZ
03JUL2014   ABC
26FEB2015   DEF

I know SAS has a YEAR function, but does a function exist to extract month/date in this fashion too? Thank you for any help.
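For illustration only, here is a minimal sketch of the transformation being asked for, written in Python rather than SAS. It is not the answer given in the thread. The function name `split_full_date` is hypothetical, and the sketch assumes the full date arrives as text such as `01JAN2012` with English month abbreviations.

```python
from datetime import datetime

def split_full_date(full_date: str) -> tuple[int, str]:
    """Return (year, 'mm/dd') for a date written like 01JAN2012."""
    # %d = day of month, %b = abbreviated month name, %Y = 4-digit year.
    parsed = datetime.strptime(full_date, "%d%b%Y")
    return parsed.year, parsed.strftime("%m/%d")

# Sample rows taken from the question: (Full_Date, Other Information).
rows = [("01JAN2012", "XYZ"), ("03JUL2014", "ABC"), ("26FEB2015", "DEF")]

# Rebuild the four-column layout: Full_Date, Year, Date, Other Information.
result = [(full_date, *split_full_date(full_date), info) for full_date, info in rows]

for row in result:
    print(row)
```

The point of the sketch is that both derived columns come from the same parsed date, so the Excel step becomes unnecessary once the full date is read as a real date value.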

kmaths · 02-27-2015

Thank you for the input everyone. The solution from PGStats worked really simply and gave the result I was looking for - thank you so much for your help!

kmaths · 02-26-2015

I have a dataset where observations are identified by two factors: geographic area, and grid. Geographic area is represented by three variables:

1. A 3-letter code for that geographic area;
2. A 4-digit numerical ID for that geographic area; and
3. The full name of the geographic area (i.e. “Summer County”).

The grid is represented by a 4 or 5-digit number. The grid number is really the crucial ID variable, with all other information (case count, relative risk, etc) being linked to the grid number.

Most of the time, a grid falls cleanly within a geographic area. However, occasionally, the grid overlaps two adjacent geographic areas. When this occurs, in the current dataset, the information (case count, etc) simply repeats, so that there are two identical rows, differing only in that one row specifies one geographic area, and the other row specifies the other geographic area. For example, if grid #54321 falls into geographic area ABC *and* geographic area XYZ, while grid #11111 fits in only 1 geographic area, it looks like:

Grid #   Geographic Code   Geographic ID   Geographic Name   Case count   RR    etc
54321    ABC               1000            Summer County     3            1.5
54321    XYZ               1001            Winter County     3            1.5
11111    DEF               1002            Fall County       2            1

Note that the critical information, such as case count and RR, is the same for a particular grid, even if that grid covers two geographic areas. I created this file by merging a SAS dataset listing the case count, RR, etc per grid number (after using SAS to calculate all that) with a file specifying the geographic code(s), ID(s), and name(s) that correspond with each grid number. The problem is that it is confusing for the people who will receive this data to recognize that a grid includes two different geographic areas if the information is on two rows.

What we would like to do is combine the information into one row, for example:

Grid #   Geographic Code   Geographic ID   Geographic Name               Case count   RR    etc
54321    ABC/XYZ           1000/1001       Summer County/Winter County   3            1.5
11111    DEF               1002            Fall County                   2            1

Does anyone have any idea on how I would go about accomplishing this? If needed, I could probably afford to drop the geographic ID variable and the geographic name variables, and somehow reintroduce those later – it is most imperative that the geographic codes get combined.

Currently, my only thought is to try to find out of all the grids (there are approximately 2400), which ones fall into two geographic areas, create a master file in Excel that lists these with a combined geographic name, geographic ID, and geographic code, and then merge that with the file containing RR, case count, etc per grid number, instead of using the original file, but I am wondering if there is some way to do it all within SAS instead.

Please note (in case it is relevant) that this is a very simplified version - in reality, most geographic areas have multiple grids assigned to them. Thank you very much for any help – I sincerely appreciate it.
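The row-combining logic described above can be sketched as follows. This is an illustrative Python sketch, not SAS and not the solution from the thread. The field names (`grid`, `code`, `geo_id`, `name`, `cases`, `rr`) and the function `combine_by_grid` are hypothetical, and the sketch assumes that case count and RR really are identical across the repeated rows of a grid, as the post states.

```python
# Sample rows from the question, one dict per observation.
rows = [
    {"grid": 54321, "code": "ABC", "geo_id": 1000, "name": "Summer County", "cases": 3, "rr": 1.5},
    {"grid": 54321, "code": "XYZ", "geo_id": 1001, "name": "Winter County", "cases": 3, "rr": 1.5},
    {"grid": 11111, "code": "DEF", "geo_id": 1002, "name": "Fall County", "cases": 2, "rr": 1.0},
]

GEO_FIELDS = ("code", "geo_id", "name")  # fields to join with "/"

def combine_by_grid(rows):
    """Collapse repeated grids into one row, joining the geographic fields."""
    combined = {}  # grid -> merged row; dicts keep first-seen order
    for row in rows:
        grid = row["grid"]
        if grid not in combined:
            # First time this grid is seen: copy the row, start the lists.
            merged = dict(row)
            for field in GEO_FIELDS:
                merged[field] = [str(row[field])]
            combined[grid] = merged
        else:
            # Repeat of a known grid: only the geographic fields differ.
            for field in GEO_FIELDS:
                combined[grid][field].append(str(row[field]))
    # Turn each collected list into a single slash-separated string.
    for merged in combined.values():
        for field in GEO_FIELDS:
            merged[field] = "/".join(merged[field])
    return list(combined.values())

result = combine_by_grid(rows)
for row in result:
    print(row)
```

Because the grouping key is the grid number alone, the same approach holds when a geographic area has many grids; only grids that appear more than once get a combined code.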

kmaths · 01-26-2015

Thank you Reeza, datasp, Xia, and Loko! I truly appreciate all your very helpful answers. I ended up using the second solution proposed by datasp, but I'm sure any of these would work well. It's great to see the different ways this can be achieved. Thanks again for the help!

kmaths · 01-25-2015

I currently have two datasets. In each dataset, observations are organized by two variables: date (in the form of month/day) and a 3-digit alphanumeric code. For each date/code combo there is a count (usually either 0 or 1). There are 7 days in the range (say Jul. 1-7). One dataset has observations from 2014, and the other dataset has a combined average observation from 2012 and 2013, but I have removed the year from the date so that I will be able to merge the datasets.

Some codes have no counts for a given day in one or the other of the datasets because that code simply wasn't used on a given day; and some of the possible codes (of which there are 516, contained in a finite list in another spreadsheet) were not used in one or the other of the datasets at all. When a code from the master list wasn't originally used in that 7-day period, I used proc sql to insert those missing codes into the dataset (thanks to help provided in a previous thread here). For such codes, there is only one observation with a missing (.) date and a count of 0.

What I want to do is:

A) combine the two datasets, by code and date; and
B) somehow fill in missing days so that each code has an observation for each of the 7 days (Jul. 1-7), with the count being 0 if there was originally no observation.

An example of the current structure of Dataset 1 is:

Code   Date    Observed Count (2014)
1X1    07/01   1
1X1    07/03   1
1X1    07/06   0
2Y3    .       0
3J8    07/02   1
3J8    07/03   0

An example of the current structure of Dataset 2 is:

Code   Date    Observed Count (2013/2012 Avg)
1X1    07/01   0
1X1    07/02   1
1X1    07/04   0
2Y3    07/01   1
3J8    .       0

What I would like the final dataset to look like is:

Code   Date    Observed Count (2014)   Observed Count (2013/2012 Avg)
1X1    07/01   1                       0
1X1    07/02   0                       1
1X1    07/03   1                       0
1X1    07/04   0                       0
1X1    07/05   0                       0
1X1    07/06   0                       0
1X1    07/07   0                       0
2Y3    07/01   0                       1
2Y3    07/02   0                       0
2Y3    07/03   0                       0
2Y3    07/04   0                       0
2Y3    07/05   0                       0
2Y3    07/06   0                       0
2Y3    07/07   0                       0
3J8    07/01   0                       0
3J8    07/02   1                       0
3J8    07/03   0                       0
3J8    07/04   0                       0
3J8    07/05   0                       0
3J8    07/06   0                       0
3J8    07/07   0                       0

I am not sure how to go about combining and filling in gaps in the datasets in this manner. I did find a somewhat similar example using PROC EXPAND online, but wasn't sure how to adapt it to this scenario. Any guidance on how to achieve this task would be sincerely appreciated. Thank you!
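One way to think about steps A and B is to build every code/day combination first and then look up each count, defaulting to 0. The sketch below shows that idea in Python, not SAS, and is not one of the solutions posted in the thread. The variable names are hypothetical, and the placeholder rows with a missing date are represented here with `None`.

```python
# Sample data from the question, keyed by (code, date); None = missing date.
obs_2014 = {
    ("1X1", "07/01"): 1, ("1X1", "07/03"): 1, ("1X1", "07/06"): 0,
    ("2Y3", None): 0,
    ("3J8", "07/02"): 1, ("3J8", "07/03"): 0,
}
obs_avg = {
    ("1X1", "07/01"): 0, ("1X1", "07/02"): 1, ("1X1", "07/04"): 0,
    ("2Y3", "07/01"): 1,
    ("3J8", None): 0,
}

# The 7-day range, Jul. 1-7, in the same mm/dd text form as the data.
days = [f"07/{day:02d}" for day in range(1, 8)]

# Every code that appears in either dataset, placeholder rows included.
codes = sorted({code for code, _ in obs_2014} | {code for code, _ in obs_avg})

# One output row per code/day pair; a pair absent from a dataset counts as 0.
# Placeholder rows (date None) never match a real day, so they drop out.
final = [
    (code, day, obs_2014.get((code, day), 0), obs_avg.get((code, day), 0))
    for code in codes
    for day in days
]

for row in final:
    print(row)
```

Building the full grid of combinations up front means the merge and the gap-filling happen in a single pass, and the missing-date placeholder rows need no special handling.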

kmaths · 01-20-2015

Thank you very much for your help everyone! The code you provided works well, PGStats, and thank you for the syntax for obtaining a list of the unused codes, Marina - that is an excellent way to keep tabs on them and see if it's working as expected. I sincerely appreciate the assistance!

kmaths · 01-19-2015

Thank you PGStats, that sounds promising! I am currently receiving an error with this code, so I think I must be entering my dataset names incorrectly. In the last line before the quit statement, if I use the full dataset name (i.e. "libraryname.dataset2.code"), I get a syntax error (Error 22-322: "Syntax error, expecting one of the following..."). If I leave out the library name (i.e. "dataset2.code"), I receive the following error: "WARNING: This DELETE/INSERT statement recursively references the target table. A consequence of this is a possible data integrity problem. ERROR: You cannot reopen LIBRARYNAME.DATASET1 for update access with member-level control because LIBRARYNAME.DATASET1 is in use by you in resource environment SQL. ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set. This statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect."

kmaths · 01-19-2015

I have two datasets, imported from Excel. Dataset 1 contains observations for several thousand people, each identified by a 3-digit alphanumeric code. Several other variables are linked to each of these codes (e.g. gender, age, date, etc). Sometimes the same code is (correctly) used more than once for different people, so there may be, say, 5 lines each identified by the same code but with different ages, genders, and so on. Dataset 2 contains only a list of each possible 3-digit code that could be assigned (of which there are 526), with no further information.

I would like to compare the two datasets, and identify which of the 526 codes found in Dataset 2 are not being used in Dataset 1. I then wish to add these unused codes into Dataset 1, and list them as missing or blank on the other variables (e.g. gender, age, etc) so that it is clear that there are no observations associated with these particular codes at the present time.

I am unsure how to approach this - initially I thought it might be considered a "full outer join" using PROC SQL, but that didn't seem to work. I also considered that I might be able to use PROC COMPARE to identify the values unique to Dataset 2, output them, and then use PROC APPEND to add them into Dataset 1, but I am not sure if this would work, as PROC COMPARE seems to compare only pairs of values. Any advice on how to approach this would be sincerely appreciated! Thank you.
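The task amounts to finding the codes in the master list that never occur in Dataset 1, then appending one blank row for each. Here is a minimal sketch of that logic in Python, not SAS, and not the code PGStats supplied. All sample codes, values, and variable names below are made up for illustration, since the post gives no example rows.

```python
# Hypothetical stand-in for Dataset 1: several people may share a code.
dataset1 = [
    {"code": "1X1", "gender": "F", "age": 34},
    {"code": "1X1", "gender": "M", "age": 51},
    {"code": "3J8", "gender": "F", "age": 27},
]

# Hypothetical stand-in for Dataset 2: the full list of assignable codes.
all_codes = ["1X1", "2Y3", "3J8", "4K2"]

# Codes that appear at least once in Dataset 1.
used_codes = {row["code"] for row in dataset1}

# Codes from the master list with no observation, in master-list order.
unused_codes = [code for code in all_codes if code not in used_codes]

# Append one row per unused code, with the other variables left missing.
padded = dataset1 + [
    {"code": code, "gender": None, "age": None} for code in unused_codes
]

print(unused_codes)
for row in padded:
    print(row)
```

Keeping `unused_codes` as its own list also gives a quick way to check which codes were added, separate from the padded dataset.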

Online Status	Offline
Date Last Visited	‎09-01-2015 07:11 AM

Re: Extracting month/day from dates

Extracting month/day from dates

Re: Combining observations

Combining observations

Re: Combining and filling in gaps in datasets

Combining and filling in gaps in datasets

Re: Identifying distinct values and adding them to existing dataset

Re: Identifying distinct values and adding them to existing dataset

Identifying distinct values and adding them to existing dataset