About Walternate

Walternate · ‎02-10-2016

This worked except for one thing: ID3 is incrementing on every row instead of on every change in ID2, like this:

ID1  ID2  ID3
1.1  abc  1
1.1  abc  2
1.2  abc  3
1.1  def  4

Instead, it should only increase when ID2 changes:

ID1  ID2  ID3
1.1  abc  1
1.1  abc  1
1.2  abc  1
1.1  def  2
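One way to get a counter that only moves when ID2 changes is BY-group processing with first.ID2. This is a minimal sketch, assuming the data can be sorted by ID2; the dataset names have and want are placeholders for illustration.

```sas
/* Hypothetical input matching the example rows in the post */
data have;
  input ID1 $ ID2 $;
  datalines;
1.1 abc
1.1 abc
1.2 abc
1.1 def
;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=have;
  by ID2;
run;

data want;
  set have;
  by ID2;
  /* Sum statement: ID3 starts at 0, is retained across rows,
     and is bumped only on the first row of each ID2 group */
  if first.ID2 then ID3 + 1;
run;
```

Because the sum statement retains ID3 automatically, no separate RETAIN statement is needed.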

Walternate · ‎02-10-2016

Hi, I have a dataset at the person-month level with an ID variable that is not clean, another ID variable, and a few other categorical variables.

Dataset1:

ID1  ID2  categ vars.....
1    123
1.1  123
1.2  234
1.2  234

In ID1, 1 is the valid value, while 1.1 and 1.2 are invalid values. ID2 is how I can distinguish one person from another, but I need to retain as many valid values of ID1 as possible. I need to do two things. One is that for every person (i.e., value of ID2) that has at least one valid value of ID1, I want to fill the rest of their rows with that valid value of ID1. The other is that for every person that never has a valid value of ID1, I want a new variable, ID3, to give them a value.

Dataset 2:

ID1  ID2  ID3  categ vars.....
1    123
1    123
.    234  1
.    234  2

This is what I have coded so far to accomplish this:

    proc sql;
      create table want1 as
      select *, case when ID1 like '%.%' then 1 else 0 end as invalid_ID1
      from have;
    quit;

    proc sql;
      create table want2 as
      select *, count(ID2) as num_rows, count(invalid_ID1) as rows_w_invalid_ID1
      from want1
      group by ID2
      order by ID2, invalid_ID1;
    quit;

    data want3;
      set want2;
      by ID2 invalid_ID1;
      retain ID3 1;
      if num_rows>rows_w_invalid_ID1 then ID3=.;
      else if first.ID2 and num_rows=rows_w_invalid_ID1 then do;
        ID3+1;
        ID1=' ';
      end;
      retain ID1_keep;
      if first.ID2 then ID1_keep=ID1;
      if rows_w_invalid_id=1 and num_rows>rows_w_invalid_id then ID1=ID1_keep;
    run;

This code correctly completes my first objective of overwriting invalid values of ID1 with valid values for each person, using ID2 as the person identifier. However, when assigning ID3, it resets to 1 every time it encounters a missing value, i.e., someone who would not be assigned a value of ID3. So it looks like this:

Dataset 2:

ID1  ID2  ID3  categ vars.....
1    123
1    123
.    234  1
.    234  2
2    345
2    345
.    456  1

What I would want it to do is continue counting rather than resetting upon encountering the missing value. Any help is much appreciated.
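A possible alternative that keeps the person counter separate from the ID3 that gets written out, so a missing ID3 on one person cannot disturb the count for the next. This is only a sketch under assumptions: ID1 and ID2 are character, an invalid ID1 is one containing a period, and the names valid_ID1 and next_ID3 are made up for illustration. It reads each ID2 group twice (a "double DOW loop"): once to look for a valid ID1, once to write the rows.

```sas
/* Hypothetical input extending the example in the post */
data have;
  input ID1 $ ID2 $;
  datalines;
1   123
1.1 123
1.2 234
1.2 234
2   345
2.1 345
3.1 456
;
run;

proc sort data=have;
  by ID2;
run;

data want;
  length valid_ID1 $8;
  /* Pass 1 over the ID2 group: remember a valid ID1 if one exists */
  do until (last.ID2);
    set have;
    by ID2;
    if not missing(ID1) and index(ID1, '.') = 0 then valid_ID1 = ID1;
  end;
  /* The running counter only moves for people with no valid ID1 */
  if missing(valid_ID1) then next_ID3 + 1;
  /* Pass 2 over the same group: write the rows */
  do until (last.ID2);
    set have;
    by ID2;
    if missing(valid_ID1) then do;
      ID1 = ' ';
      ID3 = next_ID3;
    end;
    else do;
      ID1 = valid_ID1;
      ID3 = .;
    end;
    output;
  end;
  drop valid_ID1 next_ID3;
run;
```

If a person has more than one distinct valid ID1, this sketch keeps whichever appears last in the group, which may or may not be the right rule for the real data.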

Walternate · ‎02-08-2016

Hi, I have two datasets with the same variables. Dataset 2 has some rows that are in Dataset 1, but also some rows that are unique to Dataset 2. What I'd like to do is stack the two datasets such that all rows from Dataset 1 remain, and the rows from Dataset 2 are only added if that row doesn't already exist in Dataset 1.

Dataset 1:

ID  categ_var  mth_var
1   abc        200901
2   def        200901

Dataset 2:

ID  categ_var  mth_var
1   abc        200901
1   abc        200902

Output dataset:

ID  categ_var  mth_var  found_in_ds1
1   abc        200901   1
1   abc        200902   0
2   def        200901   1

Additionally, if possible, I would like to be able to create a variable indicating whether an ID from Dataset 2 was found in Dataset 1. Any help is much appreciated.
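One way to sketch this is a match-merge on every variable, using the IN= data set option to record where each row came from. The dataset names ds1, ds2, and want are placeholders, and the sketch assumes neither dataset contains exact duplicate rows of its own.

```sas
/* Hypothetical inputs matching the example in the post */
data ds1;
  input ID categ_var $ mth_var;
  datalines;
1 abc 200901
2 def 200901
;
run;

data ds2;
  input ID categ_var $ mth_var;
  datalines;
1 abc 200901
1 abc 200902
;
run;

proc sort data=ds1; by ID categ_var mth_var; run;
proc sort data=ds2; by ID categ_var mth_var; run;

data want;
  /* Rows that are identical in both inputs collapse to one output row */
  merge ds1(in=in1) ds2;
  by ID categ_var mth_var;
  /* 1 if the row exists in Dataset 1, 0 if it came only from Dataset 2 */
  found_in_ds1 = in1;
run;
```

On the sample rows this should give the three-row output dataset shown in the post. Note that found_in_ds1 here is a row-level flag; an ID-level flag would need a merge or lookup on ID alone.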

Walternate · ‎02-03-2016

Hi, I have two datasets that I am trying to join to one another. The end goal would either be (preferably) a single dataset with all people from Dataset 1 and Dataset 2 with a variable (or variables) telling me which dataset(s) each person was found in, or separate datasets for:

1. People found in both Dataset 1 and Dataset 2
2. People found in just Dataset 1
3. People found in just Dataset 2

There are two tricky parts. One is that there are several different ID variables on which the datasets could be joined (which all need to be tried, as no one ID, or even a combination of two, can cover all the possible links that can be made). The other is that while Dataset 2 is at the person level, Dataset 1 is at the person/group level, and a row from Dataset 1 should only be joined to Dataset 2 if the group in Dataset 2 matches the group in Dataset 1. For example:

Dataset 1:

ID1  ID2  ID3  ID4  ID5  Group  categ_vars_from_dataset_1
1    .    .    .    .    A      abc
1    .    .    .    .    B      def
.    2    .    .    .    A      ghi

Dataset 2:

ID1  ID2  ID3  ID4  ID5  Group  categ_vars_from_dataset_2
1    .    .    .    .    A      xyz
.    2    .    .    .    A      jkl

Desired outcome dataset:

ID1  ID2  ID3  ID4  ID5  Group  categ_vars_from_dataset_1  categ_vars_from_dataset_2  person_source_datasets
1    .    .    .    .    A      abc                        xyz                        1 and 2
1    .    .    .    .    B      def                        .                          1
.    2    .    .    .    A      ghi                        jkl                        1 and 2

The first row of Dataset 1 can be joined to Dataset 2 on ID1 and Group, so it has categorical vars from both datasets and the new variable person_source_datasets='1 and 2'. The second row of Dataset 1 cannot be joined on any of the ID variables (it has a value for ID1, but its value for Group (B) is not found in the second dataset), so its row in the desired outcome dataset only has the categ vars from Dataset 1 and its source dataset variable indicates that it comes from Dataset 1 only. The third row of Dataset 1 can be joined to Dataset 2 on ID2 and Group, so it has categorical vars from both datasets.

I do need to try to join on every possible ID, though the order doesn't matter much (except that ID1 should be first, as it is the most populated). As I said, if it's easier to output separate datasets rather than having one summary dataset at the end, that's fine. Most people will have values for more than one ID variable (they can have values for all 5). Any help is much appreciated.
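A rough sketch of the "Dataset 1 side" of this join in PROC SQL, shown with only ID1 and ID2 to keep it short; the same pattern would be repeated for ID3 through ID5. The names ds1, ds2, vars1, and vars2 are placeholders. Each ID comparison is guarded with "is not null" because SAS treats two missing values as equal, which would otherwise link unrelated people. People found only in Dataset 2 are not covered here and would need a second step.

```sas
/* Hypothetical inputs: a cut-down version of the example in the post */
data ds1;
  input ID1 ID2 Group $ vars1 $;
  datalines;
1 . A abc
1 . B def
. 2 A ghi
;
run;

data ds2;
  input ID1 ID2 Group $ vars2 $;
  datalines;
1 . A xyz
. 2 A jkl
;
run;

proc sql;
  create table want as
  select a.ID1, a.ID2, a.Group, a.vars1, b.vars2,
         case when b.Group is not null then '1 and 2'
              else '1'
         end as person_source_datasets length=7
  from ds1 as a
  left join ds2 as b
    /* Group must match, plus at least one populated ID */
    on  a.Group = b.Group
    and (   (a.ID1 = b.ID1 and a.ID1 is not null)
         or (a.ID2 = b.ID2 and a.ID2 is not null));
quit;
```

A join with OR conditions like this can be slow on large tables, so joining on one ID at a time and stacking the results may be the more practical route.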

Walternate · ‎02-01-2016

Hi, I have a dataset which is at the person-month level. It has 2 ID variables and a month variable:

Dataset1:

ID1  ID2  month
1    abc  200901
1    abc  200902
     abc  200903
1    abc  200904
     def  200901
     def  200902

I created a third ID variable which uses ID1 as the base ID (it doesn't look like it from the dataset above, but ID1 is more likely to be populated and cleaner than ID2). However, each row that does not have an ID1 will have the value of ID2 for the new ID variable:

ID1  ID2  month   ID3_calc
1    abc  200901  1
1    abc  200902  1
     abc  200903  abc
1    abc  200904  1
     def  200901  def
     def  200902  def

Ultimately I put the data at the person level, so there is one row per person with ID3_calc:

ID3_calc
1
abc
def

The issue is that I want to divide this into 2 datasets. Dataset 2 would have all people for whom ID3_calc=ID1. Dataset 3 would have all people for whom ID3_calc=ID2, but only those whose ID2 does not appear anywhere in Dataset 1 alongside an ID1. That is, for Dataset 3, I would only want ID3_calc=def, because that person does not have any rows in Dataset 1 in which they do have a value of ID1, while ID3_calc=abc does have rows in Dataset 1 under ID1=1. For people like ID3_calc=abc, for whom I have data under both their ID1 and ID2, I would want to keep just the variables based around their ID1s and delete the rows grouped by their ID2s. Any help is much appreciated.
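One way to sketch the split is to let PROC SQL decide, per ID2, whether any row carries an ID1. The names have, ds2, and ds3 are placeholders, and both IDs are assumed to be character.

```sas
/* Hypothetical input matching the example; a period reads as a blank ID1 */
data have;
  input ID1 $ ID2 $ month;
  datalines;
1 abc 200901
1 abc 200902
. abc 200903
1 abc 200904
. def 200901
. def 200902
;
run;

proc sql;
  /* Dataset 2: people identified by their ID1 */
  create table ds2 as
  select distinct ID1 as ID3_calc
  from have
  where ID1 is not missing;

  /* Dataset 3: ID2 values that never appear on a row with an ID1.
     count(ID1) counts only non-missing values within each ID2 group. */
  create table ds3 as
  select ID2 as ID3_calc
  from have
  group by ID2
  having count(ID1) = 0;
quit;
```

With the sample rows, abc is left out of ds3 because it has rows under ID1=1, which is the behavior the post asks for.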

Walternate · ‎01-26-2016

ID1    ID2       Month
00123  null      Jan 2010
abc    12345678  Feb 2010
00123  12345678  Mar 2010

Either ID2 should be filled in all 3 rows:

ID1    ID2       Month
00123  12345678  Jan 2010
abc    12345678  Feb 2010
00123  12345678  Mar 2010

Or there could be a new ID var if necessary, but I'm not sure what that would look like in terms of the other 3 IDs. The point is that I'm ultimately trying to create a person-level dataset, so it's important that the dataset consistently identifies people across each month.

Walternate · ‎01-26-2016

I'm now finding some values of ID that are just nonsense and not corresponding with the "real" values (i.e., abc when the real value is D12345), so I think I should change my question. Each row will either have a valid value of ID1 or ID2, but not necessarily both, like this:

ID1    ID2       Month
00123  null      Jan 2010
abc    12345678  Feb 2010
00123  12345678  Mar 2010

So row 3 has legitimate values for both ID1 and ID2, row 1 has a legitimate value for ID1 and ID2=null, and row 2 has an illegitimate value for ID1 while ID2 is valid. What I would like to do is use ID2 (which doesn't seem to have issues of invalid values like ID1, but is just missing in some rows) as the primary identifier, but somehow fill the rows where ID2=null with the correct ID2 value, which I will have to identify using ID1 so I know it's the same person on both rows. Is there a way to do that?
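A possible sketch of the fill: group by ID1 and borrow the populated ID2 from any other row in the same group. The names have, want, and the numeric Mth column are placeholders, and null is assumed to mean a blank character value. PROC SQL will remerge the group-level max(ID2) back onto each row and write a note about it in the log.

```sas
/* Hypothetical input matching the example; a period reads as a blank ID2 */
data have;
  input ID1 $ ID2 $ Mth;
  datalines;
00123 . 201001
abc 12345678 201002
00123 12345678 201003
;
run;

proc sql;
  create table want as
  select ID1,
         /* Keep ID2 when present, else take the ID2 seen elsewhere for this ID1 */
         coalesce(ID2, max(ID2)) as ID2,
         Mth
  from have
  group by ID1
  order by Mth;
quit;
```

This only helps when the row with the missing ID2 has a valid ID1; if the same nonsense ID1 is shared by different people, grouping on it would link them incorrectly.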

Walternate · ‎01-26-2016

If I strip 0s for all of them, don't I have to set a length? This could be difficult as without the zeroes, some would have a length of 5, some 4, some 3, some 2, some 1.

Walternate · ‎01-26-2016

Hi, I have a dataset at the person-month level with ID and month.

ID     Month
00001  Jan 2010
00001  Feb 2010
00001  Mar 2010
1      Apr 2010
00001  May 2010

The issue is that for some people in some months, the ID var has been stripped of the leading 0s. I thought that I could just add leading 0s until the length of the ID var was reached, but some of the IDs are 5 digits and some are 7. There is a second ID var which should correspond with ID, although it is not always populated (which is why I used ID in the first place). Is there a way to correct the values of ID which have been stripped that I haven't thought of? Any help is much appreciated.

Walternate · ‎01-22-2016

Hi, I have two datasets which I would like to stack. They have the same variables. The tricky part is that I only want to keep people from Dataset 2 if they don't already have an entry in Dataset 1; that is, I want to add to Dataset 1 only those people from Dataset 2 who are not already in it. I know how to do this in two steps (merge the two such that only the people who are in Dataset 2 but not Dataset 1 are left, then stack that with Dataset 1 using a set statement), but I wanted to know if there's a way to do it in one step. Any help is much appreciated.
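One possible single-step version is a PROC SQL UNION ALL with a subquery that filters Dataset 2 down to the people missing from Dataset 1. This is a sketch only: ds1, ds2, want, and the variable names are placeholders, and UNION ALL lines columns up by position, so both datasets are assumed to have their variables in the same order.

```sas
/* Hypothetical inputs */
data ds1;
  input ID categ_var $;
  datalines;
1 abc
2 def
;
run;

data ds2;
  input ID categ_var $;
  datalines;
2 xyz
3 ghi
;
run;

proc sql;
  create table want as
  select * from ds1
  union all
  /* Only people from Dataset 2 whose ID never appears in Dataset 1 */
  select * from ds2
  where ID not in (select ID from ds1);
quit;
```

Here ID 2 from ds2 is dropped because that person already has an entry in ds1, while ID 3 is appended.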

Walternate · ‎01-19-2016

Hi, I have a dataset at the person-month level (however, each person can have more than one row in one month). It has an ID variable, three date variables, and two categorical variables. If a person has more than one row per month, the values of Date2, Categ1, and Categ2 will be different in each row. Date1 shows which month/year the variables are "as of".

ID  Date1   Date2      Date3      Categ1  Categ2
1   201001  1/1/2009   .          abc     a
1   201001  1/25/2010  .          def     b
1   201002  1/1/2009   .          abc     a
1   201002  1/25/2010  2/3/2010   def     b
1   201003  1/1/2009   .          abc     a
1   201003  3/12/2010  3/12/2010  ghi     b

What I want is to do the following:

1. Create an indicator variable for everyone that has Categ1=def, and a variable that says the earliest Date2 for which they have that value.
2. Take the value of Categ2 at the earliest Date2 where Categ1=def (in this case, b). Find the minimum value of Date3 within the rows where Categ2=b.
3. Additionally, also within the rows where Categ2=b, see whether there are any values of ghi in Categ1; if so, create an indicator variable showing that the person had that value and a date variable for when they had it (using Date2).

I can do most of this by myself; the issue I'm having is systematizing only looking in the rows with the Categ2 value that corresponds with the earliest instance of Categ1=def. Any help is much appreciated.
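The three steps above could be sketched in two PROC SQL queries: first pin down each person's earliest def row and its Categ2, then join back to restrict all later lookups to rows with that Categ2. All new names (first_def, first_def_date, def_categ2, has_def, min_date3, has_ghi, ghi_date) are invented for illustration, and Date2 and Date3 are assumed to be SAS date values.

```sas
/* Hypothetical input matching the example in the post */
data have;
  input ID Date1 Date2 :mmddyy10. Date3 :mmddyy10. Categ1 $ Categ2 $;
  format Date2 Date3 mmddyy10.;
  datalines;
1 201001 1/1/2009 . abc a
1 201001 1/25/2010 . def b
1 201002 1/1/2009 . abc a
1 201002 1/25/2010 2/3/2010 def b
1 201003 1/1/2009 . abc a
1 201003 3/12/2010 3/12/2010 ghi b
;
run;

proc sql;
  /* Step 1: earliest Date2 with Categ1=def, and the Categ2 on that row */
  create table first_def as
  select distinct ID, Date2 as first_def_date, Categ2 as def_categ2
  from have
  where Categ1 = 'def'
  group by ID
  having Date2 = min(Date2);

  /* Steps 2 and 3: look only at rows sharing that Categ2 */
  create table want as
  select f.ID,
         1 as has_def,
         f.first_def_date format=mmddyy10.,
         f.def_categ2,
         min(h.Date3) as min_date3 format=mmddyy10.,
         max(h.Categ1 = 'ghi') as has_ghi,
         min(case when h.Categ1 = 'ghi' then h.Date2 else . end)
           as ghi_date format=mmddyy10.
  from first_def as f
  inner join have as h
    on f.ID = h.ID and f.def_categ2 = h.Categ2
  group by f.ID, f.first_def_date, f.def_categ2;
quit;
```

If a person has two different Categ2 values on their earliest def date, first_def will hold two rows for them and the second query will return one summary row per Categ2, so a tie-breaking rule would be needed.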

Walternate · ‎12-10-2015

Hi, I have a dataset with ID, a date variable, a variable I created which has the month of the date variable but is always 2 digits (i.e., 01, 02, etc.), and indicators over 24 months called mth_200501 through mth_200612. To sum the number of months where mth_xxxxxx=1 prior to the date variable, I used the following code:

    data want;
      set have;
      array mths{24} mth_200501--mth_200612;
      do i=1 to 24 until (vname(mths{i})=cats('mth_', year(date1), date1_mth_char));
        prior_mths+mths{i};
      end;
    run;

It worked, but I received a message that there were 24 missing values generated by this code. There was only 1 person with a missing value for date1, so my question is this: is the source of the 24 missing values that the 1 person with a missing date1 went through each month of the array? Any help is much appreciated.
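For what it's worth, year() applied to a missing date returns missing and logs it, and the UNTIL condition is evaluated once per pass through the loop, so one person with a missing date1 could plausibly account for 24 such operations. A sketch of a guarded version follows, assuming that skipping people with a missing date1 is acceptable; the test data and the choice to reset prior_mths on every row are my own assumptions, not part of the original code.

```sas
/* Hypothetical test data: two people, all 24 indicators set to 1,
   the second person with a missing date1 */
data have;
  array mths{24} mth_200501-mth_200512 mth_200601-mth_200612;
  length date1_mth_char $2;
  format date1 date9.;
  do ID = 1 to 2;
    do i = 1 to 24;
      mths{i} = 1;
    end;
    if ID = 1 then date1 = '15MAR2005'd;
    else date1 = .;
    if not missing(date1) then date1_mth_char = put(month(date1), z2.);
    else date1_mth_char = ' ';
    output;
  end;
  drop i;
run;

data want;
  set have;
  array mths{24} mth_200501--mth_200612;
  /* Reset per row so the total does not carry over between people */
  prior_mths = 0;
  /* Only walk the array when date1 is populated, so year() never sees a missing value */
  if not missing(date1) then
    do i = 1 to 24
        until (vname(mths{i}) = cats('mth_', year(date1), date1_mth_char));
      prior_mths = sum(prior_mths, mths{i});
    end;
  drop i;
run;
```

As in the original loop, the UNTIL test runs after the addition, so the month of date1 itself is included in the total.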

Walternate · ‎12-03-2015

The problem is that the ID is not always the same length--it is sometimes 5 and sometimes 6 digits. The date part is 18423, the 1 is from the numeric variable which was also concatenated with the ID var and date var.

Walternate · ‎12-03-2015

Hi, I have a dataset with an ID variable, and a concatenation of the ID variable, a date variable, and a numeric variable. What I want is to take the date value out of the concatenated variable and format it as a date.

ID      concatenated_var
12345   12345184231
123456  12345618444123

What I tried was this:

    data want;
      set have;
      date_var=input(substr(concatenated_var, 6, 5), 5.);
    run;

and then another step that formatted date_var as a date variable (for some reason, when I tried to do that all in one step like this: date_var=input(substr(concatenated_var, 6, 5), mmddyy10.), every value of date_var was missing). It worked, but the problem is that I'm now finding that the ID variable has either 5 or 6 digits. The numeric variable at the end of the concatenated variable also has different lengths. My question is whether there's a way to extract the date part of the concatenated variable (possibly using the ID variable, since I do have that one; I could probably bring the numeric one in but don't currently have it in my dataset). Any help is much appreciated.
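Since the ID is available on every row, one option is to use its length to find where the date starts. This is a sketch assuming both variables are character and the date part is always a 5-digit SAS date value; have and want are placeholder names.

```sas
/* Hypothetical input matching the example in the post */
data have;
  input ID :$6. concatenated_var :$20.;
  datalines;
12345 12345184231
123456 12345618444123
;
run;

data want;
  set have;
  /* The date begins right after the ID, whatever the ID's length */
  date_var = input(substr(concatenated_var, length(ID) + 1, 5), 5.);
  /* The digits are already a SAS date value, so only a display format is needed */
  format date_var date9.;
run;
```

The mmddyy10. informat expects text laid out like 06/10/2010, so it cannot read a raw value such as 18423; reading the digits with a plain numeric informat and then attaching a date format avoids that.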

Walternate · ‎11-30-2015

Hi, I have two datasets. Dataset 1 is at the person level and has ID as well as some categorical variables. Dataset 2 is at the person-event level and has a matching ID variable and an event date variable that is month/year.

Dataset 1:

ID  categ_var1 ... categ_var5
1   abc            ghi
2   def            jkl

Dataset 2:

ID  event_date
1   Apr 2012
1   May 2012
1   Jun 2012
1   Aug 2012

Some people from Dataset 1 will not be in Dataset 2 at all. What I want is to keep everyone from Dataset 1 that has events in April, May, and June 2012 in Dataset 2 (so ID=1 would stay, ID=2 would not, as they were not found in Dataset 2). Any help is much appreciated.
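A possible sketch with a PROC SQL subquery: count how many of the three required months each person has in Dataset 2 and keep only those with all three. It assumes event_date is stored as a SAS date set to the first of the month; ds1, ds2, and want are placeholder names, and only two of the categorical variables are shown.

```sas
/* Hypothetical inputs matching the example in the post */
data ds1;
  input ID categ_var1 $ categ_var5 $;
  datalines;
1 abc ghi
2 def jkl
;
run;

data ds2;
  input ID event_date :monyy7.;
  format event_date monyy7.;
  datalines;
1 APR2012
1 MAY2012
1 JUN2012
1 AUG2012
;
run;

proc sql;
  create table want as
  select *
  from ds1
  where ID in
    (select ID
     from ds2
     where event_date in ('01APR2012'd, '01MAY2012'd, '01JUN2012'd)
     group by ID
     /* All three months must be present, however many rows each has */
     having count(distinct event_date) = 3);
quit;
```

If event_date is instead a character value or a mid-month date, the WHERE list would need to change to match how the months are actually stored.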

Online Status	Offline
Date Last Visited	‎02-27-2025 04:00 PM

ODS Excel - building within-document hyperlink using a numeric row var...

Reading in SAS program and not seeing the formatting the way it shoul...

Weird characters messing up directory/file name macros

Using libname to create directories when some directories not represen...

How to build an output indicating which numbers in a range are not pre...

Re: Possible to remove carriage returns from a string and leave the re...

Re: Pattern matching to two different patterns

Pattern matching to two different patterns

Re: Possible to remove carriage returns from a string and leave the re...

Possible to remove carriage returns from a string and leave the rest o...

Re: Parsing a character string based on format

Re: Residuals in logistic regression

Merge step overwriting shared vars?

Transposing multiple variables

Re: Missing values in infile statement

Re: Assigning an ID variable across missing rows

Assigning an ID variable across missing rows

Stacking two datasets with overlapping rows

Creating a dataset through joins on several ID variables

Identifying a combined ID var across multiple rows

Re: Keeping ID vars consistent across multiple rows

Re: Keeping ID vars consistent across multiple rows

Re: Keeping ID vars consistent across multiple rows

Keeping ID vars consistent across multiple rows

Stacking two datasets with overlapping IDs

Matching later values of a variable to values as of a given date

Missing values in an array

Re: Separating a concatenated variable

Separating a concatenated variable

Excluding people based on rows in a second dataset