About Walternate

Walternate · 11-17-2015

Hi all, I have two datasets. One is at the person level (one row per ID) and has a date variable. The other has multiple rows per person and has a date variable and a categorical variable, each of which can differ from row to row:

Dataset 1
ID  date_var1
1   2/5/2013
2   9/13/2014
3   6/4/2012

Dataset 2
ID  date_var2  categ_var2
1   1/4/2013   abc
1   1/10/2013  def
1   2/1/2014   ghi
3   8/17/2012  abc
3   9/12/2013  jkl

What I want is to join the two datasets such that:
1. If someone from Dataset 1 (like ID 2) is not found in Dataset 2, they remain and all their Dataset 1 variables stay the same.
2. If someone from Dataset 1 is in Dataset 2 and at least one of their date_var2 values is prior to date_var1, they join to the row in Dataset 2 with the most recent date prior to date_var1 (so for ID 1, this would be the second row, with date_var2=1/10/2013 and categ_var2=def).
3. If someone from Dataset 1 is in Dataset 2 but none of their date_var2 values is prior to their date_var1 (like ID 3), they should remain in the resulting dataset with all their variables from Dataset 1 unchanged and the variables from Dataset 2 set to missing.

Any help is much appreciated.
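A minimal Python sketch of the three rules above (Python rather than SAS, only to illustrate the matching logic; the list-of-dicts layout and the name `join_most_recent_prior` are hypothetical). It assumes "prior to" means strictly earlier, so a date_var2 equal to date_var1 is not matched.

```python
from datetime import date

# Hypothetical in-memory stand-ins for Dataset 1 and Dataset 2 above.
dataset1 = [
    {"ID": 1, "date_var1": date(2013, 2, 5)},
    {"ID": 2, "date_var1": date(2014, 9, 13)},
    {"ID": 3, "date_var1": date(2012, 6, 4)},
]
dataset2 = [
    {"ID": 1, "date_var2": date(2013, 1, 4), "categ_var2": "abc"},
    {"ID": 1, "date_var2": date(2013, 1, 10), "categ_var2": "def"},
    {"ID": 1, "date_var2": date(2014, 2, 1), "categ_var2": "ghi"},
    {"ID": 3, "date_var2": date(2012, 8, 17), "categ_var2": "abc"},
    {"ID": 3, "date_var2": date(2013, 9, 12), "categ_var2": "jkl"},
]

def join_most_recent_prior(left, right):
    """Left join: each left row gets the right row with the latest date_var2
    strictly before its date_var1; if there is none, the right-hand fields are None."""
    result = []
    for row in left:
        # Right rows for the same person dated strictly before date_var1.
        candidates = [r for r in right
                      if r["ID"] == row["ID"] and r["date_var2"] < row["date_var1"]]
        best = max(candidates, key=lambda r: r["date_var2"], default=None)
        out = dict(row)  # Dataset 1 variables are kept unchanged (rules 1 and 3)
        out["date_var2"] = best["date_var2"] if best else None
        out["categ_var2"] = best["categ_var2"] if best else None
        result.append(out)
    return result

for r in join_most_recent_prior(dataset1, dataset2):
    print(r)
```

Each left row scans all right rows, which is fine for small data; for large tables, sorting both by ID and date and walking them together avoids the repeated scan.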

Walternate · 11-03-2015

Hi, I have two datasets. Dataset 1 is at the person level, and each person has a corresponding date. There are also a number of person-level categorical variables that I'd like to keep as is if possible. In Dataset 2, each person occurs multiple times with multiple different dates.

Dataset1:
ID1  Date1       var1  var2  ...etc.
1    10/31/2014  abc   def
2    9/7/2013    xyz   abc

Dataset2:
ID2  Date2
1    9/7/2012
1    2/4/2013
1    10/1/2014
2    9/7/2012
2    10/2/2014

I'm trying to merge the datasets so that I bring in only one record per person from Dataset 2: the record whose Date2 value is still in effect as of that person's Date1 in Dataset 1. For example, ID=1 has a Date1 value of 10/31/2014, so I would want the latest Date2 value for ID=1 that occurs before Date1; in this case, 10/1/2014.

Any help is much appreciated.
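The same "as of" lookup can also be sketched with a binary search over each person's sorted Date2 values. This Python sketch is illustrative only and all names in it are hypothetical. It treats a Date2 equal to Date1 as still in effect (an assumption; use `bisect_left` instead of `bisect_right` for strictly before). The person-level variables carry through unchanged because each output row starts as a copy of the Dataset1 row.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical stand-ins for Dataset1 (person level) and Dataset2 (many rows per person).
dataset1 = [
    {"ID1": 1, "Date1": date(2014, 10, 31), "var1": "abc", "var2": "def"},
    {"ID1": 2, "Date1": date(2013, 9, 7), "var1": "xyz", "var2": "abc"},
]
dataset2 = [
    (1, date(2012, 9, 7)), (1, date(2013, 2, 4)), (1, date(2014, 10, 1)),
    (2, date(2012, 9, 7)), (2, date(2014, 10, 2)),
]

# Group Dataset2 dates by ID and sort them once, so each lookup is a binary search.
dates_by_id = defaultdict(list)
for id2, date2 in dataset2:
    dates_by_id[id2].append(date2)
for dates in dates_by_id.values():
    dates.sort()

def as_of_date2(row):
    """Latest Date2 on or before row['Date1'] for the same ID, else None."""
    dates = dates_by_id.get(row["ID1"], [])
    i = bisect_right(dates, row["Date1"])  # number of dates <= Date1
    return dates[i - 1] if i > 0 else None

merged = [{**row, "Date2": as_of_date2(row)} for row in dataset1]
for row in merged:
    print(row)
```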

Walternate · 10-26-2015

Hi, I have a dataset that is supposed to be at the person level but in reality has some duplicate rows. It has an ID variable, two date variables, and several other categorical variables.

ID  Date1      Date2     Categ1 ... Categ10
1   1/3/2014   4/9/2013  abc        xyz
1   2/15/2015            def        jkl
2   10/9/2013            abc        def
2   11/4/2014            jkl        xyz
3   2/28/2012            abc        xyz
3   3/15/2013  9/7/2014  def        jkl

When there are dupes, I want to take the row with the latest Date1. I know how to do that by itself:

data want;
    set have;          /* have must be sorted by ID and Date1 */
    by ID Date1;
    if last.ID then output;
run;

The problem is that when a person has a value for Date2 in the row with the earlier Date1, that value of Date2 should be filled into the later Date1 row (i.e., the row that will be output into the new dataset without duplicates). For person 1, for example, the second row would be output with Date1=2/15/2015 and Date2=4/9/2013, and all the other variables should keep their values from the original Date1=2/15/2015 row. I need to do this without overwriting Date2 when the later (output) row for a person already has a value for Date2; i.e., person 3 should have Date1=3/15/2013 and Date2=9/7/2014 in the output dataset, with all the other variables from the Date1=3/15/2013 row.

Any help is much appreciated.
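A sketch of the dedupe-and-fill logic in Python (hypothetical field names; None stands for a missing Date2). The post doesn't say what to do when several earlier rows have a Date2, so this assumes the most recent non-missing one is carried forward. In SAS terms, `carried` plays the role of a retained variable that is reset at first.ID.

```python
from datetime import date
from itertools import groupby

# Hypothetical stand-in for "have"; None marks a missing Date2.
have = [
    {"ID": 1, "Date1": date(2014, 1, 3), "Date2": date(2013, 4, 9), "Categ1": "abc", "Categ10": "xyz"},
    {"ID": 1, "Date1": date(2015, 2, 15), "Date2": None, "Categ1": "def", "Categ10": "jkl"},
    {"ID": 2, "Date1": date(2013, 10, 9), "Date2": None, "Categ1": "abc", "Categ10": "def"},
    {"ID": 2, "Date1": date(2014, 11, 4), "Date2": None, "Categ1": "jkl", "Categ10": "xyz"},
    {"ID": 3, "Date1": date(2012, 2, 28), "Date2": None, "Categ1": "abc", "Categ10": "xyz"},
    {"ID": 3, "Date1": date(2013, 3, 15), "Date2": date(2014, 9, 7), "Categ1": "def", "Categ10": "jkl"},
]

def dedupe_keep_latest(rows):
    """One row per ID: the row with the latest Date1. If that row's Date2 is
    missing, fill it with the last non-missing Date2 from that ID's earlier rows."""
    want = []
    ordered = sorted(rows, key=lambda r: (r["ID"], r["Date1"]))
    for _, group in groupby(ordered, key=lambda r: r["ID"]):
        carried = None  # last non-missing Date2 seen for this ID; reset per ID
        for row in group:
            if row["Date2"] is not None:
                carried = row["Date2"]
        # After the loop, row is the latest-Date1 row for this ID.
        last = dict(row)
        if last["Date2"] is None:
            last["Date2"] = carried  # fill only when the output row has no Date2 of its own
        want.append(last)
    return want

for row in dedupe_keep_latest(have):
    print(row)
```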

Walternate · 10-22-2015

Hi, thank you for your reply. I tried it and it worked; the problem is that I only want it to sum for people whose values of date aren't missing. So I tried this:

if date ne . then do;
    total=0;
    array m{157} mth_200201--mth_201501;
    /* add months until the variable named after date's year and month is reached */
    do i=1 to 157 until (vname(m{i})=cats('mth_', year(date), put(month(date), z2.)));
        total+m{i};
    end;
end;

As I said, the code did exactly what I wanted it to for people with values of date. The problem is that there are people missing values of date whom I don't want to have values of total. The code as it is now retains the value of total from each person with a value of date and fills it in for each following person with a missing date until the next person with a value for date is reached. I need a way to leave total blank for those who are missing date (I thought that was what the subsetting if statement at the beginning was supposed to do, but I could be wrong...).
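The carry-over happens because the sum statement `total+m{i}` implicitly retains `total`, and `total=0` only runs inside the `if date ne . then do;` block, so a row with a missing date keeps the previous row's value. In SAS, adding `else total=.;` after that block's `end;` should clear it. The Python sketch below shows the intended behavior (illustrative only; `months` is a hypothetical stand-in for mth_200201 onward): it starts a fresh total on every row and returns None when the date is missing.

```python
from datetime import date

def running_totals(rows, first_year=2002, first_month=1):
    """For each row, sum the monthly counts from the first month through the
    month of row['date']. Rows with a missing date get None; nothing carries
    over between rows because each row starts its own total."""
    totals = []
    for row in rows:
        if row["date"] is None:
            totals.append(None)  # leave total missing instead of reusing the last value
            continue
        # Number of monthly values to include: months since first_year/first_month, plus one.
        n_months = (row["date"].year - first_year) * 12 + (row["date"].month - first_month) + 1
        if n_months < 1:
            totals.append(None)  # date falls before the first monthly column
            continue
        total = 0  # a fresh accumulator for every row
        for value in row["months"][:n_months]:
            if value is not None:  # missing monthly counts add nothing
                total += value
        totals.append(total)
    return totals

rows = [
    {"date": date(2002, 3, 8), "months": [1, 3, 1]},
    {"date": None, "months": [4, 4, 4]},
    {"date": date(2002, 2, 1), "months": [None, 2, 5]},
]
print(running_totals(rows))  # [5, None, 2]
```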

Walternate · 10-22-2015

Sub-dataset names: addl_vars_2010, addl_vars_2011, addl_vars_2012, addl_vars_2013, addl_vars_2014, addl_vars_2015. The exact date should be counted as the end of the year (i.e., Dec 31). Thanks!

Walternate · 10-21-2015

Hi, I have a dataset at the person level with an ID and a date variable, and 5 sub-datasets, each of which represents the patients' data in a single year. I want to join the dataset to the sub-datasets on ID, but I only want to pull in data from the sub-dataset representing the year closest to the year of the date variable, e.g.:

Main dataset:
ID  Date
1   10/5/2010
2   1/3/2012

Sub dataset 2011:
ID  var1  var2  var3
1   a     b     c

Sub dataset 2012:
ID  var1  var2  var3
2   d     e     f

So after the join, it should look like this:
ID  Date       closest_var1  closest_var2  closest_var3
1   10/5/2010  a             b             c
2   1/3/2012   d             e             f

Any help is much appreciated.
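A Python sketch of one reading of these rules (illustrative; the names and dict layout are hypothetical). Following the follow-up post, it treats each sub-dataset's year as Dec 31 of that year. Among the sub-datasets that actually contain the person's ID, it picks the one whose Dec 31 is closest to Date. Restricting the search to years that contain the ID is an assumption, but it matches the example, where ID 1 (10/5/2010) is matched to the 2011 sub-dataset. Ties go to the earlier year, which is also an assumption.

```python
from datetime import date

# Hypothetical stand-ins: the main dataset, and yearly sub-datasets keyed by year
# (e.g. addl_vars_2011 -> 2011), each mapping ID -> that year's variables.
main = [{"ID": 1, "Date": date(2010, 10, 5)}, {"ID": 2, "Date": date(2012, 1, 3)}]
sub_by_year = {
    2011: {1: {"var1": "a", "var2": "b", "var3": "c"}},
    2012: {2: {"var1": "d", "var2": "e", "var3": "f"}},
}

def closest_year_vars(row):
    """Variables from the sub-dataset (among those containing this ID) whose
    Dec 31 is closest to row['Date'], renamed with a closest_ prefix."""
    years = [y for y, data in sub_by_year.items() if row["ID"] in data]
    if not years:
        return {}  # person not in any sub-dataset: keep the main row only
    # Distance in days from Date to Dec 31 of each candidate year; ties go to the earlier year.
    best = min(years, key=lambda y: (abs((date(y, 12, 31) - row["Date"]).days), y))
    return {f"closest_{k}": v for k, v in sub_by_year[best][row["ID"]].items()}

joined = [{**row, **closest_year_vars(row)} for row in main]
for row in joined:
    print(row)
```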

Walternate · 10-19-2015

Hi, I have a dataset at the person level with an ID variable, a date variable, and several monthly count variables.

ID  Date       mth_200201  mth_200202  mth_200203 ... mth_201501
1   1/5/2013   5           1           4
2   2/13/2007  .           .           2
3   3/8/2002   1           3           1

I want to create a total variable that is the sum of the values in mth_200201 through the mth_ variable corresponding to the date (so for ID 3 above, through mth_200203, for a total of 1+3+1=5).

Any help is much appreciated.
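A Python sketch of the name-matching idea (illustrative; the column list is a hypothetical stand-in for mth_200201 through mth_201501). It builds the target name mth_YYYYMM from the date with a zero-padded month, then adds columns in order until that name is reached. Treating missing monthly values as 0, and returning None when the date is missing or outside the column range, are both assumptions.

```python
from datetime import date

# Hypothetical stand-in: monthly columns named mth_YYYYMM, in chronological order.
columns = ["mth_200201", "mth_200202", "mth_200203"]
row = {"ID": 3, "Date": date(2002, 3, 8), "mth_200201": 1, "mth_200202": 3, "mth_200203": 1}

def total_through_date(row, columns):
    """Sum the monthly columns from the first one up to and including the column
    named after row['Date'] (mth_ + year + two-digit month)."""
    if row["Date"] is None:
        return None
    target = f"mth_{row['Date'].year}{row['Date'].month:02d}"
    if target not in columns:
        return None  # date outside the range of monthly columns
    total = 0
    for name in columns:
        value = row.get(name)
        if value is not None:  # missing monthly values add nothing
            total += value
        if name == target:
            break  # stop after adding the month that matches the date
    return total

print(total_through_date(row, columns))  # 5
```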

Walternate · 10-15-2015

Hi, I have a dataset in which each row is a person's score for a given event on a given date. People have 5 events each day, as well as a total score that incorporates the 5 event scores but is not a straight sum. People can be in the dataset on multiple days.

ID  Date       Event  Score
1   10/1/2012  1      5
1   10/1/2012  2      3
1   10/1/2012  3      7
1   10/1/2012  4      2
1   10/1/2012  5      3
1   10/1/2012  tot    17
1   10/1/2012  tot    14
2   9/7/2014   1      3
2   9/7/2014   1      null
2   9/7/2014   2      7
2   9/7/2014   2      null

I'm trying to clean the data, but it has 3 problems:
1. One person will have multiple sets of events on one day, in which one set has values and the other is null/missing.
2. One person will have multiple sets of events on one day, in which both sets have identical values.
3. One person will have multiple sets of events on one day, in which the sets have different values.

This is my solution to 1 and 2:

proc sort data=have;
    by ID date event descending score;
run;

data want;
    set have;
    by ID date event descending score;
    /* missing sorts lowest, so the first row per event is non-missing when one exists */
    if first.event then output want;
run;

That should output the first of two identical rows, or the non-missing row if one set of responses has values and the other set is missing. However, it does not address the third problem. I know I can't get rid of a row at random, so if I could create a flag or some other indicator showing that the person had two distinct scores for that event, that would be helpful.

Additionally, when I was going through the data, the events for which I noticed people having multiple scores on the same day were the total scores. If event=tot were the only event for which a person had two distinct values of score, I could just delete the event=tot rows altogether, transpose the data, and then calculate the total scores myself. However, I would need to be able to verify that only the event=tot rows had two distinct responses, and I'm not sure how to do that.

Any help is much appreciated.
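To check whether only the event=tot rows ever have two distinct scores, one approach is to count the distinct non-missing scores per ID/date/event and look at which events have more than one. A Python sketch of that check (illustrative; the tuple layout is hypothetical and None stands for null):

```python
from collections import defaultdict

# Hypothetical stand-in for "have": (ID, date, event, score), None = null score.
have = [
    (1, "10/1/2012", "1", 5), (1, "10/1/2012", "2", 3), (1, "10/1/2012", "3", 7),
    (1, "10/1/2012", "4", 2), (1, "10/1/2012", "5", 3),
    (1, "10/1/2012", "tot", 17), (1, "10/1/2012", "tot", 14),
    (2, "9/7/2014", "1", 3), (2, "9/7/2014", "1", None),
    (2, "9/7/2014", "2", 7), (2, "9/7/2014", "2", None),
]

# Collect the distinct non-missing scores for each ID/date/event combination.
distinct_scores = defaultdict(set)
for id_, day, event, score in have:
    if score is not None:
        distinct_scores[(id_, day, event)].add(score)

# Problem 3: combinations with two or more distinct non-missing scores.
conflicts = {key for key, scores in distinct_scores.items() if len(scores) > 1}

# True only if every conflicting combination is a total-score row.
only_tot_conflicts = all(event == "tot" for _, _, event in conflicts)
print(sorted(conflicts), only_tot_conflicts)
```

In PROC SQL, the analogous check would group by ID, date, and event with `having count(distinct score) > 1`; COUNT(DISTINCT ...) ignores missing values, so null/non-null pairs are not flagged.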

Walternate · 10-08-2015

Hi, I have a dataset at the person level but with duplicate rows. It has ID and character variables A, B, and C. I wanted unique rows, so I ran this code:

proc sort nodupkey data=have;
    by ID char_A char_B char_C;
run;

It ran without producing an error message, but when looking through the data I noticed that at least one duplicate row remained:

ID  Char_A  Char_B  Char_C
1   abc- d  def_g   ghi
1   abc- d  def_g   ghi

I'm not sure why this row remained in the data, as it looks like most of the duplicate rows were correctly deleted. Is there a way to troubleshoot and figure out whether there's some minor difference between the character variables, or some other reason the duplicate row wasn't removed? Thanks!
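One way to expose an invisible difference is to compare the two values character by character and print the code point of the first mismatch. A Python sketch (illustrative; the non-breaking space in the example is a hypothetical culprit, not something the post confirms):

```python
import unicodedata

def describe(s):
    """List each character's code point and Unicode name, exposing look-alikes
    such as non-breaking spaces."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in s]

def first_difference(a, b):
    """Index of the first differing character (or the shorter length), None if equal."""
    if a == b:
        return None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))

# Hypothetical example: the two values print identically, but one contains a
# non-breaking space (U+00A0) where the other has an ordinary space.
row1_char_a = "abc- d"
row2_char_a = "abc-\u00a0d"
i = first_difference(row1_char_a, row2_char_a)
print(i, describe(row1_char_a[i]), describe(row2_char_a[i]))
```

In SAS, writing the values with a $HEXw. format (for example `put char_A $hex40.;`) shows the underlying bytes in a similar way.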

Walternate · 10-01-2015

Hi, I have a dataset at the person level, with an ID variable and several month/year variables.

ID  200101  200102  200103 ... 201505
1   1       .       1
2   .       .       1
3   .       1       1

Values of the month/year variables can be either 1 or missing. I need to create a new variable that tells me the month/year in which someone had their first value of 1. So for ID 1 that would be 200101, for ID 2 it would be 200103, etc. I'm hoping for a solution that doesn't involve typing out everything by hand, as I have over 100 month/year variables. Thanks!
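A Python sketch of the scan (illustrative; the column list and values are hypothetical stand-ins for the month/year variables): walk the month columns in order and return the name of the first one equal to 1.

```python
# Hypothetical stand-in: month/year columns in order, with 1 or None per person.
month_columns = ["200101", "200102", "200103"]
people = {
    1: [1, None, 1],
    2: [None, None, 1],
    3: [None, 1, 1],
}

def first_month_with_one(values, columns=month_columns):
    """Return the name of the first column whose value is 1, or None if there is none."""
    return next((col for col, v in zip(columns, values) if v == 1), None)

first_month = {pid: first_month_with_one(vals) for pid, vals in people.items()}
print(first_month)
```

In SAS the same idea is usually written as an array over the month variables, with VNAME recovering the name of the matching element; WHICHN can find the position of the first 1 without an explicit loop.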

Walternate · 09-16-2015

The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works? Thanks!

Walternate · 09-08-2015

Hi, I have two datasets that each have ID and a date variable. Dataset 1 also has Var1 (which is continuous) and Dataset 2 also has Var2 (a dummy variable).

Dataset 1
ID  Date     Var1
1   1/5/12   30
1   3/9/13   54
1   10/4/14  3
2   2/1/12   9
2   1/1/13   23

Dataset 2
ID  Date     Var2
1   1/5/12   0
1   2/7/12   0
1   3/9/13   1
2   2/1/12   1
2   4/3/12   1
3   4/5/13   1
3   10/4/13  1

The goal is to have as much information as possible for each ID/date combination: ideally I'd want the values of both Var1 and Var2 on a given date, but if a person has only Var1 or only Var2 on a given date, that should be displayed. Therefore, I joined them using a PROC SQL full join on ID and date:

proc sql;
    create table want as
    /* coalesce keeps ID and date for rows found only in Dataset 2 */
    select coalesce(a.ID, b.ID) as ID,
           coalesce(a.date, b.date) as date,
           a.Var1,
           b.Var2
    from Dataset1 a
    full join Dataset2 b
    on (a.ID=b.ID) and (a.date=b.date);
quit;

This gave me the output dataset I wanted:

ID  Date     Var1  Var2
1   1/5/12   30    0
1   2/7/12         0
1   3/9/13   54    1
1   10/4/14  3
2   2/1/12   9     1
2   4/3/12         1
2   1/1/13   23
3   4/5/13         1
3   10/4/13        1

The issue is that some people, like ID 3, are only in Dataset 2 and not in Dataset 1. Ideally, I'd like to exclude these people from my final output dataset altogether, but I'm not sure how to do that with a full outer join. Alternatively, just having a way of identifying which people came from Dataset 2 only would be helpful as well.

Any help is much appreciated.
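A Python sketch of both options (illustrative; the dict layout is hypothetical). It full-joins on (ID, date), tags each row with its source, and drops people whose ID never appears in Dataset 1. Rows like ID 1 on 2/7/12 are kept, because that person is in Dataset 1 even though that date is not.

```python
from datetime import date

# Hypothetical stand-ins for Dataset 1 and Dataset 2, keyed by (ID, date).
dataset1 = {
    (1, date(2012, 1, 5)): 30, (1, date(2013, 3, 9)): 54, (1, date(2014, 10, 4)): 3,
    (2, date(2012, 2, 1)): 9, (2, date(2013, 1, 1)): 23,
}
dataset2 = {
    (1, date(2012, 1, 5)): 0, (1, date(2012, 2, 7)): 0, (1, date(2013, 3, 9)): 1,
    (2, date(2012, 2, 1)): 1, (2, date(2012, 4, 3)): 1,
    (3, date(2013, 4, 5)): 1, (3, date(2013, 10, 4)): 1,
}

def full_join(d1, d2, keep_only_ids_in_1=True):
    """Full join on (ID, date). Each row records where it came from; optionally
    drop people who appear only in Dataset 2."""
    ids_in_1 = {id_ for id_, _ in d1}
    rows = []
    for key in sorted(d1.keys() | d2.keys()):
        id_, day = key
        if keep_only_ids_in_1 and id_ not in ids_in_1:
            continue  # person appears only in Dataset 2
        if key in d1 and key in d2:
            source = "both"
        elif key in d1:
            source = "1 only"
        else:
            source = "2 only"
        rows.append({"ID": id_, "Date": day, "Var1": d1.get(key),
                     "Var2": d2.get(key), "source": source})
    return rows

for row in full_join(dataset1, dataset2):
    print(row)
```

In PROC SQL, a comparable filter on the query above would be `where coalesce(a.ID, b.ID) in (select ID from Dataset1)`.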

Walternate · 08-28-2015

Hi, I have a dataset with 5 variables: ID, a group variable that can range from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):

ID  group_var  count_var1  count_var2  count_var3
1   -50        0           1           3
1   -45        2           1           5
1   -1         2           0           1
1   4          4           3           2
1   12         1           0           0
2   -25        1           2           3
2   -7         0           0           1
2   -2         3           4           2

I want to obtain summed counts using values of group_var. For each ID, I want the sums of count_var1, count_var2, and count_var3 for group_var = -50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:

ID  sum_count_var1_gp_neg_50_to_neg_26  sum_count_var2_gp_neg_50_to_neg_26  sum_count_var3_gp_neg_50_to_neg_26
1   2                                   2                                   8

and continuing across with the sums for the other group ranges.

Any help is much appreciated!
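A Python sketch of the range sums (illustrative; the tuple layout and range labels are hypothetical, chosen to mirror the requested column names). group_var = 0 falls in none of the four ranges and is skipped.

```python
from collections import defaultdict

# Hypothetical stand-in for the person/group-level data above.
have = [
    (1, -50, 0, 1, 3), (1, -45, 2, 1, 5), (1, -1, 2, 0, 1), (1, 4, 4, 3, 2),
    (1, 12, 1, 0, 0), (2, -25, 1, 2, 3), (2, -7, 0, 0, 1), (2, -2, 3, 4, 2),
]

# The four group_var ranges from the question, with labels for the output names.
RANGES = [(-50, -26, "neg_50_to_neg_26"), (-25, -1, "neg_25_to_neg_1"),
          (1, 25, "1_to_25"), (26, 50, "26_to_50")]

def range_label(group):
    """Label of the range containing group, or None (e.g. for group_var = 0)."""
    for low, high, label in RANGES:
        if low <= group <= high:
            return label
    return None

# One wide record per ID: one sum per count variable per range; absent combinations read as 0.
wide = defaultdict(lambda: defaultdict(int))
for id_, group, *counts in have:
    label = range_label(group)
    if label is None:
        continue
    for n, count in enumerate(counts, start=1):
        wide[id_][f"sum_count_var{n}_gp_{label}"] += count

for id_, sums in wide.items():
    print(id_, dict(sums))
```

In SAS, a common route is a range format on group_var from PROC FORMAT, PROC SUMMARY by ID and the formatted group, then PROC TRANSPOSE to go wide.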

Walternate · 08-26-2015

The output dataset would look like this:

Main Dataset
ID  Date    Key_var  Categ_var
1   4/5/15  123      A
1   3/9/14  456      B
1   2/3/15  123
2   2/1/13           B
2   1/5/12  789      A

Lookup Table:
Lookup_key_var  Oth_var
123             abc
456             def
789             ghi

Combined dataset:
ID  Date    Key_var  Categ_var  Oth_var
1   4/5/15  123      A          abc
1   3/9/14  456      B
1   2/3/15  123
2   2/1/13           B
2   1/5/12  789      A          ghi

In other words, it is exactly the same as the Main dataset except that for rows where Categ_var=A, the lookup table has provided values of Oth_var (linked via Key_var).

Walternate · 08-26-2015

Hi, I have two datasets, one of which is my main dataset and the other of which is essentially a lookup table. The main dataset is at the person/date level and has ID, date, the key variable that I want to connect to the lookup table, and a categorical variable. The lookup table has a variable matching the key variable (though with a different name) and the variable that I want to pull in to give me information about the key variable.

Main Dataset
ID  Date    Key_var  Categ_var
1   4/5/15  123      A
1   3/9/14  456      B
1   2/3/15  123
2   2/1/13           B
2   1/5/12  789      A

Lookup Table:
Lookup_key_var  Oth_var
123             abc
456             def
789             ghi

The issue is that while I want to keep all of the rows in the main dataset, I only want to join to the lookup table those rows that have a value of A for the categorical variable (other rows have other values, including missing, for this variable).

Any help is much appreciated.
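A Python sketch of the conditional lookup (illustrative; the names and layout are hypothetical): every main row is kept, and only rows with Categ_var = "A" consult the lookup table.

```python
# Hypothetical stand-ins for the main dataset and the lookup table; None = missing.
main = [
    {"ID": 1, "Date": "4/5/15", "Key_var": 123, "Categ_var": "A"},
    {"ID": 1, "Date": "3/9/14", "Key_var": 456, "Categ_var": "B"},
    {"ID": 1, "Date": "2/3/15", "Key_var": 123, "Categ_var": None},
    {"ID": 2, "Date": "2/1/13", "Key_var": None, "Categ_var": "B"},
    {"ID": 2, "Date": "1/5/12", "Key_var": 789, "Categ_var": "A"},
]
lookup = {123: "abc", 456: "def", 789: "ghi"}  # Lookup_key_var -> Oth_var

# Keep every main row; Oth_var is filled only where Categ_var == "A".
combined = [
    {**row, "Oth_var": lookup.get(row["Key_var"]) if row["Categ_var"] == "A" else None}
    for row in main
]
for row in combined:
    print(row)
```

In PROC SQL, putting the condition in the ON clause of a left join (`on a.Key_var = b.Lookup_key_var and a.Categ_var = 'A'`) has the same effect, because ON conditions in a left join never drop rows from the left table.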

Date Last Visited 02-27-2025 04:00 PM

ODS Excel - building within-document hyperlink using a numeric row var...

Reading in SAS program and not seeing the formatting the way it shoul...

Weird characters messing up directory/file name macros

Using libname to create directories when some directories not represen...

How to build an output indicating which numbers in a range are not pre...

Re: Possible to remove carriage returns from a string and leave the re...

Re: Pattern matching to two different patterns

Pattern matching to two different patterns

Re: Possible to remove carriage returns from a string and leave the re...

Possible to remove carriage returns from a string and leave the rest o...

Re: Parsing a character string based on format

Re: Residuals in logistic regression

Merge step overwriting shared vars?

Transposing multiple variables

Re: Missing values in infile statement

Joining two datasets based on relative dates

Joining two datasets on date that's true as of the date in the second ...

Filling a variable across multiple rows conditionally

Re: Summing monthly variables until a given date is reached

Re: Matching main dataset to sub datasets based on date var

Matching main dataset to sub datasets based on date var

Summing monthly variables until a given date is reached

Dealing with duplicate sets of rows

proc sort nodupkey left in a duplicate row

Filling variable with names of other variables

Re: Summing vars based on values of another variable

Identifying people from a specific dataset in proc sql full join

Summing vars based on values of another variable

Re: Using a lookup table conditionally

Using a lookup table conditionally