About asgee

asgee · ‎02-26-2021

Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars). Thanks, will try it out myself!

asgee · ‎02-26-2021

Hi all,

I have this dataset here which is a subset from a much larger master dataset:

Dataset A:

    ID      Var1  Var2  Var3  Var4  Var5
    1-2345  10    64.5  9.8   2.1   5
    6-7890  10    5.4   7.5   10    6.4
    9-8755  1.1   4.2   6.4   10.5  5.8
    4-3210  10    15.4  3.1   11    13

I have new variables I want to add onto Dataset A based on this Dataset B:

Dataset B:

    Var1  Var2  Var3  Var4  Var5  Var6  Var7
    10    64.5  9.8   2.1   5     7.7   5.5
    10    5.4   7.5   10    6.4   9.4   6.8
    1.1   4.2   6.4   10.5  5.8   8.2   10
    10    15.4  3.1   11    13    1.2   51

The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together.

I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?).

    proc sort data = dataA;
        by var1 var2 var3 var4 var5;
    run;

    proc sort data = dataB;
        by var1 var2 var3 var4 var5;
    run;

    data new;
        merge dataA (in=A) dataB (in=B);
        if A;
    run;

How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?

Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A.

Any help would be appreciated, thanks.
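For reference, here is a minimal sketch of the two usual ways to match on the full combination of Var1 - Var5, using the dataset names from the post. It is not necessarily the code suggested in the thread. Note that the MERGE in the question has no BY statement, so SAS pairs rows by position rather than by value. Both versions assume the five variables hold exactly equal values in the two datasets, which should be true if Dataset A was subset from Dataset B.

```sas
/* Option 1: PROC SQL left join on all five variables (no sorting needed) */
proc sql;
    create table new as
    select a.*, b.var6, b.var7
    from dataA as a
    left join dataB as b
        on  a.var1 = b.var1
        and a.var2 = b.var2
        and a.var3 = b.var3
        and a.var4 = b.var4
        and a.var5 = b.var5;
quit;

/* Option 2: DATA step match-merge; the BY statement is what makes    */
/* SAS match on the combination of values instead of on row position. */
proc sort data=dataA; by var1 var2 var3 var4 var5; run;
proc sort data=dataB; by var1 var2 var3 var4 var5; run;

data new;
    merge dataA (in=A) dataB (in=B);
    by var1 var2 var3 var4 var5;
    if A;   /* keep only rows that exist in dataA */
run;
```

With a BY statement listing all five variables, three rows sharing Var1 = 10 are not a problem, because a match requires every BY variable to agree.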

asgee · ‎01-02-2021

Thanks for this! The code works great and accounts for that as well, thanks again @novinosrin! Really appreciate the help.

asgee · ‎01-02-2021

Thanks so much @novinosrin, yes I realized my mistake afterwards - my apologies for that 😞 . The code above works perfectly, really appreciate your help! Will be more wary the next time I post. Thanks again!

asgee · ‎01-02-2021

@novinosrin I've edited my question above to reflect the "Data" variable. Perhaps the values being (0,1 or .) make a big difference...

asgee · ‎01-02-2021

Hi @novinosrin, thanks for the response!

The code appears to work through all of the criteria; however, I think it flips the last criterion, where the ID has two observations with existing data (T1 and T2 are both non-missing). My fault for not mentioning that the original "Data" variable has values in the format of (0, 1 or .). So for example:

- If you were to change the data on ID 9876 to have two rows of non-missing data on both T1 and T2, but those data are "0" (meaning T1=0, T2=0), they'd be considered "non-missing" as they're not ".".
- When I ran your code, I think for IDs like these, it prioritized picking the T1 row instead of the T2 row. I guess since both rows were 0, it instead picked the first in line.

Otherwise, the code works perfectly for the case where T2 is missing but T1 isn't (i.e. the ID 9000 issue). It also picks up the IDs with only single rows.

I'll keep testing out your code and see if I can adjust it. Maybe it's just a re-sorting error on my end... Thanks again.

asgee · ‎01-02-2021

Hi @PaigeMiller, thanks for the response! I tested your code but I think the proc sort at the start removed all the missing data 😞 I think I'd still want to keep the missing data if the ID has all their rows missing. However, it did fix the issue of the T2 being missing but T1 as not. Will try to test out and see if I can adjust this if anything. Thanks again.

asgee · ‎01-02-2021

Hi all,

I have some repeated measures data where each ID could potentially have 2 rows of data. A sample is shown below.

    ID    Time  Data
    1234  T1    1
    1234  T2    1
    5555  T1    0
    6777  T2    .
    9876  T1    0
    9876  T2    0
    1000  T1    1
    2000  T2    1
    8888  T1    1
    8888  T2    .
    9000  T1    0
    9000  T2    .
    1010  T1    .
    1010  T2    .

I'm trying to reduce each ID down to 1 observation each. The criterion is that I'm prioritizing their data on Time 2 (Time=T2). So if they have non-missing data on T2, I pick that row as that ID's observation.

I used the code below to re-sort the data in descending "Time", so that the row where Time=T2 appears first. After that, it's easy for me to just take the first row (which should be the T2 row if they have 2 rows of data) as the observation for each ID.

    proc sort data=test;
        by ID descending time;
    run;

    data want;
        set test;
        by ID;
        if first.ID;
    run;

My issue now is the IDs that have missing data on T2, but not on T1. Looking at the sample data, if I proceeded with my code above, the ID "9000" (as well as ID 8888) would have their T2 observation picked, where "Data" is missing. However, when you look at their T1 observation, ID 9000 has existing Data in T1 that's "0". Non-missing data means any value that's not ".". Therefore, Data=0 would count as non-missing data.

Basically, if the ID has 2 rows, prioritize picking their T2 row if there's data. If there isn't, look at their T1 row; if there is data, pick that row instead. If all rows for Data are missing for that ID, then stick with picking their T2 row.

I'd like to hopefully produce a table like the one below:

    ID    Time  Data
    1234  T2    1
    5555  T1    0
    6777  T2    .
    9876  T2    0
    1000  T1    1
    2000  T2    1
    8888  T1    1
    9000  T1    0
    1010  T2    .

Any help would be very much appreciated, thanks.
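One way to express the rule above is sketched below, assuming the dataset is named test as in the post. It is an illustration only, not the code that was posted in the replies. The helper variable miss and the intermediate dataset flagged are made-up names. The idea is to sort non-missing rows ahead of missing ones within each ID, and T2 ahead of T1 within each of those groups, then keep the first row.

```sas
/* Flag rows where Data is missing (MISSING works for numeric and character) */
data flagged;
    set test;
    miss = missing(Data);   /* 1 if Data is missing, 0 otherwise */
run;

/* Within each ID: non-missing rows first, and T2 before T1 among ties */
proc sort data=flagged;
    by ID miss descending Time;
run;

/* Keep the top-priority row per ID and drop the helper flag */
data want;
    set flagged;
    by ID;
    if first.ID;
    drop miss;
run;
```

Because 0 is not a missing value, ID 9876 (T1=0, T2=0) keeps its T2 row, IDs 8888 and 9000 fall back to T1, and ID 1010 (all missing) keeps T2. The output comes back sorted by ID rather than in the original row order.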

asgee · ‎12-03-2020

Hi all,

I have some repeated longitudinal data and I want to merge them by ID. I'm trying to add the additional data as extra rows for that ID.

Data1

    ID     Date       Age
    12345  01DEC2020  25
    12345  02MAR2021  26
    12345  03APR2022  27

Data2

    ID     Date       Age  Data_Out  XYZ_Out
    12345  01DEC2027  32   YES       0.21
    12345  02MAR2028  33   NO        0.99
    12345  03APR2029  34   YES       0.64
    12345  04AUG2030  35   .         0.85
    12345  05DEC2031  36   YES       0.19

What I'd like to create is the table below, where rows from Data2 are added on top of Data1 (so that they're in chronological order based on date and age):

    ID     Date       Age  Data_Out  XYZ_Out
    12345  01DEC2020  25
    12345  02MAR2021  26
    12345  03APR2022  27
    12345  01DEC2027  32   YES       0.21
    12345  02MAR2028  33   NO        0.99
    12345  03APR2029  34   YES       0.64
    12345  04AUG2030  35   .         .
    12345  05DEC2031  36   YES       0.19

The code I have right now is not merging the way I'd like, as it ends up merging horizontally and duplicates the row data. It also gives me a "WARNING: Multiple lengths were specified for the BY variable Subject by input data sets" message.

    data want;
        merge data1 (in=A) data2 (in=B);
        by ID;
        if A;
    run;

Any help with this would be appreciated, thanks.
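A minimal sketch of stacking the rows instead of merging them, using the dataset names from the post. It assumes Date is stored as a numeric SAS date, so that sorting on it is chronological. MERGE joins datasets side by side, while SET with two datasets appends one after the other and leaves Data_Out and XYZ_Out missing for the rows that come from Data1.

```sas
/* Stack the two datasets vertically; variables that exist only in */
/* data2 are set to missing on the rows that come from data1.      */
data want;
    /* length ID $ 20;   hypothetical: uncomment and adjust if ID is   */
    /*                   character with different lengths in the inputs */
    set data1 data2;
run;

/* Put each ID's rows in chronological order */
proc sort data=want;
    by ID Date;
run;
```

The "Multiple lengths" warning comes from the BY variable being stored with different lengths in the two inputs. With SET, the length is taken from the first dataset listed unless a LENGTH statement comes before it, so a longer character value in the second dataset could otherwise be truncated.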

asgee · ‎11-09-2020

Hi all,

Might be a really dumb/simple question, but I'm trying to remind myself of what the spline estimates represent when you run a PROC LOGISTIC (or any model with splines, for that matter).

I ran my model using KNOTMETHOD=PERCENTILELIST(5 35 65 95), which gave me knots placed at these 4 points of my data:

    Knot1 = 6
    Knot2 = 17
    Knot3 = 26
    Knot4 = 59

However, when I look at my analysis of maximum likelihood estimates table, the first 4 lines show:

    Intercept
    Spl1
    Spl2
    Spl3

My question is, what's the range that these spline estimates represent? Does "Spl1" mean the estimate between 0-6? Or 6-17? If it does mean 6-17, why is it not reporting the estimate from 0-6 (and respectively the upper range of 59 or greater)? Or is there some reason that it is omitting those ranges (could be too extreme?).

Any help clarifying this would be very much appreciated!

asgee · ‎11-08-2020

Hi @Rick_SAS, thanks for your reply. The steps you've outlined make sense - I think I'll just continue exploring other options for now and follow up with your suggestion of plotting the pred probabilities for each subject in the data. I'll have a look at the design matrix link you've sent, seems like that's a good first step for now. Thanks!

asgee · ‎11-07-2020

Hi @Rick_SAS! Yes, I forgot to mention that. That's the next section in my code (I run proc mianalyze and aggregate the 50 estimates). The "ParameterEstimates=Lgsparms" is the output that I sort of clean later down the line to produce one table that shows the "best" parameter estimates using the MODELEFFECTS statement:

    ods trace on;
    proc mianalyze parms=Lgsparms;
        modeleffects Intercept spline1 spline2 spline3 age weight eth dev product; /* 3 spline effects based on 4 knots */
    run;
    ods trace off;
    ods listing close;
    ods output ParameterEstimates=mi_results

From there I use the "mi_results" output to produce a table summarizing that one "best" set of parameter estimates.

As you pointed out, my trouble really is just trying to visualize the predicted probabilities for that aggregated "best" set. I'm not sure exactly how to isolate the predicted probabilities (either through PROC LOGISTIC or PROC MIANALYZE) for that aggregated "best" set. I tried adding a "predicted=Fit" statement beside the (ods output ParameterEstimates=Lgsparms) line in PROC LOGISTIC but it gave me an error instead.

I understand that if I have an output that just has the predicted probabilities for that final model, I can use the SGPLOT (or EFFECTPLOTS??) function to visualize those results. Not sure if I'm missing an option or a data step to get to that point...

asgee · ‎11-06-2020

I'm trying to run an analysis where I have a continuous variable (serum) and binary outcome "par" (yes/no). My analysis requires that I impute my dataset (50 iterations). I'm still quite new to visualizing plots, and am having trouble trying to visualize the spline effects of my logistic model.

Here's my current code:

    title "Restricted Splines";
    title2 "Four Internal Knots";
    ods select ParameterEstimates SplineKnots;
    proc logistic data=imputed_50;
        effect spl = spline(serum / details naturalcubic basis=tpf(noint)
                            knotmethod=percentilelist(5 35 65 95)); /* RESTRICTED CUBIC SPLINE BASED ON 4 KNOTS */
        class par (ref='0') dev (ref='0') eth (ref='1') product (ref='0') / param=ref;
        model par (event='1') = spl age weight eth dev product / selection=none covb;
        by _Imputation_; /* RUNNING 50 ITERATIONS OF THIS MODEL BASED ON AN IMPUTED DATASET */
        ods output ParameterEstimates=Lgsparms;
    run;

What I'm trying to replicate is this graph based on the SAS documentation of Visualizing regressions with splines:

The example above is obviously using a continuous dependent variable (MPG). I'm having trouble trying to show the spline effect / spline points of "serum" based on a binary outcome "par", particularly since my modelling is based on an imputation and not just one iteration. I understand there's the "effectplots" option in PROC LOGISTIC but I'm not sure how to implement that in my modelling, which is based on an imputed dataset.

Any help plotting this would be very much appreciated.

asgee · ‎11-04-2020

Hi @Reeza! Thanks for your reply. Yeah I completely missed that seemingly small yet very important detail of checking to see whether my variable was character or numeric. Seems like that was def the issue. I ended up continuing on with the code I created and it seemed to create the plots I want. Thanks again for your help!!

asgee · ‎11-04-2020

Hi @Reeza! Yes, at first I thought of that, but I'm not sure if that's the same as having the logit of the outcome on the Y-axis instead of the predicted probabilities...

This is the sort of graph that I'm trying to replicate using my data:

Again, my outcome is binary (0,1) and my variable is continuous. I checked the plots=all option on PROC LOGISTIC and can't seem to find a graph or a way to do this...

I'm assuming perhaps I can just do this manually, but I'm not sure how to logit transform just my binary outcome variable and then plot that logit(par) against the continuous fer separately from running a PROC LOGISTIC....
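A rough sketch of one common manual approach to an empirical logit plot follows. It is not necessarily what was worked out in the thread. A 0/1 outcome cannot be logit-transformed row by row, so the continuous variable is binned first and the logit is computed per bin. The input dataset name have, the choice of 10 bins, and the 0.5 continuity correction are all assumptions. par must be numeric 0/1 and fer numeric.

```sas
/* Split the continuous variable into 10 roughly equal-sized bins */
proc rank data=have groups=10 out=ranked;
    var fer;
    ranks bin;
run;

/* Per bin: number of observations, number of events, mean of fer */
proc means data=ranked noprint nway;
    class bin;
    var par fer;
    output out=bins n(par)=n sum(par)=events mean(fer)=fer_mean;
run;

/* Empirical logit; adding 0.5 avoids log(0) when a bin has no */
/* events or no non-events                                     */
data bins;
    set bins;
    elogit = log((events + 0.5) / (n - events + 0.5));
run;

/* Plot the empirical logit against the bin mean of fer */
proc sgplot data=bins;
    scatter x=fer_mean y=elogit;
    xaxis label="Mean of fer within bin";
    yaxis label="Empirical logit of par";
run;
```

A roughly straight-line pattern suggests fer can enter the logistic model linearly. Visible curvature is one reason to consider a spline or a transformation.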


Re: How to merge two datasets based on exact values across multiple va...

How to merge two datasets based on exact values across multiple variab...

Re: How do I detect and subset non-missing observations in repeated da...

Re: How do I detect and subset non-missing observations in repeated da...

Re: How do I detect and subset non-missing observations in repeated da...

Re: How do I detect and subset non-missing observations in repeated da...

Re: How do I detect and subset non-missing observations in repeated da...

How do I detect and subset non-missing observations in repeated data (...

Merging by adding rows on repeated data

How do I interpret spline range estimates?

How do I delete multiple observations/rows based on one instance of a ...

Re: How to plot Restricted Cubic Spline in PROC LOGISTIC (BY IMPUTATIO...

Re: How to plot Restricted Cubic Spline in PROC LOGISTIC (BY IMPUTATIO...

How to plot Restricted Cubic Spline in PROC LOGISTIC (BY IMPUTATION)

Re: How to Create Empirical Logit Plots? (Proc Means in Empirical Logi...

Re: How to Create Empirical Logit Plots? (Proc Means in Empirical Logi...