About bretthouston

bretthouston · ‎03-02-2020

Thanks for the reply - I really appreciate it! As best I can tell they match? Here is an excerpt from the proc contents I've run on both of the tables. Is there a different way for me to figure out if there are non-printing characters? Could it be because of the informats or label?
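
One way to check for non-printing characters is to print the key in hexadecimal and then strip anything that is not printable. This is only a sketch: it assumes the table and variable names from the original question (weightinfo, PatientID, AdmissionID with length 10), and the output name weightinfo_clean is made up for illustration. Informats and labels do not change stored values, so they would not be expected to affect how a merge matches.

```sas
/* Show the raw bytes of the key; 20 hex digits cover a 10-character value. */
/* A blank is 20; values such as 09 (tab) or 0D (carriage return) are suspect. */
proc print data=weightinfo (obs=10);
    var PatientID AdmissionID;
    format AdmissionID $hex20.;
run;

/* Keep only printable characters ('k' = keep, 'w' = printable), */
/* left-align the result, then re-sort before merging again.     */
data weightinfo_clean;
    set weightinfo;
    AdmissionID = left(compress(AdmissionID, , 'kw'));
run;

proc sort data=weightinfo_clean;
    by PatientID AdmissionID;
run;
```

The same cleaning step would need to be applied to the other table if its key shows stray bytes as well.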

bretthouston · ‎03-02-2020

Hello,

I am trying to merge two tables (example below) by two variables (PatientID and AdmissionID). When I do so, the data in one of the variables (patientWeight) is dropped. If I merge exclusively on PatientID, however, this isn't the case.

Here is an example of table 1 (merged_txainformation):

PatientID  AdmissionID  txaAdministrationType
1          HSC1         Infusion
1          HSC2         Infusion
2          34253        Bolus
3          32223        Infusion

Here is an example of table 2 (weightinfo):

PatientID  AdmissionID  PatientWeight
1          HSC1         68
1          HSC2         78
1          HSC3         110
2          34253        96
3          32223        .

Ideally, I would like the table to look like the following:

PatientID  AdmissionID  txaAdministrationType  PatientWeight
1          HSC1         Infusion               68
1          HSC2         Infusion               78
2          34253        Bolus                  96
3          32223        Infusion               .

I've tried the following code:

    data mergedTXAweight;
        merge merged_txaInformation (in=TXA) weightinfo (in=Ottweight);
        by PatientID AdmissionID;
        if TXA then output;
    run;

If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing.

Of note, PatientID is a numeric variable (length = 8) in both tables, and AdmissionID is a character variable (length = 10) in both tables. I'm not sure if this is the issue?

Thanks in advance!
Brett

bretthouston · ‎12-20-2019

Thank-you so much - this is exactly what I was looking for! Works perfectly!

bretthouston · ‎12-20-2019

Hello,

I am trying to compute a new variable based on the character values in the 6th and 7th positions of an existing column. For example, I'd like to compute the column Newcode from CCIcode:

PatientID  CCIcode  Newcode
1          1GR89QB  1GR89O
2          1AB76EA  1AB76E
3          1CG57KZ  1CG57O
4          1CG57KS  1CG57E

Essentially, if positions 6 and 7 of the variable CCIcode are AA-KS, then I'd like position 6 of Newcode to be "E". If positions 6 and 7 of CCIcode are KZ-XY (there is nothing in between KS and KZ), then I'd like position 6 of Newcode to be "O".

I've tried a few variations of substr code, but I don't know how to specify the character ranges without having to type the letters individually. Any help here would be much appreciated!

Thanks!
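
A minimal sketch of the range test described above, assuming an input data set named have (a placeholder), uppercase codes, and the helper variable suffix, which is introduced only for illustration. SAS compares character values position by position in collating order, so a two-letter range can be written as a chained comparison instead of listing letters.

```sas
data want;
    set have;
    length Newcode $6;
    /* Characters 6-7 of CCIcode decide the recode */
    suffix = substr(CCIcode, 6, 2);
    if 'AA' <= suffix <= 'KS' then Newcode = substr(CCIcode, 1, 5) || 'E';
    else if 'KZ' <= suffix <= 'XY' then Newcode = substr(CCIcode, 1, 5) || 'O';
    drop suffix;
run;
```

With the four sample rows, QB and KZ fall in the KZ-XY range and give "O", while EA and KS fall in AA-KS and give "E". Codes outside both ranges leave Newcode blank in this sketch.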

bretthouston · ‎11-08-2019

Thank-you for your help! 🙂

bretthouston · ‎11-07-2019

The variables are lower case! And the file name is 'CRSummary' (in work library). Thanks, Brett

bretthouston · ‎11-07-2019

Hi Art, The other variables are transfusion_1_time_cr (in dtm format) and transfusion_1_product_cr. Thx, Brett

bretthouston · ‎11-07-2019

Hi Art, Thanks for your interest in this question - I'd love to hear what you think. The column name (in the dataset as it is) is transfusion_1_lot_num_cr, and there are actually 15 different transfusions captured (ie this extends from 1 to transfusion_15_lot_num_cr). These were unfortunately named by someone else. You're correct that I don't need the # specified in the final column, I just want to make sure that the transfusion date/time, product and lot# move together in the table (if that makes sense). Thanks, Brett

bretthouston · ‎11-07-2019

Hello,

I apologize for the very basic question, but I've tried a few variations of proc transpose and I haven't been able to figure this one out. Here is a sample of what my data set looks like:

PatientID  Hospital  AdmissionDate  Transfusion1Dtm     Transfusion1Product  Transfusion1Lot#  Transfusion2Dtm     Transfusion2Product  Transfusion2Lot#  Transfusion3Dtm     Transfusion3Product  Transfusion3Lot#
1          HSC       20OCT2018      21OCT2018:18:20:00  RBC                  C054016798        21OCT2018:22:00:00  RBC                  C054016799        22OCT2018:14:25:00  HAS                  435710004
2          HSC       18OCT2018      18OCT2018:16:55:00  Platelet             C054015830
3          SBGH      23OCT2018      26OCT2018:20:00:00  RBC                  C054014779        27OCT2018:23:43:00  RBC                  C054014786

This is what I would like it to look like:

PatientID  Hospital  AdmissionDate  TransfusionDtm      TransfusionProduct  TransfusionLot#
1          HSC       20OCT2018      21OCT2018:18:20:00  RBC                 C054016798
1          HSC       20OCT2018      21OCT2018:22:00:00  RBC                 C054016799
1          HSC       20OCT2018      22OCT2018:14:25:00  HAS                 435710004
2          HSC       18OCT2018      18OCT2018:16:55:00  Platelet            C054015830
3          SBGH      23OCT2018      26OCT2018:20:00:00  RBC                 C054014779
3          SBGH      23OCT2018      27OCT2018:23:43:00  RBC                 C054014786

Essentially, I would like the columns TransfusionDtm, TransfusionProduct and TransfusionLot# to collapse lengthwise. Any guidance here would be very much appreciated.

Thanks!!
Brett
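
One possible approach is a data step with three parallel arrays, so that the date/time, product and lot number from the same transfusion are written out together. This is a sketch under several assumptions: the input data set is called wide, the lot columns are named Transfusion1Lot to Transfusion3Lot (a "#" is not valid in a standard SAS name), the product and lot columns are character, and the $20 lengths are guesses. With the real data the array lists would name the actual columns and run to 15 elements.

```sas
data long;
    set wide;
    array dtm  {3}   Transfusion1Dtm     Transfusion2Dtm     Transfusion3Dtm;
    array prod {3} $ Transfusion1Product Transfusion2Product Transfusion3Product;
    array lot  {3} $ Transfusion1Lot     Transfusion2Lot     Transfusion3Lot;
    length TransfusionProduct $20 TransfusionLot $20;
    format TransfusionDtm datetime20.;
    do i = 1 to 3;
        /* Write one row per transfusion that actually occurred */
        if not missing(dtm{i}) then do;
            TransfusionDtm     = dtm{i};
            TransfusionProduct = prod{i};
            TransfusionLot     = lot{i};
            output;
        end;
    end;
    keep PatientID Hospital AdmissionDate
         TransfusionDtm TransfusionProduct TransfusionLot;
run;
```

Because each output row is built from the same index i across the three arrays, the three values cannot drift apart, which is harder to guarantee with separate proc transpose steps.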

bretthouston · ‎07-16-2019

Hello,

I am trying to satisfy multiple (3) conditions in an if-then statement. I think I have the syntax wrong, as I'm not getting the output I would expect. In the sample dataset below:

ID  Date1    Date2    Date3    Type  Status
1   23Aug12  26Aug12  21Aug12  A     1
1   23Aug12  26Aug12  24Aug12  B     0
2   23Aug12  26Aug12  27Aug12  C     0
2   23Aug12  26Aug12  .        .     0
3   23Aug12  26Aug12  .        .     0
4   23Aug12  26Aug12  21Aug12  B     0

I've tried the following:

    data want;
        set have;
        if Date3<Date1 and Type='A' and Type is not missing then status=1;
        else status=0;
    run;

Essentially I want to create a new variable (status) with the output as above. Status would be 1 if Date3 occurs before Date1, the type is A, and Date3 is not missing. I think the 'is not missing' part (or the second 'and') is messing things up.

Thank you kindly in advance!
Brett
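
A minimal sketch of the three-part condition, assuming the data set is named have and that the qualifying type is 'A', as in the posted code and sample table. In a data step IF statement the MISSING function can stand in for "is not missing", which belongs to WHERE expressions. The explicit check matters because a missing numeric value compares as smaller than any date.

```sas
data want;
    set have;
    /* Without the missing() check, a missing Date3 would satisfy Date3 < Date1 */
    if not missing(Date3) and Date3 < Date1 and Type = 'A' then status = 1;
    else status = 0;
run;
```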

bretthouston · ‎06-11-2019

Hello,

I'm having an issue with how to write code for this. I have a data set that mirrors the sample set provided below. I am trying to calculate a risk score, whereby certain codes are 'binned' into groups which then have a certain weight assigned to them. By patient, each group is weighted, and the groups are added to create the final score (Charlson_score in this case).

My issue is that the codes that fit within each group are not mutually exclusive (i.e., two different codes which could be grouped similarly can be found for the same patient). Therefore, if I use an accumulating column to add up the row_scores to calculate the final score, I am over-counting.

Here is the code I've written so far:

    data want;
        set have;
        retain Charlson_score;
        if first.sPatientID then do;
            Charlson_score=0;
            Row_score=0;
        end;
        if dxCd in: ("I21","I22","I23","I24") then CC_GRP1=1;
        else if dxCd in: ("I43","I50","I51","I52") then CC_GRP2=1;
        row_score=(CC_GRP1)*2 + (CC_GRP2)*3;
        Charlson_score=sum(Charlson_score+Row_score);
        if last.sPatientID then output;
    run;

Sample dataset:

sPatientID  dxCd
1           I21
1           E53
1           I22
1           I50
2           B43
2           C87
2           I52
2           I51

I would like the Charlson_score to be 5 for sPatientID 1, and 3 for sPatientID 2. That is, if multiple dxCds fit into CC_GRP1, I would only like them to be counted once. The issue is that calculating the Charlson_score using an accumulating column obviously becomes problematic.

Any advice would be much appreciated! Thanks!
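
One way to avoid the over-counting is to retain the group flags rather than the score, reset them at the start of each patient, and compute the score once on the patient's last row. The sketch below assumes the data set is named have and reuses the two groups and the weights 2 and 3 from the posted code; a full Charlson index would have more groups.

```sas
proc sort data=have;
    by sPatientID;
run;

data want;
    set have;
    by sPatientID;              /* needed for first. and last. to exist */
    retain CC_GRP1 CC_GRP2;
    if first.sPatientID then do;
        CC_GRP1 = 0;
        CC_GRP2 = 0;
    end;
    /* A flag can only go from 0 to 1, so repeated codes count once */
    if dxCd in: ("I21","I22","I23","I24") then CC_GRP1 = 1;
    else if dxCd in: ("I43","I50","I51","I52") then CC_GRP2 = 1;
    if last.sPatientID then do;
        Charlson_score = CC_GRP1*2 + CC_GRP2*3;
        output;
    end;
    keep sPatientID CC_GRP1 CC_GRP2 Charlson_score;
run;
```

On the sample data, patient 1 sets both flags (I21 and I22 together set CC_GRP1 once, I50 sets CC_GRP2), and patient 2 sets only CC_GRP2, which matches the scores of 5 and 3 asked for.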

bretthouston · ‎04-18-2019

Thanks this worked perfectly!

bretthouston · ‎04-18-2019

I am trying to subset data based on the frequency of a given value within a variable. I've attached a screen shot of a proc freq table, and essentially I would like to include all CCIDummy values where the row percent for UpdtxStatus 'yes' is >= 5%. I tried the following code, but the '%' results in a syntax error:

    proc sql;
        create table TEST as
        select *
        from work.CombiningProceduresDummyVariable
        group by CCIdummy
        having (UpdtxStatus='yes')ge 5%;
    quit;

Thanks in advance!
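
A possible rewrite of the HAVING clause, offered as a sketch: in PROC SQL a comparison evaluates to 1 or 0, so the mean of (UpdtxStatus = 'yes') within each CCIdummy group is the proportion of 'yes' rows, and 5% becomes 0.05. It assumes the values are stored exactly as lowercase 'yes'. Rows with a missing UpdtxStatus count as "not yes" here, which can differ from a proc freq row percent that excludes missing values.

```sas
proc sql;
    create table TEST as
    select *
    from work.CombiningProceduresDummyVariable
    group by CCIdummy
    having mean(UpdtxStatus = 'yes') >= 0.05;
quit;
```

Because the query selects every row while grouping, SAS remerges the group statistic back onto the detail rows and writes a note to the log saying so.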

bretthouston · ‎04-18-2019

Thank-you - this is perfect!

bretthouston · ‎04-18-2019

I realize this is a very easy concept, but I'm having trouble assigning values to a new column when my logic involves 'if always' or 'if ever'. For example, in the following sample dataset:

ID  Status  UpdStatus
1   yes     yes
1   yes     yes
1   no      yes
2   no      no
3   no      no
4   yes     yes
4   no      yes

I am trying to create a new column, where for a given ID if status is ever 'yes' then UpdStatus = 'yes', otherwise UpdStatus = 'no'. [Alternatively, if status is always 'no' then UpdStatus = 'no', else 'yes'.] I've written sample attempts below, but I can't figure out the proper language to convey the 'ever' or 'always'.

    data want;
        set have;
        format UpdStatus $3.;
        if Status="(ever) yes" then UpdStatus = "yes";
        else UpdStatus = "no";
    run;

or:

    data want;
        set have;
        format UpdStatus $3.;
        if Status ="(always) no" then UpdStatus="no";
        else UpdStatus ="yes";
    run;

Thanks in advance!
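
"Ever" within a group can be expressed as a maximum over that group. The sketch below assumes the input data set is named have and contains only ID and Status, with Status stored as lowercase 'yes' or 'no'. The comparison Status = 'yes' evaluates to 1 or 0, so its maximum within an ID is 1 whenever at least one row is 'yes'.

```sas
proc sql;
    create table want as
    select ID,
           Status,
           case when max(Status = 'yes') = 1 then 'yes'
                else 'no'
           end as UpdStatus
    from have
    group by ID;
quit;
```

The "always no" version is the same test seen from the other side, since an ID is always 'no' exactly when that maximum is 0. The row order of the result is not guaranteed, so an ORDER BY clause may be wanted.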

Date Last Visited	‎06-26-2022 04:23 PM

Re: Merging two sets with multiple observations

Merging two sets with multiple observations

Re: Obtaining a pooled estimate with proc freq

Obtaining a pooled estimate with proc freq

Re: Using margins macro to compute differences in probabilities within...

Using margins macro to compute differences in probabilities within str...

Coding ordinal predictors in proc psmatch

Re: Logistic regression with PROC GLIMMIX - specification of random st...

Logistic regression with PROC GLIMMIX - specification of random statem...

Re: Obtaining a pooled estimate with proc freq

Re: Using margins macro to compute differences in probabilities within...

Re: Coding ordinal predictors in proc psmatch

Re: PROC GLIMMIX - generating predicted probabilities at specified cov...

Re: Merging by two variables troubleshoot

Merging by two variables troubleshoot

Re: Computing a new variable based on character range

Computing a new variable based on character range

Re: How to convert table from wide to long with multiple columns

How to convert table from wide to long with multiple columns

Proper syntax for multiple 'and' conditions in if-then statement

How to set group flag to 1 if multiple variables contribute

Re: How to subset / filter data based on frequency of a value within a...

How to subset / filter data based on frequency of a value within a var...

Re: How to create new column with imputed values in reference to two o...

How to create new column with imputed values in reference to two other...