About quickbluefish

quickbluefish

Sorry, the PROC SORT should be on HAVE, not WANT. And I accidentally wrote 'vax_admin_date' instead of 'vaccine_admin_date'

quickbluefish

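/* PROC SORT runs on HAVE and uses vaccine_admin_date, per the correction in the follow-up post */
proc sql noprint;
    select max(nvax) into :maxvax trimmed
    from (select id, count(*) as nvax from HAVE group by id);
quit;

proc sort data=have;
    by id vaccine_admin_date;
run;

data want;
    set have;
    by id;
    length vax1-vax&maxvax 4 n 3;
    format vax1-vax&maxvax date9.;
    array V {*} vax1-vax&maxvax;
    retain V n;
    /* reset the array and counter at the start of each ID */
    if first.id then do;
        call missing(of V[*]);
        n=0;
    end;
    n+1;
    V[n]=vaccine_admin_date;
    /* one output row per ID, with the dates spread across vax1-vaxN */
    if last.id then output;
    keep id vax1-vax&maxvax;
run;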

quickbluefish

The way that I've usually done this is to create a "counting process" data structure, sometimes known as "long format" longitudinal data. The idea is that there's one row per period of time during which everything is static. In your case, since there's only one drug, this is pretty simple (this assumes you're only following people while they remain on chemo). You start following everyone on day 0 (first dose of chemo) and endday is the day at which the status of one of the drugs (again, you only have one here - cardioTx) changes. For patient 1 below, they're on cardioTx during the range 0-12, then stop for 3 days (>12-15), then restart until the end of follow-up: >15-48. Patient 2 starts without cardioTx and then starts on day 20. And so on. Everyone has an arbitrary number of rows (some people might only have 1 row).

patientID  startday  endday  cardioTx  CVA
1          0         12      1         0
1          12        15      0         0
1          15        48      1         0
2          0         20      0         0
2          20        34      1         0
2          34        95      0         1

The basic PHREG syntax for counting process data looks like this:

proc phreg data=CP;
    model (startday, endday) * CVA (0) = cardioTx <other predictors> / risklimits ties=efron;
run;

...where CVA is the outcome. There's a brief explanation of this method here (but a LOT more elsewhere): https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_phreg_details12.htm
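As a rough sketch of the layout described above, the six example rows can be typed in with DATALINES and checked for gaps or overlaps before they go anywhere near PHREG. The check itself, the dataset name cp_problems and the variable prev_end are additions for illustration and are not part of the original reply; the check assumes follow-up starts on day 0 for everyone, as stated above.

```sas
/* Example rows from the post, entered as a counting-process dataset */
data CP;
    input patientID startday endday cardioTx CVA;
    datalines;
1 0 12 1 0
1 12 15 0 0
1 15 48 1 0
2 0 20 0 0
2 20 34 1 0
2 34 95 0 1
;
run;

/* Sanity check (illustrative): within each patient, every interval should
   have positive length and start where the previous interval ended.
   Rows that break either rule are written to cp_problems. */
data cp_problems;
    set CP;
    by patientID;
    prev_end = lag(endday);                 /* LAG is called on every row */
    if first.patientID then prev_end = 0;   /* assumes follow-up starts on day 0 */
    if startday >= endday or startday ne prev_end then output;
run;
```

If the intervals are contiguous, cp_problems should come back with no rows. Two patients are far too few to fit a model, so the PHREG call is left out of the sketch.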

quickbluefish

A couple pharmacokinetics questions:
1) how long do you expect the protective effect of the cardioTx drug to last? Only while the person is still taking it?
2) do you expect that the drug is immediately protective or does it take some amount of time / number of doses for protection to be achieved?

If the drug is expected to be ~immediately protective and only protective while the person continues to take it, then really there's no difference between a person who starts it early (i.e., at the time of first chemo) or later on (prior to any CVA). In this case, I would suggest that time starts at first dose of cardioTx and ends at first of either stoppage of that drug, however you define that, or CVA. That's the simplest approach.

Alternatively, if you think the drug is ONLY protective if it's received right at the beginning - in other words, that in order to prevent heart damage that might lead to a later cardiovascular accident, a person needs to begin the protective therapy at the same time as (or before) the start of chemo - then you could run this as an intent-to-treat (ITT) study where you ignore the fact that some people start cardioTx later on. There'd be no censoring of people who start late.

If you really want to treat this as a time-varying covariate, where people can be on and off cardioTx at different time intervals, then you'd need to create a "counting process" data structure (or the horizontal equivalent) and use the PHREG syntax for that (you can find this by searching online). In this case, a person can contribute to more than one cohort. This is certainly doable, but results are not simple to interpret, and you will need to consider whether you're allowing the possibility of multiple CVA events per person or censoring at the first event - this will determine whether you need to use the ID statement in PHREG, I believe.

If you really only care about comparing people who receive cardioTx *at the time of first chemo* vs. those who do not receive treatment, then one possibility is to censor people who start off without cardioTx and then receive it later (that is, censor such people at the date they receive cardioTx). However, that is definitely prone to introducing bias unless you have some sense that the day of starting cardioTx (relative to chemo) is random. Have you looked at predictors of early (day 0) receipt of cardioTx, i.e.:

proc logistic data=cardio descending;
    model early_cardioTx = <predictors>;
run;

...where predictors might be things like demographics, socioeconomics, comorbidities, etc.

Another possibility (to enhance at least the first, simple or ITT approaches above) is to use inverse probability of treatment weights (IPTW), where the weights are essentially the inverse of the probabilities that would be output (using ODS) by the logistic model above. In fantasy-land, at least, applying these weights to people and then running one of the models above simulates a clinical trial in which you, for instance, randomized people to early cardioTx vs. not and then followed them to look for CVA.
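To make the weighting idea concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the CARDIO dataset is generated on the spot, age and diabetes are placeholder predictors, and futime, pscore and iptw are made-up names. It also uses the OUTPUT statement of PROC LOGISTIC rather than ODS to capture the predicted probabilities, and adds a robust variance via COVS(AGGREGATE), which is a common choice for weighted models but was not mentioned above.

```sas
/* Simulated stand-in for the real cohort (illustration only) */
data cardio;
    call streaminit(2024);
    do patientID = 1 to 500;
        age = 40 + floor(40 * rand('uniform'));
        diabetes = rand('bernoulli', 0.3);
        /* probability of early treatment depends on the covariates */
        early_cardioTx = rand('bernoulli',
            1 / (1 + exp(-(-3 + 0.04*age + 0.5*diabetes))));
        futime = ceil(365 * rand('uniform'));   /* days of follow-up */
        CVA = rand('bernoulli', 0.15);          /* 1=event, 0=censored */
        output;
    end;
run;

/* Propensity model: predicted probability of early cardioTx */
proc logistic data=cardio descending;
    model early_cardioTx = age diabetes;
    output out=ps pred=pscore;
run;

/* IPTW: 1/p for the treated, 1/(1-p) for the untreated */
data ps_w;
    set ps;
    if early_cardioTx = 1 then iptw = 1 / pscore;
    else if early_cardioTx = 0 then iptw = 1 / (1 - pscore);
run;

/* Weighted Cox model with a robust (sandwich) variance */
proc phreg data=ps_w covs(aggregate);
    id patientID;
    weight iptw;
    model futime * CVA(0) = early_cardioTx / risklimits;
run;
```

In practice it is worth looking at the distribution of iptw before modelling, since a handful of very large weights can dominate the estimate.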

quickbluefish · 09-29-2024

Unless you're going to use an LLM like ChatGPT, no, there's not a way to do this. Better yet, though, don't use R. The idea that R is better for visualizations is very outdated.

quickbluefish · 09-27-2024

Hi - sounds like you need a "counting process" dataset as input for PHREG. Are you still looking for a way to create this? I have a macro for it - let me know.

quickbluefish · 09-16-2024

Are you still looking for an answer to this PSMATCH question?

quickbluefish · 07-06-2022

Haha, on 2nd thought, you might be able to achieve basically the same thing by first randomly sorting your input dataset and then running proc psmatch as you already were. Worth a try. But being able to do the matching 'by hand' does give you a lot more flexibility.
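For what it's worth, the random sort could look something like the sketch below. SASHELP.CLASS is only a stand-in for the real input, and the names shuffled and _rand are made up here; whether PSMATCH results actually change with input order is the untested guess from the post, not something this code demonstrates.

```sas
/* Attach a reproducible random number to every record */
data shuffled;
    set sashelp.class;                       /* stand-in for the real input */
    if _n_ = 1 then call streaminit(2022);   /* fixed seed so the order can be reproduced */
    _rand = rand('uniform');
run;

/* Sort on the random number, then drop it from the output */
proc sort data=shuffled out=shuffled (drop=_rand);
    by _rand;
run;
```

The shuffled dataset would then be passed to PROC PSMATCH in place of the original input.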

quickbluefish · 07-06-2022

Are you actually making use of the propensity score or are you just using PSMATCH as a way to streamline the matching? There is (I'm almost certain) no way to do what you're asking using PSMATCH.

If you need propensity scores, I would suggest just outputting them (the predicted probabilities for each person / month) from a model you run in proc genmod or proc logistic, log transforming them and then using PROC SQL to create all possible matches (left joining the treated people to the controls on your various matching variables and within whatever caliper of the log-transformed ps scores you want to use, e.g., 0.25). Since you have several matching variables and the caliper, this shouldn't result in something too gigantic. For purpose of explanation, I'll call this dataset RAW_MATCH -- this will have separate columns for the treated patient ID and the control/month patient ID.

After that, you'd need to do something in a DATA step to achieve your goal (I am assuming you're matching without replacement, e.g., a Feb 2019 control record from person X cannot be matched to more than one treated person and that you're merely trying to limit the total number of times that _any_ record of person X is used as a match). The simplest way of doing this might be, for each record in the RAW_MATCH dataset, to create a random number (ideally by first writing a CALL STREAMINIT(xxx); statement in the data step, where xxx is some randomly chosen integer, and then using the RAND() function to create a random number). Then sort the dataset by the case patient ID variable and the random number you just created (i.e., within each case patient, sort by the random number).

Then, in another data step, simply take the first N control records for each case patient, e.g., to keep up to 4 controls per treated person:

data treated (keep=ptID caseID ncontrols)
     controls (keep=controlID controlMonth caseID rename=(controlID=ptID));
    set raw_match;
    by ptID randnum;
    length caseID 8 ncontrols 3;
    retain caseID 0 ncontrols;
    /* new treated patient: new case number, reset the control count */
    if first.ptID then do;
        caseID+1;
        ncontrols=0;
    end;
    /* keep at most 4 controls per treated patient */
    if ncontrols<4 and not missing(controlID) then do;
        ncontrols+1;
        output controls;
    end;
    if last.ptID then output treated;
run;

data final_matches;
    set treated (in=A) controls (in=B);
    length isTreated 3;
    isTreated=A;
run;

proc sort data=final_matches;
    by caseID descending isTreated;
run;

There are more sophisticated things you could do, of course, but that might get you what you're after.
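The earlier steps described in this post (all candidate pairs within a caliper, then a random order within each treated patient) might be sketched as follows. The input datasets treated_ps and control_ps are simulated here, and sex, agegrp and lps (the log-transformed propensity score) are hypothetical stand-ins for the real matching variables; the 0.25 caliper and the names ptID, controlID, controlMonth and randnum follow the post.

```sas
/* Simulated inputs (illustration only): 50 treated patients and
   200 control patients with 6 monthly records each */
data treated_ps (keep=ptID sex agegrp lps)
     control_ps (keep=controlID controlMonth sex agegrp lps);
    call streaminit(42);
    do i = 1 to 50;
        ptID = i;
        sex = rand('bernoulli', 0.5);
        agegrp = ceil(3 * rand('uniform'));
        lps = log(0.05 + 0.9 * rand('uniform'));
        output treated_ps;
    end;
    do i = 1001 to 1200;
        controlID = i;
        sex = rand('bernoulli', 0.5);
        agegrp = ceil(3 * rand('uniform'));
        do controlMonth = 1 to 6;
            lps = log(0.05 + 0.9 * rand('uniform'));
            output control_ps;
        end;
    end;
run;

/* All candidate pairs: exact match on sex and agegrp, and within a
   caliper of 0.25 on the log-transformed propensity score */
proc sql;
    create table raw_match as
    select t.ptID, c.controlID, c.controlMonth
    from treated_ps as t
    left join control_ps as c
        on  t.sex = c.sex
        and t.agegrp = c.agegrp
        and abs(t.lps - c.lps) <= 0.25;
quit;

/* Random number per candidate pair, then random order within treated patient */
data raw_match;
    set raw_match;
    if _n_ = 1 then call streaminit(12345);
    randnum = rand('uniform');
run;

proc sort data=raw_match;
    by ptID randnum;
run;
```

Because of the left join, treated patients with no eligible control still appear once with a missing controlID, which is why the selection step checks for a non-missing controlID before writing a control.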

quickbluefish · 06-23-2021

You could try this - not tested - might have bugs. Here, I am calling your dataset 'stuff' and I'm calling your variables 'ID' (for the oo1, oo2... column) and 'value' for the 2nd column.

* make a row variable so that you can sort by ID (oo1, oo2, oo3 etc.) without ruining the order of the integer variable (which I am calling 'value') ;
data stuff;
    set stuff;
    row=_N_;
run;

proc sort data=stuff;
    by id row;
run;

proc sql noprint;
    select max(nrows) into :maxrows
    from (select ID, count(*) as nrows from stuff group by ID);
quit;

* set to whatever value it is you are trying to identify between two matching values ;
%let SEARCHVAL=7;

data stuff;
    length row 8;   * 8 bytes so row numbers above 8,192 stay exact ;
    set stuff (drop=row);
    by id;
    array T {&maxrows} _temporary_;   * values for the current ID ;
    array f {&maxrows} _temporary_;   * flags for the current ID ;
    retain row 0;
    if first.id then do;
        call missing(of T[*], of f[*]);
        x=0;
    end;
    x+1;
    T[x]=value;
    f[x]=0;
    if last.id then do;
        * flag position i-1 when it holds SEARCHVAL and its two neighbours match ;
        do i=3 to x;
            if T[i-2]=T[i] and T[i-1]=&SEARCHVAL then f[i-1]=1;
        end;
        * write the rows for this ID back out in their original order ;
        do i=1 to x;
            row+1;
            value=T[i];
            flag=f[i];
            output;
        end;
    end;
    keep row ID value flag;
run;

quickbluefish · 11-27-2020

Thanks! For some reason, I had in my mind that the 'trimmed' option only worked when you were creating a range (like, select x into :x1-x10 trimmed from...) - thanks for the tip!

quickbluefish · 11-27-2020

Hi there - You can just do this with a DATA step with temporary arrays -- first get the start and end year into macro variables, then the data step that follows. The first data step just produces some test input - just swap out TEST for your actual data. Hope this helps.

/* this is just example input -- replace with your actual data */
data test;
    length dt val 8;
    format dt date9. val dollar8.2;
    do dt=10000 to 15000;
        val=ranuni(0)*1000;
        output;
    end;
run;
/* end of example input */

proc sql noprint;
    select year(min(dt)) into :sy from test;
    select year(max(dt)) into :ey from test;
quit;

%let sy=%sysfunc(compress(&sy));
%let ey=%sysfunc(compress(&ey));

data monyr;
    set test end=last;
    length mon 3 year&sy-year&ey 8;
    array t {12,&sy:&ey} _temporary_;
    t[month(dt),year(dt)]+val;
    if last then do;
        array y {&sy:&ey} year&sy-year&ey;
        start=0;
        do mon=1 to 12;
            call missing(of y[*]);
            do yr=lbound(t,2) to hbound(t,2);
                if start then y[yr]=(t[mon,yr]-t[mod(mon-13,12)+12,yr-(mon=1)])/t[mon,yr];
                start=1;
            end;
            output;
        end;
    end;
    keep mon year:;
run;

proc print data=monyr width=min;
run;

quickbluefish · 06-03-2020

Thank you! Yeah, I am mostly wondering _why_ trim is necessary for those that end with % vs. not (even if length is the same, e.g., C5% and C56, to give a fake example) and that the problem cannot be resolved by pre-processing the CODELIST file -- trimming it there, either in a DATA step or subquery -- rather than in the ON clause. I am wondering if it's got something to do with the fact that % also serves as an escape character, and it's actually somehow causing SAS to hang on to an extra bit of whitespace for, e.g., C5% and not C56 ('C5% ' vs. 'C56'), forcing you to have to explicitly trim it off. Unrelated - I saw your comment about working with SAS EG being like walking on glass shards (in your post about file locks) - could not agree more. It inspires such atrocious programming habits in virtually every new hire we've had.

quickbluefish · 06-03-2020

That is really interesting - I will give it a try and compare processing times. I also like the 'CALL ZERO' idea - I have had this thought many times and never realized that this site had anything like a ballot to submit ideas. Thanks!

quickbluefish · 06-02-2020

Thanks, yes, I would prefer a format as well, maybe even using the regex options, but the codelist is both messy and dynamic... and potentially too large for a format as well. Appreciate the reply!

Re: Print variable only on first line

Re: Time-varying covariate in Cox model (no drug, receiving drug at ba...

Re: Analytics Code Conversion/Replication Tool

Re: Coding time varying covariate for Cox proportional hazards regress...

Re: How to Use PROC PSMATCH to Match on Age with a Tolerance of 0.1

Re: proc psmatch question: Possible to limit number of matches to same...

Re: Help with checking preceding and following value

SAS Studio 'code' tab focus

Re: Programming 1/2 content: removed INFILE and added macro vars?

Allow colon wildcards to work with variables with common suffixes, or ...

add a CALL CONSTANT call routine (and maybe also CALL CONSTANTC)

Re: PROC PHREG survival interactiuon between multiple groups

Re: Vertical Summation with a Condition

Re: Summarise data with months down the side, years across the top, an...

Re: SQL LIKE operator and whitespace
