About chuakp

chuakp · ‎12-13-2020

I have a dataset with a variable called drug_name. I want to subset to observations in which drug_name contains one of 200 strings corresponding to individual molecules. The brute-force method is to use index repeatedly. For example, if there were only 3 strings of interest, I could do something like this:

data want;
  set have;
  where index(drug_name, "ACETAMINOPHEN") > 0
     or index(drug_name, "IBUPROFEN") > 0
     or index(drug_name, "DIPHENHYDRAMINE") > 0;
run;

What is a more efficient way to do this? Thanks.
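One way to avoid hand-writing 200 index calls is to keep the strings in a lookup dataset and loop over them. The following is only a minimal sketch: the dataset `molecules`, its variable `molecule`, the array size of 200, and the sample rows in `have` are assumptions for illustration, and whether it runs faster than the WHERE clause depends on the data.

```sas
/* Hypothetical lookup table: one row per molecule of interest */
data molecules;
  length molecule $40;
  input molecule $;
  datalines;
ACETAMINOPHEN
IBUPROFEN
DIPHENHYDRAMINE
;
run;

/* Hypothetical sample of the HAVE dataset */
data have;
  infile datalines truncover;
  length drug_name $60;
  input drug_name $60.;
  datalines;
ACETAMINOPHEN 500 MG TABLET
LISINOPRIL 10 MG TABLET
IBUPROFEN 200 MG CAPSULE
;
run;

data want;
  /* Temporary array sized for up to 200 strings (assumed upper bound) */
  array mol[200] $40 _temporary_;

  /* Load the lookup values once, on the first iteration */
  if _n_ = 1 then do p = 1 to n_mol;
    set molecules nobs=n_mol point=p;
    mol[p] = molecule;
  end;

  set have;

  /* Keep the row as soon as any string is found in drug_name */
  do i = 1 to n_mol;
    if index(drug_name, strip(mol[i])) > 0 then do;
      output;
      leave;
    end;
  end;

  drop i molecule;
run;
```

The search is case-sensitive, as with the original index calls. If the list could exceed 200 entries, the array dimension would need to be raised.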

chuakp · ‎02-03-2020

Thanks - I tried this, but it didn't quite work. To answer your question, I am looking for an output dataset with one row per person_id, a column for person_id, and a column for the first date on which all claims had zero payment and on which all subsequent claims had zero payment.

chuakp · ‎02-03-2020

Thank you. I believe I have to add this code:

if payment NE 0 then first_0 = .;

This way, the program finds the first instance of payment = 0 for a person_id. The program then looks at the next observation and determines whether payment = 0. If so, first_0 stays the same. If not, first_0 cannot be the first date in 2019 on which all subsequent claims have zero payments, so we set first_0 back to missing. Without this piece of code, I believe the program would just output the last observation in which payment equaled zero; if there is no such observation, first_0 will be missing.

data want;
  set have;
  by person_id claim_date;
  retain first_0;
  /* Reset only at the start of each person. Resetting whenever payment is  */
  /* non-missing would clear first_0 on every row, since payment is never   */
  /* missing in this data.                                                  */
  if first.person_id then first_0 = .;
  if first_0 = . and payment = 0 then first_0 = claim_date;
  if payment NE 0 then first_0 = .;
  if last.person_id and not missing(first_0);
  keep person_id first_0;
run;

chuakp · ‎02-03-2020

I have a claims dataset that looks like this:

person_id  claim_date  claim_id  payment
1          01/01/2019  100       50
1          01/03/2019  101       40
1          02/09/2019  102       0
1          12/20/2019  103       0
2          10/20/2019  201       50
2          10/21/2019  202       40
2          11/30/2019  203       30
3          04/02/2019  301       20
3          04/05/2019  302       20
3          05/20/2019  304       0
3          12/30/2019  305       0
3          12/31/2019  306       50

Note that there is only one claim allowed per date in this dataset, the dataset is already sorted by claim_date, and the dataset only contains claims from 2019. The goal is to identify the first claim_date, if any, for which both of the following are true: 1) payment equals 0; and 2) all subsequent observations for the person_id also had payment equal to 0. So for person_id = 1, the date would be 02/09/2019; for person_id = 2, no such date exists because all of their claims had non-zero payments; and for person_id = 3, no such date exists because of the positive payment on 12/31/2019. It's straightforward to identify the first claim_date with payment = 0 for each person_id (the first criterion), but I'm struggling with how to operationalize the second criterion. I'd appreciate any help the community can provide.
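The second criterion can be restated as "the zero-payment claim falls after the person's last non-zero claim". A minimal PROC SQL sketch of that restatement follows. The names `last_paid` and `first_0` are illustrative, and this is one possible approach rather than the method suggested in the thread.

```sas
/* Sample rows from the post */
data have;
  input person_id claim_date :mmddyy10. claim_id payment;
  format claim_date mmddyy10.;
  datalines;
1 01/01/2019 100 50
1 01/03/2019 101 40
1 02/09/2019 102 0
1 12/20/2019 103 0
2 10/20/2019 201 50
2 10/21/2019 202 40
2 11/30/2019 203 30
3 04/02/2019 301 20
3 04/05/2019 302 20
3 05/20/2019 304 0
3 12/30/2019 305 0
3 12/31/2019 306 50
;
run;

proc sql;
  create table want as
  select a.person_id,
         /* Earliest zero-payment date after the last paid claim */
         min(a.claim_date) as first_0 format=mmddyy10.
  from have as a
       left join
       /* Last date with a non-zero payment, per person */
       (select person_id, max(claim_date) as last_paid
          from have
         where payment ne 0
         group by person_id) as b
       on a.person_id = b.person_id
  where a.payment = 0
    and (b.last_paid is missing or a.claim_date > b.last_paid)
  group by a.person_id;
quit;
```

On the sample rows this should keep only person_id 1, with first_0 equal to 02/09/2019. People with no qualifying date do not appear in the output at all.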

chuakp · ‎01-29-2019

Thanks so much - this worked. I also appreciated the suggestion regarding the hash tag approach.

chuakp · ‎01-29-2019

I have two datasets. One, called "search", has 10 observations with a variable called search_term containing different strings. The other one, called "names", has 100 observations with a string variable called "name", each of which has a different drug name. I would like to use the index function to search the variable "name" for each of the 10 search terms in "search". To be specific, I don't want to merge name with search_term (which would require an exact match). I merely want to create an indicator that is 1 if any of the 10 values of search_term can be found inside the string "name", and 0 otherwise. For example, if "name" equals "BOTOX A" and one of the 10 values of search_term is "BOTOX", the indicator should be set to 1 (and similarly if "name" equals "BOTOX B"). I'd appreciate any help with this. Thanks.
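As a minimal sketch of the indicator described above, every name can be paired with every search term and the best match kept per name. The sample rows and the output names `names_flagged` and `indicator` are assumptions. With 100 names and 10 terms the cross join is only 1,000 rows, though SAS will likely write a note to the log about a Cartesian product.

```sas
/* Hypothetical sample of the SEARCH dataset */
data search;
  length search_term $20;
  input search_term $;
  datalines;
BOTOX
HUMIRA
;
run;

/* Hypothetical sample of the NAMES dataset */
data names;
  infile datalines truncover;
  length name $40;
  input name $40.;
  datalines;
BOTOX A
BOTOX B
LIPITOR
;
run;

proc sql;
  create table names_flagged as
  select n.name,
         /* The comparison yields 1 or 0 per pair; max() keeps 1 if any term matched */
         max(index(n.name, strip(s.search_term)) > 0) as indicator
  from names as n, search as s
  group by n.name;
quit;
```

The strip call removes trailing blanks from search_term so that index looks for the term itself rather than the padded value. The match is case-sensitive.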

chuakp · ‎04-13-2017

I have a SAS database with a single variable and 450 observations, each one corresponding to an ICD-9 diagnosis code. I also have a large healthcare claims database in which each observation is a claim with up to 4 diagnosis code fields (dx1-dx4). I have written code that puts all 450 observations into global macro variables. In a macro, I set up a 4-element array consisting of dx1-dx4, and then use a do loop to set an indicator to 1 if any of the elements of the array match the 450 diagnosis codes. If the indicator is 1, the observation is output. This code, however, is taking an extremely long time to run, and I am wondering if there is a more efficient alternative. Here it is below. Note that the 450 global macro variables are called dxcode1, dxcode2...dxcode450.

%let code = dxcode;

%macro search;
  data out;
    set claims;
    array a_dx dx1-dx4;
    flag = 0;
    %do j = 1 %to 450;
      %do i = 1 %to 4;
        if a_dx(&i) = "&&&code.&j." then flag = 1;
      %end;
    %end;
    if flag > 0 then output;
  run;
%mend search;

Thanks!
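The macro above generates 1,800 IF statements that run for every claim. One common alternative is to load the 450 codes into a hash object and look each dx field up directly. This is a minimal sketch: the dataset name `dxlist`, its variable `dxcode`, the $5 lengths, and the sample rows are assumptions. The hash table is held in memory, which should be modest for 450 codes.

```sas
/* Hypothetical code list (the real one has 450 rows) */
data dxlist;
  length dxcode $5;
  input dxcode $;
  datalines;
4660
4661
;
run;

/* Hypothetical sample of the CLAIMS dataset */
data claims;
  length claim_id 8 dx1-dx4 $5;
  infile datalines missover;
  input claim_id dx1 $ dx2 $ dx3 $ dx4 $;
  datalines;
1 4660 25000 . .
2 4019 . . .
3 V700 4019 4661 .
;
run;

data out;
  if _n_ = 1 then do;
    /* Define dxcode in the program data vector without reading any rows */
    if 0 then set dxlist;
    /* Load the code list into memory once */
    declare hash h(dataset: "dxlist");
    h.defineKey("dxcode");
    h.defineDone();
  end;

  set claims;
  array a_dx[*] dx1-dx4;

  /* Output the claim at the first dx field found in the list */
  do i = 1 to dim(a_dx);
    if h.check(key: a_dx[i]) = 0 then do;
      output;
      leave;
    end;
  end;

  drop i dxcode;
run;
```

On the sample rows, claims 1 and 3 should be kept. The sketch assumes the dx fields and dxcode are stored in the same form (same case, no embedded periods).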

chuakp · ‎03-16-2017

Thanks so much. I have played around with this code and believe this is an elegant solution.

chuakp · ‎03-13-2017

Hi, this is a great approach, but as you suspected I am running into memory issues. Is there a workaround? Thanks.

chuakp · ‎03-13-2017

Thanks for this idea. Looking for PROCDATE to be between the minimum of START and the maximum of END is fine in most cases, but as you point out, you could have a weird situation where the dates are not contiguous from one observation to the next. For example:

ID  PROCDATE   START     END
5   3/31/2015  3/1/2015  3/31/2015
5   3/31/2015  5/1/2015  5/30/2015

Here the minimum of START is 3/1/2015 and the maximum of END is 5/30/2015. PROCDATE is between these two but the person was not enrolled in the 14 days after 3/31/2015. Thanks.

chuakp · ‎03-09-2017

Sorry for any confusion. Basically, for ID 1, PROCDATE is 3/31/2015. You can see from observation 3 that this person was enrolled between 3/1/2015 and 3/31/2015, and from observation 4 that this person was enrolled from 4/1/2015 to 4/30/2015. The 14-day period after PROCDATE is 3/31/2015-4/13/2015, so this person would be kept in the dataset since they had 14 days of coverage after the procedure. The tricky part is telling SAS when to look at the next observation's START and END dates when the 14-day period crosses months. One thought would be to set up variables describing the periods of enrollment for each ID (e.g., ID 1 was continuously enrolled for 365 days, so the start date of enrollment would be 1/1/2015 and the end date would be 12/31/2015). If I had the data set up this way, I could just check whether the 14-day period after PROCDATE was contained within the start and end dates. Hope this makes sense. Thanks.
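The idea of describing each person's continuous periods of enrollment can be sketched as follows: collapse rows whose dates touch or overlap into one span, then check that the window after PROCDATE sits inside a single span. The dataset name `enroll`, the names `span_start` and `span_end`, and the choice of PROCDATE + 14 as the window end are assumptions (the posts use both 13 and 14 days past PROCDATE, so adjust as needed). The sample rows are IDs 3, 4, and 5 from the posts in this thread.

```sas
data enroll;
  input id procdate :mmddyy10. start :mmddyy10. end :mmddyy10.;
  format procdate start end mmddyy10.;
  datalines;
3 8/29/2015 8/1/2015 8/31/2015
4 1/20/2015 1/1/2015 1/31/2015
4 1/20/2015 2/1/2015 2/28/2015
5 3/31/2015 3/1/2015 3/31/2015
5 3/31/2015 5/1/2015 5/30/2015
;
run;

proc sort data=enroll;
  by id start;
run;

/* Collapse adjacent or overlapping rows into continuous spans */
data spans;
  set enroll;
  by id;
  retain span_start span_end;
  format span_start span_end mmddyy10.;
  if first.id then do;
    span_start = start;
    span_end = end;
  end;
  else if start <= span_end + 1 then span_end = max(span_end, end);
  else do;
    output;               /* gap found: write out the span that just ended */
    span_start = start;
    span_end = end;
  end;
  if last.id then output; /* write out the final span for this ID */
  keep id procdate span_start span_end;
run;

/* Keep IDs whose window after PROCDATE lies inside one span */
data want;
  set spans;
  if span_start <= procdate and procdate + 14 <= span_end;
  keep id;
run;
```

On these rows only ID 4 should remain: ID 3's enrollment stops on 8/31/2015, and ID 5 has a gap beginning 4/1/2015.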

chuakp · ‎03-09-2017

I have an insurance claims dataset and am looking to use the enrollment file to subset to people who were enrolled for at least 14 days after a particular procedure (the date of the procedure is variable PROCDATE). The enrollment file is set up where each observation is a particular enrollment period for a month, demarcated by the variables START and END. People who are enrolled in all twelve months have twelve observations, one for each month. However, in some cases it's also possible for someone to have more than twelve observations (for example, if they are enrolled January 1, 2010 to January 10, 2010 and then January 20, 2010 to January 31, 2010, they would have two observations for the month of January). Here's what the data look like. ID 1 and 2 should be included, as should ID 4, but ID 3 should be excluded since fourteen days beyond the procedure date of 8/29/2015 is 9/12/2015, and their enrollment stopped 8/31/2015. I've tried playing with proc transpose but no luck. Any advice would be appreciated. Thanks!

ID  PROCDATE   START       END
1   3/31/2015  1/1/2015    1/31/2015
1   3/31/2015  2/1/2015    2/28/2015
1   3/31/2015  3/1/2015    3/31/2015
1   3/31/2015  4/1/2015    4/30/2015
1   3/31/2015  5/1/2015    5/31/2015
1   3/31/2015  6/1/2015    6/30/2015
1   3/31/2015  7/1/2015    7/31/2015
1   3/31/2015  8/1/2015    8/31/2015
1   3/31/2015  9/1/2015    9/30/2015
1   3/31/2015  10/1/2015   10/31/2015
1   3/31/2015  11/1/2015   11/30/2015
1   3/31/2015  12/1/2015   12/31/2015
2   7/25/2015  1/1/2015    1/10/2015
2   7/25/2015  1/20/2015   1/31/2015
2   7/25/2015  2/1/2015    2/28/2015
2   7/25/2015  3/1/2015    3/31/2015
2   7/25/2015  4/1/2015    4/30/2015
2   7/25/2015  5/1/2015    5/31/2015
2   7/25/2015  6/1/2015    6/30/2015
2   7/25/2015  7/1/2015    7/31/2015
2   7/25/2015  8/1/2015    8/31/2015
2   7/25/2015  9/1/2015    9/30/2015
2   7/25/2015  10/1/2015   10/31/2015
2   7/25/2015  11/1/2015   11/30/2015
2   7/25/2015  12/1/2015   12/31/2015
3   8/29/2015  8/1/2015    8/31/2015
4   1/20/2015  1/1/2015    1/31/2015
4   1/20/2015  2/1/2015    2/28/2015

chuakp · ‎03-13-2016

Thanks for the suggestion. This doesn't seem to achieve what I want to do. Perhaps this is not easy to achieve in PROC SQL and I just need to use an additional step. Are there any concerns with using

proc sort data = antibiotics_bronchitis nodupkey;
  by claimid;
run;

This should eliminate any duplicate drug claims that happened to be associated with more than one claim with a diagnosis of bronchitis on the same day or in the previous three days.

chuakp · ‎03-12-2016

I'm working with a health insurance claims database and am trying to identify instances in which there was a drug claim for an antibiotic that occurred either on the same day as a claim with a diagnosis code of bronchitis or within the 3 days after a claim with a diagnosis code of bronchitis. I am working with two tables in PROC SQL: the table antibiotics (which has all drug claims for antibiotics for people in my sample) and the table bronchitis (which has all claims with a diagnosis of bronchitis for people in my sample). If there was more than one claim with a diagnosis of bronchitis on the same day for a particular person, I consider that to be one instance of a bronchitis diagnosis, and I only want to allow a maximum of one antibiotic per instance of a bronchitis diagnosis. Here is my code:

proc sql;
  create table antibiotics_bronchitis as
  select L.claimid, L.servicedate, L.drugname,
         R.claimid as claimid_bronchitis,
         R.servicedate as svcdate_bronchitis
  from work.antibiotics as L
       inner join work.bronchitis as R
       on L.enrolid = R.enrolid
  /* Refer to R.servicedate here; the column alias cannot be used in WHERE */
  where L.servicedate - 3 <= R.servicedate <= L.servicedate
  order by L.claimid;
quit;

Using this code, I get many observations with duplicate claimid when a particular drug claim (say one that occurred January 4, 2015) is matched with more than one claim with a diagnosis of bronchitis with a service date of January 1, January 2, January 3, or January 4 (e.g., this would happen if there were two claims with a diagnosis of bronchitis on January 1, or one claim with a diagnosis of bronchitis on January 1 and one on January 2, etc.). Using the "distinct" term in the select statement doesn't do anything because observations with duplicate claimid will have different claimid_bronchitis. I could just use proc sort NODUPKEY by claimid after this code, but I was wondering if there is an alternative method within SQL. Thanks.
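One possible way to stay within SQL is to group by the drug claim and summarize the matching bronchitis rows instead of listing them. This is a minimal sketch with hypothetical sample rows. It keeps the most recent matching bronchitis date and drops claimid_bronchitis. Like the NODUPKEY step, it only guarantees one row per drug claim, assuming claimid is unique in antibiotics. It does not limit each bronchitis instance to a single antibiotic.

```sas
/* Hypothetical sample rows */
data antibiotics;
  input enrolid claimid servicedate :mmddyy10. drugname :$20.;
  format servicedate mmddyy10.;
  datalines;
1 9001 1/4/2015 AMOXICILLIN
2 9002 2/10/2015 AZITHROMYCIN
;
run;

data bronchitis;
  input enrolid claimid servicedate :mmddyy10.;
  format servicedate mmddyy10.;
  datalines;
1 5001 1/1/2015
1 5002 1/1/2015
1 5003 1/2/2015
2 5004 2/1/2015
;
run;

proc sql;
  create table antibiotics_bronchitis as
  select L.claimid, L.servicedate, L.drugname,
         /* Most recent bronchitis claim in the 3-day lookback window */
         max(R.servicedate) as svcdate_bronchitis format=mmddyy10.
  from work.antibiotics as L
       inner join work.bronchitis as R
       on L.enrolid = R.enrolid
  where R.servicedate between L.servicedate - 3 and L.servicedate
  group by L.claimid, L.servicedate, L.drugname
  order by L.claimid;
quit;
```

On the sample rows, drug claim 9001 matches three bronchitis claims but should appear once, with svcdate_bronchitis of 01/02/2015. Claim 9002 has no bronchitis claim in its window and is not returned.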

chuakp · ‎01-20-2016

Thanks, Reeza - this worked. Also, thanks to mohamed_zaki - your solution appears to have worked as well.

Date Last Visited	‎10-26-2022 07:20 PM

Re: Adding overlapping dates to the end of a date range

Adding overlapping dates to the end of a date range

Re: Expanding a range of numbers

Expanding a range of numbers

Re: Reading in all GZ files in a folder

Re: Search string variable in one table with string variable from anot...

Search string variable in one table with string variable from another ...

Searching a variable for multiple strings

Re: Find first observation for which all subsequent observations have ...

Find first observation for which all subsequent observations have a pa...

Efficient code for searching through a list

Re: Searching a range of dates

Searching a range of dates

Re: Limiting duplicates in Proc SQL

Limiting duplicates in Proc SQL

Re: Eliminating observations with datasets in long form