About chuakp

chuakp · ‎10-26-2022

Tom, thanks for your reply. I was able to figure it out using your code for grouping prescriptions together.

chuakp · ‎10-24-2022

Thanks Tom. You've pointed me in the right direction for this problem. One remaining issue is the creation of the groups. In the example data, Rx B overlaps with Rx A and Rx C overlaps with Rx B, so you can just compare the current observation to the previous one to group prescriptions together.

    data have;
      input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd. END_DATE :yymmdd. GROUP_NO;
      format DISPENSE_DATE END_DATE yymmdd10.;
    cards;
    1 A 2022-01-01 2022-01-08 1
    1 B 2022-01-07 2022-01-09 1
    1 C 2022-01-08 2022-01-11 1
    1 D 2022-01-19 2022-01-25 2
    1 E 2022-01-26 2022-01-30 3
    ;

But imagine you instead had data like this, in which C doesn't overlap with B but does overlap with A. We still want to group C together with A and B, but you can't do that by comparing the current observation to the previous observation.

    data have;
      input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd. END_DATE :yymmdd.;
      format DISPENSE_DATE END_DATE yymmdd10.;
    cards;
    1 A 2022-01-01 2022-01-08
    1 B 2022-01-04 2022-01-06
    1 C 2022-01-07 2022-01-11
    1 D 2022-01-19 2022-01-25
    1 E 2022-01-26 2022-01-30
    ;
    run;
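One possible way to handle the second case is to compare each prescription against the latest end date seen so far in the current group, rather than against the previous observation only. The sketch below is an illustration under assumptions, not the code from the thread: RUN_END is a hypothetical helper variable, and a prescription is treated as overlapping when its DISPENSE_DATE falls on or before the running end date.

```sas
data have;
  input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd10. END_DATE :yymmdd10.;
  format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08
1 B 2022-01-04 2022-01-06
1 C 2022-01-07 2022-01-11
1 D 2022-01-19 2022-01-25
1 E 2022-01-26 2022-01-30
;

proc sort data=have;
  by PATIENT_ID DISPENSE_DATE END_DATE;
run;

data grouped;
  set have;
  by PATIENT_ID;
  /* RUN_END (hypothetical name) holds the latest END_DATE in the current group */
  retain GROUP_NO RUN_END;
  format RUN_END yymmdd10.;
  if first.PATIENT_ID then do;
    /* first prescription for the patient starts group 1 */
    GROUP_NO = 1;
    RUN_END = END_DATE;
  end;
  else if DISPENSE_DATE > RUN_END then do;
    /* starts after everything seen so far: open a new group */
    GROUP_NO = GROUP_NO + 1;
    RUN_END = END_DATE;
  end;
  else RUN_END = max(RUN_END, END_DATE);  /* overlaps the group: extend it */
run;
```

With the data above, A, B, and C should land in group 1 because C starts on or before the running end date set by A, while D and E each start after the running end date and open groups 2 and 3.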

chuakp · ‎10-22-2022

I appreciate the reply. To clarify, I am not trying to calculate the maximum END_DATE for each PATIENT_ID / GROUP_NO. I am trying to determine the number of days of overlap within each GROUP_NO, then add that number to the maximum END_DATE for each PATIENT_ID / GROUP_NO. Part of my difficulty is determining the number of days of overlap.
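A minimal sketch of one way to count the overlap, assuming "days of overlap" means the number of distinct dates within a group on which more than one prescription is active. That reading is an assumption; it gives 3 days (1/7, 1/8, 1/9) for group 1 of the example data and therefore an extended end date of 2022-01-14. DAY, N_RX, and END_DATE_COMBINED are names introduced here for illustration.

```sas
data have;
  input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd10. END_DATE :yymmdd10. GROUP_NO;
  format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08 1
1 B 2022-01-07 2022-01-09 1
1 C 2022-01-08 2022-01-11 1
1 D 2022-01-19 2022-01-25 2
1 E 2022-01-26 2022-01-30 3
;

/* One row per prescription per covered calendar day */
data days;
  set have;
  do DAY = DISPENSE_DATE to END_DATE;
    output;
  end;
  format DAY yymmdd10.;
run;

proc sql;
  /* Number of prescriptions active on each day within a group */
  create table per_day as
  select PATIENT_ID, GROUP_NO, DAY, count(*) as N_RX
  from days
  group by PATIENT_ID, GROUP_NO, DAY;

  /* Assumption: overlap = count of days covered by 2 or more prescriptions */
  create table want as
  select PATIENT_ID, GROUP_NO,
         min(DAY) as DISPENSE_DATE format=yymmdd10.,
         max(DAY) + sum(N_RX > 1) as END_DATE_COMBINED format=yymmdd10.
  from per_day
  group by PATIENT_ID, GROUP_NO
  order by PATIENT_ID, GROUP_NO;
quit;
```

Expanding to one row per day is simple but can produce a large intermediate table for long prescriptions; a different definition of overlap (for example, total surplus days supplied) would need a different count in the second query.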

chuakp · ‎10-21-2022

I have a database of prescription claims that contains the dispensing date and end date of each prescription, as defined by days supplied. The goal is to determine whether the patient has continuous coverage without more than a 7-day gap. The issue is that there are times in which a patient may have 2 or more prescriptions in which the dispensing and end dates overlap. For example, the data might look like this:

    PATIENT_ID  RX_ID  DISPENSE_DATE  END_DATE
    1           A      1/1/2022       1/8/2022
    1           B      1/7/2022       1/9/2022
    1           C      1/8/2022       1/11/2022
    1           D      1/19/2022      1/25/2022
    1           E      1/26/2022      1/30/2022

If you only consider the DISPENSE_DATE and END_DATE, it looks like there is an 8-day gap between C and D. However, RX_ID B overlaps with A for 2 days (1/7/2022-1/8/2022), and C overlaps with A for 1 day (1/8/2022), and I would like to consider the patient as having coverage from 1/1/2022 through 1/14/2022 (i.e., 1/11/2022 plus the number of overlapping dates), in which case there is only a 5-day gap before D.

Basically, I want to turn the above database into something like the following (the observations for D and E would stay the same because there is no overlap).

    PATIENT_ID  RX_ID  DISPENSE_DATE  END_DATE_COMBINED
    1           A      1/1/2022       1/14/2022
    1           D      1/19/2022      1/25/2022
    1           E      1/26/2022      1/30/2022

Note: while not in this example data, I'm aware that there could be situations in which prescription D could have started on 1/13/2022, in which case it overlaps with the new range of A after stitching A/B/C together. However, I don't care about that for the purposes of this analysis.

I'd appreciate any advice. Thanks.

chuakp · ‎04-12-2021

Thanks. This code works and is very helpful.

chuakp · ‎04-12-2021

Never mind, I answered my own question.

chuakp · ‎04-12-2021

Thanks - this works, but could you help me understand the logic of this code - what does it mean for "if flag" to be true versus false?

    if flag then do number = input(scan(range,1,"'-"),8.) to input(scan(range,2,"'-"),8.);

chuakp · ‎04-12-2021

The variable is a string in which the numbers are surrounded by quotes.

chuakp · ‎04-12-2021

I have data that contains a range of numbers with a variable called flag for each range. It looks like this:

    range          flag
    '10040-10040'  0
    '10060-10061'  1

I'd like to expand this to a dataset that looks like this:

    number  flag
    10040   0
    10060   1
    10061   1

It's easy to define the beginning and end of the range, but I'm not sure exactly how to write the right do loop for this. I'd appreciate any advice.
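A minimal sketch of one way to do the expansion, assuming range is a character variable whose value includes the surrounding single quotes. START and STOP are hypothetical helper variables; SCAN treats both the quote and the hyphen as delimiters, so the first and second words are the two ends of the range.

```sas
data have;
  input range :$20. flag;
cards;
'10040-10040' 0
'10060-10061' 1
;

data want;
  set have;
  /* Split on single quotes and hyphens, then convert each piece to a number */
  start = input(scan(range, 1, "'-"), 8.);
  stop  = input(scan(range, 2, "'-"), 8.);
  /* Write one observation per number in the range, carrying flag along */
  do number = start to stop;
    output;
  end;
  keep number flag;
run;
```

For the two example rows this should produce three observations: 10040 with flag 0, and 10060 and 10061 with flag 1.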

chuakp · ‎03-28-2021

They're already pre-sorted, which is why I would need to read them in a specific order.

chuakp · ‎03-28-2021

Thanks Tom. Yes, when I tried to run the vendor's suggested read-in code, I got an error:

    'gzip' is not recognized as an internal or external command, operable program or batch file.

Also, the data do all start on the first line; there are no headers. I expanded the .gz files to CSV and used your code for reading in the CSVs. It works, but it seems to read the files in a random order, e.g., file part 00540 before 00001. Is there a way to make it read them in a particular order? Thanks.
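One possible way to control the read order is to list the files first, sort the list, and then read each file through the FILEVAR= option. This is a sketch under assumptions, not the code from the thread: the fileref csvdir, the dataset files, and the variables fname and path are hypothetical names, the folder is assumed to hold only the expanded CSV files of interest, and the two variables read are the ones from the vendor's truncated example.

```sas
/* Step 1: list the .csv files in the folder (names here are illustrative) */
data files;
  length fname $260;
  rc  = filename('csvdir', 'M:/Data/data_2019');
  did = dopen('csvdir');
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'csv' then output;
    end;
    rc = dclose(did);
  end;
  rc = filename('csvdir');   /* clear the fileref */
  keep fname;
run;

/* Step 2: zero-padded part numbers sort correctly as plain text */
proc sort data=files;
  by fname;
run;

/* Step 3: read the files one at a time, in sorted order */
data data_2019;
  set files;
  length path $400;
  path = cats('M:/Data/data_2019/', fname);
  infile csvin filevar=path dlm='|' dsd truncover lrecl=800 end=eof;
  informat id $16. claimnumber $16.;
  do while (not eof);
    input id claimnumber;
    output;
  end;
run;
```

Because fname stays in the output dataset, the resulting order can be checked against the file names; drop it once the order has been confirmed.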

chuakp · ‎03-28-2021

Hello, I recently received an insurance claims dataset and a SAS read-in file provided by the vendor. The data are in 596 .gz files (which expand to CSV) in a single folder with path "M:/Data/data_2019". The .gz files have names like this:

    part-00000-bb75630-d120.csv
    part-00001-bb75630-d120.csv
    ....
    part-00596-bb75630-d120.csv

I am running SAS 9.4 maintenance release 2 (M2), which I believe doesn't have native support for reading gzip files. Unfortunately, the vendor's read-in file recommends using the approach below, which relies on being able to run the gzip command. The code in the read-in file is pasted below (note I've truncated the number of variables read into the data step for brevity and replaced the path with M:/Data/data_2019). Note also that this code seemingly would combine all 596 GZ files into a single output file, which is what I want.

    FILENAME f1 PIPE 'gzip -cd M:/Data/data_2019/*.gz' LRECL=800;
    DATA out.data_2019;
      INFILE f1 DLM = '|' DSD END=EOF MISSOVER TRUNCOVER ;
      INFORMAT id $16. claimnumber $16.;
      input id claimnumber;
    run;

An alternative would be to expand the 596 GZ files to 596 CSV files, but in that situation, I'm not sure how to modify the above code to read in all the CSVs in the folder. I'd appreciate any suggestions. Thanks.

chuakp · ‎12-14-2020

Thanks. I think this would work, but the challenge would be that I would have to create a format with 200 values since I'm looking for 200 strings.

chuakp · ‎12-14-2020

This code works, thanks very much!

chuakp · ‎12-13-2020

To clarify, it does not have to be an exact match. I just want to subset to observations where one of the 200 strings appears somewhere within the variable's value.
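A minimal sketch of a substring match against a lookup table, which avoids hard-coding the 200 strings. Everything named here is hypothetical: have and terms are made-up tables, text and term are made-up variables, and the rows are illustrative only. The 'i' modifier to FIND makes the match case-insensitive, which is an assumption about what is wanted.

```sas
/* Illustrative data only; table and variable names are hypothetical */
data have;
  infile cards truncover;
  input id text $50.;
cards;
1 acute bronchitis
2 routine checkup
3 chronic asthma follow-up
;

data terms;
  input term :$20.;
cards;
asthma
bronch
;

proc sql;
  /* Keep a row when at least one search term appears anywhere in text */
  create table want as
  select a.*
  from have as a
  where exists (
    select *
    from terms as b
    where find(a.text, strip(b.term), 'i') > 0
  );
quit;
```

Using EXISTS keeps each matching row once even when several terms match it. With 200 terms and a large table this compares every row against every term, so it may be slow.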

Date Last Visited	‎10-26-2022 07:20 PM

Re: Adding overlapping dates to the end of a date range

Re: Adding overlapping dates to the end of a date range

Re: Adding overlapping dates to the end of a date range

Adding overlapping dates to the end of a date range

Re: Expanding a range of numbers

Re: Expanding a range of numbers

Re: Expanding a range of numbers

Re: Expanding a range of numbers

Expanding a range of numbers

Re: Reading in all GZ files in a folder

Re: Reading in all GZ files in a folder

Reading in all GZ files in a folder

Re: Searching a variable for multiple strings

Re: Searching a variable for multiple strings

Re: Searching a variable for multiple strings