About JH74

JH74

Thank you. I appreciate your help. It's pretty much the same answer another user posted, and it worked.

JH74

Hi. I have a data set with some variables where I need to identify whether a value is the first and only occurrence or the first of multiple occurrences, and then create new fields in a new data set that can be reviewed further. The data sets have thousands of lines. Here's an example of what I am starting with:

id     line  exclude_ind  reimb_amt
12345  01    00           20
12345  02    00           0
54321  01    01           10
54321  02    01           10
54321  03    00           5
34567  01    05           0
34567  02    05           15
34567  03    09           20
34567  04    03           10

From the above data set, I want a field named id_count that identifies unique IDs: the first line of each unique id gets id_count = 01. I also want a field called exclude_ind_count that equals 01 on the first line of each unique exclude_ind value within an id. So the resulting new data set looks like this:

id     id_count  line  exclude_ind  exclude_ind_count  reimb_amt
12345  01        01    00           01                 20
12345  00        02    00           00                 0
54321  01        01    01           01                 10
54321  00        02    01           00                 10
54321  00        03    00           01                 5
34567  01        01    05           01                 0
34567  00        02    05           00                 15
34567  00        03    09           01                 20
34567  00        04    03           01                 10

So id_count, when summed within an id, equals 1; summed over the whole data set, it gives a count of how many unique IDs are in my data set. Likewise, within each unique id, the first instance of each exclude_ind value has exclude_ind_count = 01, so summing exclude_ind_count by id tells me how many IDs have a single exclude_ind value and how many have multiple.
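A minimal sketch, assuming id, line and exclude_ind are character variables and that 1/0 numeric flags are acceptable in place of the 01/00 text values. The DATALINES step only rebuilds the example rows; with real data, point the first PROC SORT at your own table. Sorting by id, exclude_ind and line puts the first occurrence of each exclude_ind value at the top of its group, so FIRST.exclude_ind marks it; a second sort by id and line restores the original order so FIRST.id can mark the first line of each id.

```sas
/* Example rows from the post (stand-in for the real table) */
data have;
  input id $ line $ exclude_ind $ reimb_amt;
  datalines;
12345 01 00 20
12345 02 00 0
54321 01 01 10
54321 02 01 10
54321 03 00 5
34567 01 05 0
34567 02 05 15
34567 03 09 20
34567 04 03 10
;
run;

/* Group identical exclude_ind values within each id, earliest line first */
proc sort data=have out=by_excl;
  by id exclude_ind line;
run;

data flag_excl;
  set by_excl;
  by id exclude_ind;
  exclude_ind_count = first.exclude_ind;   /* 1 on the first line of each id/exclude_ind pair, else 0 */
run;

/* Restore line order within each id */
proc sort data=flag_excl;
  by id line;
run;

data want;
  set flag_excl;
  by id;
  id_count = first.id;   /* 1 on the first line of each id, so it sums to 1 per id */
run;
```

With the flags numeric, sum(id_count) over the whole table gives the number of unique IDs, and summing exclude_ind_count by id gives the number of distinct exclude_ind values per ID.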

JH74 · ‎09-23-2023

Hi. I have a data set that contains the fields claim_number, claim_line_number, and dialysis. I need to create a new variable named remove_clm, and this variable should be populated with "Y" on every claim line of a claim number if ANY line of that claim has dialysis = "Y".

Data example:

Claim_Number  Claim_Line_Number  Dialysis
1234567       1                  Y
1234567       2                  Y
1234567       3                  N
1234567       4                  N
7654321       1                  N
7654321       2                  N
7654321       3                  N
7654321       4                  N
7654321       5                  N
7654321       6                  Y
2468100       1                  Y
2468100       2                  Y
2468100       3                  Y
3456789       1                  Y
3456789       2                  N
3456789       3                  Y
3456789       4                  Y
3456789       5                  N

I have tried this code:

proc sort data=save.dataset out=dataset_srt;
    by claim_number dialysis;
run;

data test;
    set dataset_srt;
    by claim_number dialysis;
    retain remove_claim;
    if first.claim_number then remove_claim = 'N';
    if dialysis = "Y" then remove_claim = 'Y';
run;

But the above code is not giving me a "Y" on every line of the claim, only on the lines where Dialysis = "Y", like this:

Claim_Number  Claim_Line_Number  Dialysis  Remove_Claim
1234567       1                  Y         Y
1234567       2                  Y         Y
1234567       3                  N         N
1234567       4                  N         N
7654321       1                  N         N
7654321       2                  N         N
7654321       3                  N         N
7654321       4                  N         N
7654321       5                  N         N
7654321       6                  Y         Y
2468100       1                  Y         Y
2468100       2                  Y         Y
2468100       3                  Y         Y
3456789       1                  Y         Y
3456789       2                  N         N
3456789       3                  Y         Y
3456789       4                  Y         Y
3456789       5                  N         N

The output I am trying to get is this:

Claim_Number  Claim_Line_Number  Dialysis  Remove_Claim
1234567       1                  Y         Y
1234567       2                  Y         Y
1234567       3                  N         Y
1234567       4                  N         Y
7654321       1                  N         Y
7654321       2                  N         Y
7654321       3                  N         Y
7654321       4                  N         Y
7654321       5                  N         Y
7654321       6                  Y         Y
2468100       1                  Y         Y
2468100       2                  Y         Y
2468100       3                  Y         Y
3456789       1                  Y         Y
3456789       2                  N         Y
3456789       3                  Y         Y
3456789       4                  Y         Y
3456789       5                  N         Y

I appreciate any help. Thanks.
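Why the posted code falls short: sorting by claim_number and dialysis puts each claim's N lines ahead of its Y lines (N sorts before Y), so the retained flag is still 'N' when those lines are written; it only flips once a Y line is read. One way around this is a "double DOW" loop: read all lines of a claim once to decide the flag, then re-read the same lines and write each with that claim-level value. This is a sketch, not the only approach; it keeps the name remove_claim from the posted code, and the DATALINES step stands in for save.dataset.

```sas
/* Example rows from the post (stand-in for save.dataset) */
data have;
  input claim_number $ claim_line_number dialysis $;
  datalines;
1234567 1 Y
1234567 2 Y
1234567 3 N
1234567 4 N
7654321 1 N
7654321 2 N
7654321 3 N
7654321 4 N
7654321 5 N
7654321 6 Y
2468100 1 Y
2468100 2 Y
2468100 3 Y
3456789 1 Y
3456789 2 N
3456789 3 Y
3456789 4 Y
3456789 5 N
;
run;

/* Sort by claim and line, not by dialysis */
proc sort data=have out=dataset_srt;
  by claim_number claim_line_number;
run;

data test;
  /* Pass 1: read every line of one claim and note whether any has dialysis = 'Y' */
  remove_claim = 'N';
  do until (last.claim_number);
    set dataset_srt;
    by claim_number;
    if dialysis = 'Y' then remove_claim = 'Y';
  end;

  /* Pass 2: re-read the same claim and write each line with the claim-level flag */
  do until (last.claim_number);
    set dataset_srt;
    by claim_number;
    output;
  end;
run;
```

Each SET statement keeps its own read position, so the second loop re-reads exactly the lines the first loop just scanned. A PROC SQL query with a remerged max(dialysis = 'Y') grouped by claim_number is a common alternative.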

JH74 · ‎09-11-2023

Thank you for this. I've run this and I do get results showing matches. I want to output all matching values into a new dataset (move the values from the h1 dataset that match the h2 values). Using the code you provided, is there a step I can add to output the results? Or is there a different way to create a new dataset that contains the matches?
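The code referred to here is not shown, so this is only a sketch of the general pattern, assuming the match is an exact lookup of h1 rows against a hash object loaded from h2, with a hypothetical shared key variable named address. Naming two data sets on the DATA statement and using explicit OUTPUT statements writes matches and non-matches to separate tables. The sample rows reuse addresses from the post purely for illustration.

```sas
/* Hypothetical sample tables; h1, h2 and address are assumed names */
data h1;
  infile datalines truncover;
  input address $40.;
  datalines;
193 FAIRVIEW LN
1006 NUT TREE ROAD
9401 PAINTER AVE
;
run;

data h2;
  infile datalines truncover;
  input address $40.;
  datalines;
193 FAIRVIEW LN
9401 PAINTER AVE
11801 PIERCE ST FL
;
run;

data matches nomatch;
  /* Load h2 into memory once, keyed on address */
  if _n_ = 1 then do;
    declare hash lookup(dataset: 'h2');
    lookup.defineKey('address');
    lookup.defineDone();
  end;
  set h1;
  if lookup.check() = 0 then output matches;   /* address exists in h2 */
  else output nomatch;                          /* no exact match in h2 */
run;
```

CHECK only tests whether the key exists; if you also need columns from h2, add defineData and use FIND instead.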

JH74 · ‎09-08-2023

Hello. I'm not sure if this is even possible, but I thought I'd check. I have 2 datasets. Both have multiple variables, but the ones they have in common are the address variables, and those are the variables I need to try to match. The issue is that one set has values such as:

193 FAIRVIEW LN
1006 NUT TREE ROAD
6121 PASEO DEL NORTE
9401 PAINTER AVE
3744 LONG BEACH BLVD
11801 PIERCE ST FL

while the other has values similar to these:

193 FAIRVIEW LN STE 100
1006 NUT TREE ROAD APT 5
6121 PASEO DEL NORTE BLDNG 2
9401 PAINTER AVE SUITE 2A
3744 LONG BEACH BLVD SUITE 224
11801 PIERCE ST FL APT 315

I want to create a format from one dataset and use that format on the other dataset to create a new variable (say addr_match), where addr_match = 1 if there's a match. Is there a way to match these variables so that addr_match = 1 when part of the addresses match, or do the values in the address fields need to be exactly the same? There are thousands of rows in each dataset.
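A format built from one dataset looks values up exactly, so "193 FAIRVIEW LN STE 100" would not find "193 FAIRVIEW LN". A minimal sketch of one alternative, a prefix match in PROC SQL, assuming the longer addresses always begin with the shorter address followed by a blank. The table and variable names (addr_base, addr_ext, addr, addr_full) are hypothetical.

```sas
/* Hypothetical tables built from the example values in the post */
data addr_base;
  infile datalines truncover;
  input addr $50.;
  datalines;
193 FAIRVIEW LN
1006 NUT TREE ROAD
6121 PASEO DEL NORTE
9401 PAINTER AVE
3744 LONG BEACH BLVD
11801 PIERCE ST FL
;
run;

data addr_ext;
  infile datalines truncover;
  input addr_full $60.;
  datalines;
193 FAIRVIEW LN STE 100
1006 NUT TREE ROAD APT 5
6121 PASEO DEL NORTE BLDNG 2
9401 PAINTER AVE SUITE 2A
3744 LONG BEACH BLVD SUITE 224
11801 PIERCE ST FL APT 315
;
run;

proc sql;
  create table matched as
  select e.addr_full,
         b.addr,
         1 as addr_match
  from addr_ext as e, addr_base as b
  /* Match when addr_full starts with addr plus a blank, so a base value
     such as "193 FAIRVIEW LN" does not match "193 FAIRVIEW LNX" */
  where index(e.addr_full || ' ', trim(b.addr) || ' ') = 1;
quit;
```

This join compares every pair of rows (SAS will note a Cartesian product), which is workable for thousands of rows on each side but grows quickly. A LEFT JOIN with the same condition would keep unmatched addr_ext rows with a missing addr_match, and one long address matching several base addresses produces one row per match.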

JH74 · ‎06-30-2023

Thank you so much for your reply. I neglected to mention in my original post that the fields are all character; there is no date field. So intck won't work.

JH74 · ‎06-30-2023

I neglected to mention that the Acct_Nbr and Cont_Mnth fields are CHARACTER, not DATE. Also, I had the wrong Cont_Mnth values for the last 3 records. They should be 2022/01, 2022/02, and 2022/03.

JH74 · ‎06-30-2023

Thank you for your reply. And yes, my apologies. The correct example data is:

Acct_Nbr  Cont_Mnth  6_Mos_Cons
123456    2022/01    Y
123456    2022/02
123456    2022/03
123456    2022/04
123456    2022/05
123456    2022/06
234567    2022/01    N
234567    2022/02
234567    2022/04
234567    2022/07    Y
234567    2022/08
234567    2022/09
234567    2022/10
234567    2022/11
234567    2022/12
345678    2021/10    Y
345678    2021/11
345678    2021/12
345678    2022/01
345678    2022/02
345678    2022/03

So you can see here why 345678 is a Y. Also, the Cont_Mnth field is not in date format; it's in character format. I believe this will not allow intck to work correctly? But I did run what you suggested, and it looks like it's close to working. I got this:

Acct_Nbr  Cont_Mnth  history  cont_mnth_lag5  Cons_Mnth_6
123456    2022/01    1                        .
123456    2022/02    2                        .
123456    2022/03    3                        .
123456    2022/04    4                        .
123456    2022/05    5                        .
123456    2022/06    6        2022/01         .
123456    2022/07    7        2022/02         .
123456    2022/08    8        2022/03         .
123456    2022/09    9        2022/04         .
123456    2022/10    10       2022/05         .
123456    2022/11    11       2022/06         .
123456    2022/12    12       2022/07         .
123456    2022/01    13       2022/08         .
123456    2022/02    14       2022/09         .
123456    2022/03    15       2022/10         .
123456    2022/04    16       2022/11         .
789010    2022/01    1        2022/12         .
789010    2022/02    2        2023/01         .

JH74 · ‎06-30-2023

Hello. I need to identify continuous enrollment of at least 6 months for members. The data spans multiple years, and 6 consecutive months can run from one year into the next. I would like a new variable that is either Y (6 or more consecutive months) or N (fewer than 6 consecutive months). In addition, a member can have multiple periods of 6 consecutive months in the plan. Below is how I would like the data to look after running the code to determine 6 consecutive months. The data I have now DOES NOT have the 6_Mos_Cons variable; that's what I want created. Data example:

Acct_Nbr  Cont_Mnth  6_Mos_Cons
123456    2022/01    Y
123456    2022/02
123456    2022/03
123456    2022/04
123456    2022/05
123456    2022/06
234567    2022/01    N
234567    2022/02
234567    2022/04
234567    2022/07    Y
234567    2022/08
234567    2022/09
234567    2022/10
234567    2022/11
234567    2022/12
345678    2021/10    Y
345678    2021/11
345678    2021/12
345678    2022/03
345678    2022/04
345678    2022/05

The Y or N for the created variable can be on either the first line or the last line of a consecutive or non-consecutive span of months. Also, if it is easier, it can be numeric, i.e., 1 = consecutive and 0 = non-consecutive.
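A minimal sketch, assuming Cont_Mnth is character text in 'YYYY/MM' form (as stated in the follow-up posts) and using the corrected example rows. The month text is turned into a running month number (year * 12 + month), so consecutive months differ by exactly 1 even across a year boundary, and no date conversion or intck is needed. A new span starts at each account's first row or wherever that difference is not 1. The variable is named mos_6_cons because 6_Mos_Cons is not a valid name under the default VALIDVARNAME=V7 rule (names must start with a letter or underscore).

```sas
/* Corrected example rows from the follow-up post */
data enroll;
  input acct_nbr $ cont_mnth $;
  datalines;
123456 2022/01
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01
234567 2022/02
234567 2022/04
234567 2022/07
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10
345678 2021/11
345678 2021/12
345678 2022/01
345678 2022/02
345678 2022/03
;
run;

proc sort data=enroll;
  by acct_nbr cont_mnth;   /* 'YYYY/MM' text sorts in calendar order */
run;

data spans;
  set enroll;
  by acct_nbr;
  /* Running month number, e.g. 2022/01 -> 2022*12 + 1 */
  mnum = input(substr(cont_mnth, 1, 4), 4.) * 12
       + input(substr(cont_mnth, 6, 2), 2.);
  prev_mnum = lag(mnum);   /* LAG runs on every row, so this is always the prior row */
  if first.acct_nbr or mnum ne prev_mnum + 1 then span_id + 1;   /* start a new span */
run;

data want (drop=mnum prev_mnum span_id n_months);
  length mos_6_cons $1;
  /* Pass 1: count the months in one unbroken span */
  n_months = 0;
  do until (last.span_id);
    set spans;
    by span_id;
    n_months + 1;
  end;
  /* Pass 2: re-read the span and flag its first month only */
  do until (last.span_id);
    set spans;
    by span_id;
    if first.span_id then mos_6_cons = ifc(n_months >= 6, 'Y', 'N');
    else mos_6_cons = ' ';
    output;
  end;
run;
```

Every span gets a flag, so 234567's isolated 2022/04 month is marked N here, whereas the example leaves it blank. For 1/0 instead of Y/N, replace the IFC call with (n_months >= 6).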

JH74 · ‎06-14-2023

Thanks for your reply. I have a question about the cards part; I'm not familiar with it. Also, I won't know all of the values in the record or date_start fields ahead of time, since there are thousands of records in the dataset. Does that affect the cards code where you listed out the values?

JH74 · ‎06-13-2023

The input dataset has 3 total variables. Only these 2 are relevant though, record and date_start.

JH74 · ‎06-13-2023

Thanks for your question. The data can have any number of combinations. A record can have one line or 50 lines; however, the record value is always consistent whether it has 1 line or 50. Each record can have either the same date_start value on every line or multiple different date_start values.

RECORD   Date_Start
1234560  6/1/2023
1234561  6/1/2023
1234561  6/1/2023
1234567  6/1/2023
1234567  6/2/2023
1234568  6/1/2023
1234568  6/2/2023
1234568  6/3/2023
1234569  6/1/2023
1234569  6/2/2023
1234569  6/2/2023
1234569  6/3/2023
1234569  6/3/2023
1234569  6/3/2023

Records where date_start is always the same need to go to data set A, and records where date_start varies within the same record need to go to data set B. So I will have 2 data sets: one containing records where date_start never changes, and the other containing records where date_start varies. Example of what is needed:

Data Set A

RECORD   Date_Start
1234560  6/1/2023
1234561  6/1/2023
1234561  6/1/2023

Data Set B

RECORD   Date_Start
1234567  6/1/2023
1234567  6/2/2023
1234568  6/1/2023
1234568  6/2/2023
1234568  6/3/2023
1234569  6/1/2023
1234569  6/2/2023
1234569  6/2/2023
1234569  6/3/2023
1234569  6/3/2023
1234569  6/3/2023
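A minimal sketch, assuming date_start is a SAS date (or any consistently stored value). The DATALINES block (CARDS is an older name for the same statement) only builds sample rows for testing; nothing in the logic depends on knowing the values in advance, so with real data you would replace the table name in the FROM clause with your own. COUNT(DISTINCT date_start) grouped by record attaches the number of different start dates to every row of that record, and one DATA step then routes each row to A or B.

```sas
/* Example rows from the post (stand-in for the real table) */
data have;
  input record $ date_start :mmddyy10.;
  format date_start mmddyy10.;
  datalines;
1234560 6/1/2023
1234561 6/1/2023
1234561 6/1/2023
1234567 6/1/2023
1234567 6/2/2023
1234568 6/1/2023
1234568 6/2/2023
1234568 6/3/2023
1234569 6/1/2023
1234569 6/2/2023
1234569 6/2/2023
1234569 6/3/2023
1234569 6/3/2023
1234569 6/3/2023
;
run;

proc sql;
  /* Number of distinct start dates per record, remerged onto every row */
  create table counted as
  select *, count(distinct date_start) as n_dates
  from have
  group by record
  order by record, date_start;
quit;

data a b;
  set counted;
  if n_dates = 1 then output a;   /* date_start never changes for this record */
  else output b;                  /* date_start varies within this record */
  drop n_dates;
run;
```

SAS writes a note that summary statistics were remerged with the original data; that is expected here.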

JH74 · ‎06-13-2023

Thanks for this. I have done this, but I don't think I made my initial question clear enough. I need two separate subsets. The first consists of instances like this:

RECORD  Date_Start
123456  6/1/2023
123456  6/1/2023
123456  6/1/2023

This scenario would go to a new dataset called A. Then there are instances like this (one record with multiple start dates):

RECORD   Date_Start
1234567  6/1/2023
1234567  6/2/2023
1234567  6/3/2023

This would go to a new dataset called B. So I need two new sets: one that contains records with consistent dates, and another that contains records with varying dates. I apologize for not being more detailed.

JH74 · ‎06-13-2023

Hello. I'm having trouble writing code to get a subset of data. The data looks like this (I have thousands of records; this is an example):

RECORD  Date_Start
123456  6/1/2023
123456  6/1/2023
123456  6/1/2023

I need to find unique combinations of record and date_start. For this example, I would want only 1 row in the subset with that record and date_start, like this:

RECORD  Date_Start
123456  6/1/2023

I also have this:

RECORD   Date_Start
1234567  6/1/2023
1234567  6/2/2023
1234567  6/3/2023

For this, I'd expect 3 rows in the new subset, since all are unique with respect to record and date_start. I'm relatively new to SAS, so it's a bit of a challenge for me. I've tried sorting the set by record and date_start, then creating a retained variable using first., etc., but every record of the main data set is being exported to my subset data set. Any ideas are appreciated.
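A minimal sketch of the sort-then-FIRST. approach described above, assuming date_start is read as a SAS date. With BY record date_start, FIRST.date_start is 1 whenever either record or date_start changes, so a subsetting IF on it keeps exactly one row per combination. A common reason every row ends up in the subset is that nothing restricts which rows are written (for example, no subsetting IF or conditional OUTPUT), though the original code isn't shown.

```sas
/* Example rows from the post */
data have;
  input record $ date_start :mmddyy10.;
  format date_start mmddyy10.;
  datalines;
123456 6/1/2023
123456 6/1/2023
123456 6/1/2023
1234567 6/1/2023
1234567 6/2/2023
1234567 6/3/2023
;
run;

proc sort data=have out=have_srt;
  by record date_start;
run;

data uniq;
  set have_srt;
  by record date_start;
  if first.date_start;   /* keep only the first row of each record/date_start pair */
run;
```

PROC SORT with the NODUPKEY option and the same BY statement gives the same four rows in a single step.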

JH74 · ‎04-26-2023

Hi. I'm trying to create a new variable that I can later sum to get a total count of unique conditions. My data looks like this:

ID   from_dt    to_dt
123  06JAN2021  06JAN2021
123  13JAN2021  13JAN2021
123  20JAN2021  20JAN2021
123  27JAN2021  27JAN2021
456  03FEB2021  03FEB2021
456  04FEB2021  04FEB2021
789  06MAR2021  06MAR2021
789  06MAR2021  06MAR2021
789  06MAR2021  06MAR2021

My code is this:

data services_test;
    set services;
    by id from_dt to_dt;
    if first.id and first.from_dt and first.to_dt then visits = 1;
run;

It produces this:

ID   from_dt    to_dt      Visits
123  06JAN2021  06JAN2021  1
123  13JAN2021  13JAN2021  .
123  20JAN2021  20JAN2021  .
123  27JAN2021  27JAN2021  .
456  03FEB2021  03FEB2021  1
456  04FEB2021  04FEB2021  .
789  06MAR2021  06MAR2021  1
789  06MAR2021  06MAR2021  .
789  06MAR2021  06MAR2021  .

I understand why I'm getting this output, but I'm trying to get this result:

ID   from_dt    to_dt      Visits
123  06JAN2021  06JAN2021  1
123  13JAN2021  13JAN2021  1
123  20JAN2021  20JAN2021  1
123  27JAN2021  27JAN2021  1
456  03FEB2021  03FEB2021  1
456  04FEB2021  04FEB2021  1
789  06MAR2021  06MAR2021  1
789  06MAR2021  06MAR2021  .
789  06MAR2021  06MAR2021  .

I need visits = 1 on every row where the ID has a different from/to date pair, but only a single visits = 1 when the ID repeats the same from/to dates.
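A minimal sketch, assuming services is sorted by id, from_dt and to_dt as in the example. FIRST.to_dt is 1 whenever id, from_dt or to_dt changes, so testing it alone flags the first row of every distinct id/from_dt/to_dt combination; requiring first.id as well is what limited the original code to one flag per ID.

```sas
/* Example rows from the post (stand-in for the real services table) */
data services;
  input id $ from_dt :date9. to_dt :date9.;
  format from_dt to_dt date9.;
  datalines;
123 06JAN2021 06JAN2021
123 13JAN2021 13JAN2021
123 20JAN2021 20JAN2021
123 27JAN2021 27JAN2021
456 03FEB2021 03FEB2021
456 04FEB2021 04FEB2021
789 06MAR2021 06MAR2021
789 06MAR2021 06MAR2021
789 06MAR2021 06MAR2021
;
run;

data services_test;
  set services;
  by id from_dt to_dt;
  /* 1 on the first row of each distinct id/from_dt/to_dt combination */
  if first.to_dt then visits = 1;
run;
```

Summing visits then counts distinct visits (7 for the example rows).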


Re: Identify Unique Values in Data Set and Create New Fields

Identify Unique Values in Data Set and Create New Fields

Retain flag for claim id based on dialysis variable

Re: Match two datasets on a partial match

Match two datasets on a partial match

Re: Identify continuous enrollment

Re: Identify continuous enrollment

Re: Identify continuous enrollment

Identify continuous enrollment

Re: Unique records within a group of records

Re: Identify Unique Values in Data Set and Create New Fields

Re: Identify Unique Values in Data Set and Create New Fields

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Unique records within a group of records

Re: Unique records within a group of records

Re: Unique records within a group of records

Re: Unique records within a group of records

Unique records within a group of records

Assign value for counting