About A_Swoosh

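A_Swoosh · 02-20-2021

You can do this with the COMPRESS function. With the 'A' modifier it removes alphabetic characters, so digits are kept along with any spaces and punctuation: want = compress(string,'','A'); To keep only the digits, use the 'kd' (keep digits) modifiers instead: want = compress(string, , 'kd');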

A_Swoosh · 02-19-2021

Hello,

I am trying to iterate through my variables and assign a value of 0 to each stdunits variable based on the units variable that precedes it for the same quarter and year. Here is a snippet of the data:

data have;
  infile datalines dlm='|' missover dsd;
  input id q1_2018_units q1_2018_stdunits q2_2018_units q2_2018_stdunits
        q3_2018_units q3_2018_stdunits q4_2018_units q4_2018_stdunits;
  datalines;
1|214252|214252|149115||50000|50000|0||
2|12||0||50000||0||
3|21252|21252|||5000|.|0||
4|214252|.|0||5000|5000|0||
5|2252|.|14115||500|500|0||
;
run;

I want to assign a value of 0 to stdunits if there is a 0 in the units column for the corresponding quarter. That way I can pick out the 0 cases and compute a stdunit for those that aren't 0. Note that I have other quarter and year combinations, but if I can apply this to this subset then I can apply it elsewhere.

For ID 2, there should be a . for q1_2018_stdunits and q3_2018_stdunits but a 0 value for q2_2018_stdunits and q4_2018_stdunits.

I was thinking of creating a variable list like the one below, but one for each quarter/year combination, to separate the quarters and make them easier to go through?

/*GET LIST OF UNIT VARIABLES*/
PROC SQL NOPRINT;
  CREATE TABLE VAR_NAMES AS
    SELECT NAME FROM DICTIONARY.COLUMNS
    WHERE LIBNAME = 'WORK' AND MEMNAME = 'HAVE' AND NAME CONTAINS '_units';
  SELECT COMPRESS(NAME) INTO :VAR_LIST SEPARATED BY " " FROM VAR_NAMES;
  SELECT COUNT(NAME) INTO :NUM SEPARATED BY " " FROM VAR_NAMES;
QUIT;

I think the next step would be to create an array and a DO loop, but I'm not sure what the parameters are or what type of DO statement to use. I also saw WHICHN but I have no experience with that function.

data want;
  set have;
  ARRAY X[*] &VAR_LIST;
  array Y[*] &VAR_LIST1;
  ???

I then want to go through each stdunits variable and pull out the cases where stdunits is missing, so I can examine just the IDs with missing stdunits that require computation.

Any help would be much appreciated...
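One way the array step asked about here could look is two parallel arrays walked with a single index. This is a minimal sketch, not a definitive answer: the array names units and stdunits, the counter n_missing_std, and the output dataset need_compute are made up for illustration, and the variables are listed explicitly rather than taken from a macro list.

```sas
data have;
  infile datalines dlm='|' missover dsd;
  input id q1_2018_units q1_2018_stdunits q2_2018_units q2_2018_stdunits
        q3_2018_units q3_2018_stdunits q4_2018_units q4_2018_stdunits;
  datalines;
1|214252|214252|149115||50000|50000|0||
2|12||0||50000||0||
3|21252|21252|||5000|.|0||
4|214252|.|0||5000|5000|0||
5|2252|.|14115||500|500|0||
;
run;

data want;
  set have;
  /* Parallel arrays: element i of each array belongs to the same quarter */
  array units[*]    q1_2018_units    q2_2018_units    q3_2018_units    q4_2018_units;
  array stdunits[*] q1_2018_stdunits q2_2018_stdunits q3_2018_stdunits q4_2018_stdunits;
  do i = 1 to dim(units);
    /* A missing units value is not equal to 0, so it leaves stdunits untouched */
    if units[i] = 0 then stdunits[i] = 0;
  end;
  /* Hypothetical helper: how many stdunits values are still missing */
  n_missing_std = nmiss(of stdunits[*]);
  drop i;
run;

/* IDs that still need a stdunits value computed */
data need_compute;
  set want;
  where n_missing_std > 0;
run;
```

The two arrays must list their variables in the same quarter order, because the loop pairs them by position only. If the lists come from macro variables, they would need to be built in matching order.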

A_Swoosh · 08-08-2020

I was trying to present another example so I'm clear about the syntax involved with Perl regular expressions, since I'm new to them. Suppose I have another dataset where I'm trying to identify bad values:

data sample;
  infile datalines dlm="|" missover dsd;
  input CATEGORY $ ID $;
  datalines;
Physician|A123242
|0000000
PS220|A123456
;
run;

Case 3 has the proper format for each variable, case 2 has both wrong, and case 1 has CATEGORY wrong.

A_Swoosh · 08-07-2020

Yes, this is exactly what I'm trying to accomplish. Thank you for the help.

I don't have much experience with Perl regular expressions, but is there a way to find a single character repeated for the entire length of the value and, if it matches, flag the value as bad? I want to match a repeated character, but with the repeat count being the length of the value rather than a fixed n.

I also want to identify an ID that does not follow the alphanumeric ID pattern, for example when the ID is a description instead of something like A235324.

Case 1: ID: 0000000
Case 2: ID: Physician
Case 3: ID: A235324
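A possible way to express both checks with PRXMATCH is sketched below, using the three example IDs from this post. The flag names all_repeated, bad_format, and bad_id are made up for illustration, and the "valid" layout (one letter followed by six digits) is an assumption inferred from A235324, not a stated rule.

```sas
data id_check;
  length id $ 20;
  input id $;
  /* (.) captures the first character; \1+ requires that same character */
  /* to repeat through to the end of the trimmed value                  */
  all_repeated = (prxmatch('/^(.)\1+$/', strip(id)) > 0);
  /* Assumed valid layout: one letter followed by exactly six digits */
  bad_format = (prxmatch('/^[A-Za-z]\d{6}$/', strip(id)) = 0);
  bad_id = (all_repeated or bad_format);
  datalines;
0000000
Physician
A235324
;
run;
```

Anchoring with ^ and $ on the stripped value is what ties the repeat to the full length of the value, so no explicit length is needed. With these patterns, 0000000 and Physician are flagged and A235324 is not.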

A_Swoosh · 08-06-2020

Sorry, my apologies. It was a sample and I forgot to change the length. It should be 2 and 1, respectively.

A_Swoosh · 08-05-2020

Hi SAS Community,

I am importing pipe-delimited .txt and .csv files into SAS and want to run a series of checks:

1. Count the number of records
2. Count the number of distinct records for ID
3. Check whether the variables match a set list of expected variables
4. Check whether the values are expected or in the correct format

Is there a way to output 1 dataset with all this information? As it currently stands, I write these results to a text file from multiple PROC FREQ and PROC SQL steps.

These are my expected variables:

data VARS;
  input vars :$32.;
  datalines;
ID
BDAY
GENDER
ADDRESS1
ADDRESS2
CITY
STATE
ZIP
ZIP4
COUNTY_CODE
;
run;

This is my sample data:

data sample;
  infile datalines dlm="|" missover dsd;
  input ID :$2. BDAY :$10. GENDER $ ADDRESS1 $ ADDRESS2 :$10. CITY $ STATE $ ZIP $ ZIP4 $ COUNTY_CODE $;
  datalines;
A1|20200420|M|123 Main St.|Suite 201|Juneau|AK|99802||02112
B2|4/20/2020|Male|124 Main St.||Juneau|AK|99802|Juneau
C3|4/20/2020|M|125 Main St.||Juneau|AK|99802||02112
4-1|20200420|Male|126 Main St.|Suite 101|Juneau|AK|99802||Juneau
;
run;

Conditions:

ID: should be alphanumeric and 2 characters
BDAY: should be in YYYYMMDD format
Gender: should be 1 character
Address1 and Address2: should be character
City: should be character
State: should be 2 characters
Zip: should be 5 characters
Zip4: should be no more than 4 characters
County: should be the FIPS code, which is 5 characters

Results:

Line 1: accurate and meets all criteria
Line 2: DOB is not in the correct format, gender is not 1 character, does not have the zip4 variable, county does not have 5 characters
Line 3: DOB is not in the correct format
Line 4: ID is not purely alphanumeric (it contains a special character), and gender is not 1 character

So far I have:

/*Variables match list? */
proc sql;
  create table plan_vars as
  select strip(upcase(name)) as vars
  from sashelp.vcolumn
  where libname='WORK' and memname='SAMPLE';
quit;

proc sql;
  create table comparevar_&file. as
  select a.*, b.*,
         case when a.vars = b.varsm then 'Match' else 'No' end as var_match
  from VARS as a
  full join plan_vars(rename=(vars=varsm)) as b
    on a.vars=b.varsm;
quit;

/*output records */
ods rtf file="&output.\FileReview.rtf";
ods noptitle;
options nodate nonumber;

proc sql;
  create table ctrltot_&file. as
  select count(*) as Total_number_of_records,
         count(distinct ID) as Total_number_of_unique_ID,
         count(ID) as tot_ID
  from sample;
quit;

proc freq data = sample;
  tables _all_;
  format _numeric_ _character_ $miss.;
run;

proc freq data = ctrltot_&file.;
  tables _all_;
run;

proc freq data=comparevar_&file.;
  tables _all_;
  where var_match = 'No';
run;

ods rtf close;
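For the value checks (item 4) and the single output dataset, one option is to compute a 0/1 flag per rule in a data step and then summarize the flags in one PROC SQL query. The sketch below is only an illustration under stated assumptions: the dataset names sample_wide, checks, and summary and all bad_ flag names are made up, the values are read with deliberately generous lengths so that overlong values are not truncated before they are checked, and the all-digits test on the county code is an assumption about what a valid FIPS code looks like.

```sas
data sample_wide;
  infile datalines dlm='|' missover dsd;
  /* Generous lengths (an assumption) so bad values survive to be checked */
  input ID :$10. BDAY :$10. GENDER :$10. ADDRESS1 :$40. ADDRESS2 :$40.
        CITY :$30. STATE :$10. ZIP :$10. ZIP4 :$10. COUNTY_CODE :$20.;
  datalines;
A1|20200420|M|123 Main St.|Suite 201|Juneau|AK|99802||02112
B2|4/20/2020|Male|124 Main St.||Juneau|AK|99802|Juneau
C3|4/20/2020|M|125 Main St.||Juneau|AK|99802||02112
4-1|20200420|Male|126 Main St.|Suite 101|Juneau|AK|99802||Juneau
;
run;

data checks;
  set sample_wide;
  /* Each flag is 1 when the rule is violated, 0 otherwise */
  bad_id     = not (lengthn(ID) = 2 and notalnum(strip(ID)) = 0);
  bad_bday   = not (lengthn(BDAY) = 8 and notdigit(strip(BDAY)) = 0
                    and not missing(input(BDAY, ?? yymmdd8.)));
  bad_gender = (lengthn(GENDER) ne 1);
  bad_state  = (lengthn(STATE) ne 2);
  bad_zip    = (lengthn(ZIP) ne 5);
  bad_zip4   = (lengthn(ZIP4) > 4);
  /* Assumption: a FIPS county code is exactly five digits */
  bad_county = not (lengthn(COUNTY_CODE) = 5 and notdigit(strip(COUNTY_CODE)) = 0);
  n_issues   = sum(of bad_:);
run;

/* One row holding the record counts and the number of failures per rule */
proc sql;
  create table summary as
  select count(*)           as total_records,
         count(distinct ID) as unique_ids,
         sum(bad_id)        as n_bad_id,
         sum(bad_bday)      as n_bad_bday,
         sum(bad_gender)    as n_bad_gender,
         sum(bad_state)     as n_bad_state,
         sum(bad_zip)       as n_bad_zip,
         sum(bad_zip4)      as n_bad_zip4,
         sum(bad_county)    as n_bad_county
  from checks;
quit;
```

The checks dataset keeps the row-level flags for follow-up, while summary holds the counts in one place. The ?? modifier on the INPUT function suppresses the invalid-data notes that a malformed date would otherwise write to the log.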

A_Swoosh · 07-28-2020

I'm not sure if I'm on the right track but this is what I have so far:

%let mstart = 01JAN;
%let mend = 31DEC;
%let ystart = 2019;
%let yend = 2019;

proc sort data = history out = cases (keep = patid LOB aid PLAN StartDate EndDate eligible);
  by patid StartDate EndDate;
  where EndDate > "&mstart.&ystart."d
    and StartDate < "&mend.&yend."d
    and LOB = "MC"
    and (intnx('year',DOB,18,'same')) > "&mstart.&ystart."d
    and eligible=1;
run;

data cases_test;
  retain gap_count last_start service_end service_began official_start official_end;
  set cases;
  by patid;
  format last_start service_end service_began official_start official_end date9.;
  if first.patid then do;
    if StartDate >= "&mstart.&ystart."d then service_began = StartDate;
    else service_began = "&mstart.&ystart."d;
    if EndDate <= "&mend.&yend."d then service_end = EndDate;
    else service_end = "&mend.&yend."d;
    if StartDate >= "&mstart.&ystart."d then last_start=StartDate;
    else last_start = "&mstart.&ystart."d;
    gap_count=0;
  end;
  else do;
    if StartDate < service_began then do;
      if StartDate >= "&mstart.&ystart."d then service_began = StartDate;
      else service_began = "&mstart.&ystart."d;
    end;
    else service_began = service_began;
    if EndDate > service_end then do;
      if EndDate <= "&mend.&yend."d then service_end = EndDate;
      else service_end = "&mend.&yend."d;
    end;
    else service_end=service_end;
    if intck('day',EndDate,last_start) > 0 then gap_count=gap_count+(intck('day',EndDate,last_start)-1);
    else gap_count=gap_count;
    if StartDate >= "&mstart.&ystart."d then last_start=StartDate;
    else last_start = "&mstart.&ystart."d;
  end;
  official_start="&mstart.&ystart."d;
  official_end="&mend.&yend."d;
run;

I think from here I may be able to use a lag from the previous row and then create gap variables to count the different gaps. Not sure if there is a better approach...

A_Swoosh · 07-28-2020

@mkeintz

1. An end date of 9999 means the span is current and basically has no gaps. It extends beyond the measurement year.
2. The criterion is continuous enrollment for the entire calendar year with no more than 1 gap of 45 days. Edit: I corrected the instance for patient 2. They have 2 gaps: one from May to June and another from 12/17 to 12/31.
3. Plan is only needed for this instance: the plan they begin the measurement year (2019) with. Any breaks of more than 45 days (e.g., row 2 for patient 1) would be considered a gap.

So patient 1 would be in the case group because their aid is LIS, they are eligible, and they have 1 gap (row 2, equaling less than 45 days).

A_Swoosh · 07-27-2020

Hi all,

I've seen other threads in the SAS programming community that address the issue of continuous enrollment with various lags. I wanted to present a more complex request involving additional variables aside from date. I have a subset example of a sample dataset as follows:

data History;
  input patid plan lob $ aid $ eligible start end;
  informat start end mmddyy10.;
  format start end date9.;
  cards;
1 1 MC LIS 1 10/1/2019 12/31/9999
1 2 MC LIS 1 9/1/2019 9/30/2019
1 1 MC LIS 1 10/1/2018 08/31/2019
1 . FFS LIS 1 4/1/2018 12/31/9999
1 1 MC LIS 1 4/1/2018 9/30/2018
1 . FFS noLIS 0 8/1/2015 3/31/2018
1 1 MC noLIS 0 8/1/2015 3/31/2018
2 . FFS LIS 1 12/18/2019 12/31/9999
2 3 MC LIS 1 6/1/2019 12/17/2019
2 . FFS LIS 1 5/1/2019 12/31/9999
2 3 MC LIS 1 8/1/2018 4/30/2019
2 3 MC LIS 1 7/1/2018 7/31/2018
2 . FFS LIS 1 5/1/2018 4/30/2019
3 2 MC noLIS 0 8/1/2018 12/31/9999
3 2 MC noLIS 0 8/1/2017 7/31/2018
3 . FFS noLIS 0 5/1/2017 12/31/9999
3 . FFS noLIS 0 12/1/2012 4/30/2017
3 2 MC noLIS 0 12/1/2012 4/30/2017
;
run;

This dataset has:

1) Patient ID
2) Plan
3) Line of Business (e.g., Medicare FFS or Managed Care (MC))
4) Aid (LIS vs. noLIS)
5) Eligibility Criteria
6) Start Date
7) End Date

Want: I want to create 2 datasets of patients who are continuously enrolled for the calendar year with no more than 1 gap of 45 days, split by aid (LIS), like a case vs. control design with the aid category being the identifier.

Results:

Patient 1: Would fall into the case group; has only 1 gap of 30 days, so it is included
Patient 2: Would be removed since they have 2 gaps in enrollment
Patient 3: Would fall into the control group; no gap and has noLIS
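One possible reading of the gap rule can be sketched as follows, assuming the History dataset above already exists. Several things here are assumptions rather than stated requirements: only MC rows are considered, only rows on the plan held by the earliest 2019 span count as coverage, a patient qualifies with at most 1 gap of at most 45 days, and aid is taken from the patient's last 2019 span. The names spans, enrollment, cases, controls, start_plan, covered_thru, n_gaps, max_gap, and continuous are made up for illustration.

```sas
%let ystart = '01JAN2019'd;   /* measurement year start */
%let yend   = '31DEC2019'd;   /* measurement year end   */

/* Managed-care spans that overlap the measurement year, clipped to it */
data spans;
  set History;
  if lob = 'MC' and start <= &yend and end >= &ystart;
  span_start = max(start, &ystart);
  span_end   = min(end, &yend);
  format span_start span_end date9.;
run;

proc sort data=spans;
  by patid span_start span_end;
run;

data enrollment(keep=patid aid n_gaps max_gap continuous);
  set spans;
  by patid;
  retain start_plan covered_thru n_gaps max_gap;
  if first.patid then do;
    start_plan   = plan;          /* assumption: earliest span defines the starting plan */
    covered_thru = &ystart - 1;   /* nothing covered yet */
    n_gaps  = 0;
    max_gap = 0;
  end;
  if plan = start_plan then do;
    gap = span_start - covered_thru - 1;   /* uncovered days before this span */
    if gap > 0 then do;
      n_gaps  = n_gaps + 1;
      max_gap = max(max_gap, gap);
    end;
    covered_thru = max(covered_thru, span_end);
  end;
  if last.patid then do;
    gap = &yend - covered_thru;            /* uncovered days at the end of the year */
    if gap > 0 then do;
      n_gaps  = n_gaps + 1;
      max_gap = max(max_gap, gap);
    end;
    continuous = (n_gaps <= 1 and max_gap <= 45);
    output;
  end;
run;

/* Split the continuously enrolled patients by aid category */
data cases controls;
  set enrollment;
  where continuous = 1;
  if aid = 'LIS' then output cases;
  else output controls;
run;
```

Tracking covered_thru as a running maximum, rather than comparing each row with the previous row only, keeps overlapping or nested spans from being counted as gaps. Under these assumptions the sample data gives patient 1 one 30-day gap, patient 2 two gaps, and patient 3 none, which agrees with the results described in the post.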

A_Swoosh · 07-21-2020

ProvCAT is the final value that I want to obtain; it is currently blank in dataset 2. The ProvCAT value would be derived using a combination of Provider Specialty (ProvSpec1--ProvSpec3) and ProvType.

The first block of information is a large dataset with:

1. All the possible values found in my data (my reference)
2. The valuetype (the category the value applies to, i.e., the ProvType variable or ProvSpec variable in the corresponding dataset)
3. The program it applies to (crosswalk type)
4. The conditional logic I want to end up applying to derive my ProvCAT

Step 1: I want to take my values (VALUE) and apply them to my actual dataset values for the corresponding variables (Valuetype in Dataset1, which corresponds to the ProvType/ProvSpec array in Dataset2). Using this list of values, I want to match them together or assign a format to show they meet.

Step 2: I then want to create a flag of 1/0 when they meet those conditions. So, if the taxonomy value falls in that list, then flag. Same for ProvType, and same for ProvSpec: go through my list of ProvSpec1-ProvSpec3 and, if it contains a value in that set, flag. This is why I was thinking of creating a separate flag for each variable type (ProvType, Tax, Spec).

Step 3: Then I will write conditional logic, using the conditional string portion, to create my ProvCAT. For example, if Type_Spec=1 or Type_Prov=1 then ProvCAT=Hospital.

A_Swoosh · 07-07-2020

Hi Tom,

Thank you for your response. You are right; I am having trouble communicating the problem.

Steps 1 and 2 are no issue. I am able to generate a list of variable names, review the list, and identify the renaming that is required, as seen below.

Dataset 1:
Dataset 2:

data new;
  set contents_&file.;
  if upcase(name) in ('ADDRESS1' 'ADDRESS2' 'BUSNAME' 'CITY' 'COUNTY' 'FNAME' 'LNAME' 'MI' 'STATE' 'ZIP' 'ZIP4') then basename='Prov'||name;
  else basename=name;
run;

I want to rename txnmy_cd and spec_cd using a numeric suffix, since there is more than one instance of each variable:

Taxonomy, Taxonomy2, Taxonomy3, etc.
ProvSpec1, ProvSpec2, ProvSpec3, etc.

I also want to remove the underscore from prov_type. In addition, I am trying to remove the odd naming conventions in dataset 2 (e.g., CountyA_B, PCP_flagD) so that it follows the format of dataset 1. I was considering the INDEX function for this, but I'm not sure if there is a more efficient way. Also, dataset 2 is slightly different, so the code listed above won't work on that dataset.

With the function below, I don't know if this quite works for my address and zip variables:

basename = prxchange('s/\d*$//',-1,trim(name));

I think once I am able to produce a column right next to the original name column, I can use PROC SQL to rename the variables to the names in the new column.

A_Swoosh · 07-05-2020

Have:

Dataset 1: ProvID NPI Txnmy_cd1 Txnmy_cd2 Spec_cd1 Spec_cd2 Address1 Address2 City State Zip Zip4 Prov_type Pcp_flg
Dataset 2: ProvID NPI txnmy_cd txnmy_cd2 Spec_cd Spec_cd1 Address1 Address2 City State Zip Zip4 Prov_type Pcp_flg
and so on... (all datasets follow a similar structure)

Want:

Dataset 1: ProvID NPI Taxonomy Taxonomy2 ProvSpec1 ProvSpec2 ProvAddress1 ProvAddress2 ProvCity ProvState ProvZip ProvZip4 Provtype PCP_flg
Dataset 2: ProvID NPI Taxonomy Taxonomy2 ProvSpec1 ProvSpec2 ProvAddress1 ProvAddress2 ProvCity ProvState ProvZip ProvZip4 Provtype pcp_flg

I'm attempting to make this as data-driven as possible and to avoid informats, formats, or manual rename statements, since future years may have more datasets and/or variables, or the layout may change. From what I was told by the person overseeing the project, I should consider a counter variable for the taxonomy and spec_cd variables, an index with a combined if/else statement for the address variables, and then a rename structure (PROC SQL) to rename the variables.
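A data-driven version of the counter-plus-rename idea might look like the sketch below. It is only an illustration under assumptions: dataset1 is a hypothetical stand-in with the Dataset 1 layout, the names cols, rename_map, newname, tax_n, spec_n, and the macro variable renames are made up, and variables not covered by a rule (including Pcp_flg) are left unchanged.

```sas
/* Hypothetical stand-in with the Dataset 1 layout from the post */
data dataset1;
  length ProvID NPI Txnmy_cd1 Txnmy_cd2 Spec_cd1 Spec_cd2 Address1 Address2
         City State Zip Zip4 Prov_type Pcp_flg $ 20;
  call missing(of _all_);
run;

/* Variable names in their stored order */
proc sql;
  create table cols as
  select name, varnum
  from dictionary.columns
  where libname = 'WORK' and memname = 'DATASET1'
  order by varnum;
quit;

data rename_map;
  set cols;
  length newname $ 32;
  if upcase(name) =: 'TXNMY_CD' then do;
    tax_n + 1;                                /* counter for taxonomy variables */
    if tax_n = 1 then newname = 'Taxonomy';   /* first one gets no suffix */
    else newname = cats('Taxonomy', tax_n);
  end;
  else if upcase(name) =: 'SPEC_CD' then do;
    spec_n + 1;                               /* counter for specialty variables */
    newname = cats('ProvSpec', spec_n);
  end;
  else if upcase(name) in ('ADDRESS1' 'ADDRESS2' 'CITY' 'STATE' 'ZIP' 'ZIP4')
    then newname = cats('Prov', name);
  else if upcase(name) = 'PROV_TYPE' then newname = 'Provtype';
  else newname = name;
run;

/* Build old=new pairs only for names that actually change */
proc sql noprint;
  select catx('=', name, newname) into :renames separated by ' '
  from rename_map
  where upcase(name) ne upcase(newname);
quit;

proc datasets lib=work nolist;
  modify dataset1;
  rename &renames;
quit;
```

Because the suffix comes from the counter rather than from the digits in the original name, the same step maps both Txnmy_cd1/Txnmy_cd2 and txnmy_cd/txnmy_cd2 to Taxonomy/Taxonomy2, and both Spec_cd1/Spec_cd2 and Spec_cd/Spec_cd1 to ProvSpec1/ProvSpec2. If a dataset has nothing to rename, the macro variable renames is never created, so a real implementation would need to guard the PROC DATASETS step.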

A_Swoosh · 07-02-2020

There is some pattern to the names:

Taxonomy codes and specialty codes start with either txnmy_cd or txnmy_cd1, and either spec_cd or spec_cd1, and continue for n+1.
Underscores should be removed for everything.
Some variables are not changed at all (e.g., ID, PIN).
Other variables (address) get the prefix 'Prov'.

There will be future iterations of the process which may include new datasets, but they will follow similar naming conventions.

A_Swoosh · 07-02-2020

While the solutions posted previously in my thread and suggested here by Patrick and Shmuel were great, I was advised by the person overseeing my data that I should avoid manually creating a list of renames, manually creating an array listing out my variables, and/or creating a format. Instead, I was advised to approach the renaming of my variables differently:

Create a counter for my variables where they start with either txnmy_cd or txnmy_cd1 and go on to n+1.
Create a rename structure to add a prefix to other variables (e.g., Address1 to ProvAddress1) and to rename other instances like Prov_type to Provtype.

I think this can be accomplished with PROC CONTENTS and/or PROC SQL dictionary columns to obtain my list of variables. The issue I encounter is creating a counter and then a DO loop after I run my PROC CONTENTS or dictionary.columns approach.
