SAS Support Communities

SM1

@ballardw - thanks for your reply. The ultimate objective is to correctly identify records that accurately document demotions across comprehensive employee administrative records as indicated by (1) the change in the pay grade variable (compared to the previous record) and (2) the continuation of the new pay grade variable in subsequent records. Some employees have 2+ demotions across their tenure and some records extend for 15+ years. I previously tried using syntax (see below) that used a character version of the pay grade variable ('pg') to concatenate the value across each encounter and then compress the consecutive occurrences of the same code to a single value, but I couldn't distinguish between employees with good data (like id=23456) and 1-off bad records (like id=12345). proc sort data=have; by id date; run; data want; retain id; length comb $110; do until(last.id); set have; by id pg notsorted; if first.pg then comb=cats(comb, char(pg,1)); end; run;

SM1

Hello, I am working with personnel records for 80,000+ individuals across about 15 years. A minimum of 1 record per month is generated for each employee, but multiple additional records can also be generated during any month. The records include a pay grade variable (coded 0 to 10, with 0=most junior and 10=most senior). I need to accurately identify records that document demotions. As with most administrative records, the data are mostly clean - except when they are not. I've included some sample datalines below that reflect the 3 most common data patterns: 1. a single pay grade change that reflects a demotion (see id=23456) 2. a single pay grade change that appears to reflect a random bad record, but not a demotion (see id=12345) 3. a flurry of records that bounce between 2 pay grades, but finally begin to reflect the same pay grade consistently, which appears to reflect a demotion (see id=34567) I am using a lag variable to identify records where paygrade < lag_paygrade, and the resulting new variable "demotion" currently acts like a general flag for records that require further QA checks. If anyone has a recommendation for how to better differentiate between the 3 patterns of pay grade values, I'd really appreciate the help. Thanks! data have; input id date paygrade; format date date9.; datalines; 23456 01JAN2021 7 23456 01FEB2021 7 23456 01MAR2021 7 23456 01APR2021 7 23456 05APR2021 7 23456 10APR2021 7 23456 15APR2021 6 23456 01MAY2021 6 23456 01JUN2021 6 12345 01JAN2021 3 12345 01FEB2021 3 12345 01MAR2021 3 12345 01APR2021 3 12345 05APR2021 3 12345 10APR2021 2 12345 15APR2021 3 12345 01MAY2021 3 12345 01JUN2021 3 34567 01JAN2021 5 34567 01FEB2021 5 34567 01MAR2021 5 34567 01APR2021 4 34567 05APR2021 5 34567 10APR2021 5 34567 15APR2021 4 34567 01MAY2021 4 34567 01JUN2021 4 run; proc sort data=have; by id date; run; data have1; set have; by id date; lag_paygrade=lag(paygrade); if first.ssn then lag_paygrade=.; if paygrade<lag_paygrade then demotion='yes'; run;

SM1 · ‎12-13-2019

@mkeintz, I couldn't figure out how I had missed so many chronology errors when I reviewed the results generated by @gamotte's syntax, so I re-ran everything. I'm not sure why, but your suggested syntax is generating an incorrect datetime variable (DTME1) while the syntax of datetim=dhms(date,0,0,time) is generating a good var (DateTimeE1). It would seem best to use a more precise approach to creating a datetime variable, but I can't figure out why DTME1 looks so different from DateTimeE1. RRForm RRID DateE1 TimeE1 DTME1 DateTimeE1 2018 1 03NOV2018 9:00 16NOV18:12:00:00 03NOV18:09:00:00 2018 2 18NOV2018 10:00 03DEC18:00:00:00 18NOV18:10:00:00 2018 3 29NOV2018 13:00 18DEC18:12:00:00 29NOV18:13:00:00 2018 4 08DEC2018 19:00 05JAN19:12:00:00 08DEC18:19:00:00 2018 6 09NOV2018 21:30 11DEC18:06:00:00 09NOV18:21:30:00 2018 9 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 10 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 11 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 12 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 13 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 14 22SEP2018 9:50 06OCT18:18:00:00 22SEP18:09:50:00 2018 15 06NOV2018 18:00 03DEC18:00:00:00 06NOV18:18:00:00 2018 16 06NOV2018 18:00 03DEC18:00:00:00 06NOV18:18:00:00 2018 17 16SEP2018 15:00 08OCT18:12:00:00 16SEP18:15:00:00 2018 18 17SEP2018 17:00 12OCT18:12:00:00 17SEP18:17:00:00 2018 19 25SEP2018 12:00 13OCT18:00:00:00 25SEP18:12:00:00 2018 20 27SEP2018 13:36 17OCT18:10:00:00 27SEP18:13:36:00 2018 22 28SEP2018 18:03 25OCT18:02:20:00 28SEP18:18:03:00 2018 23 28SEP2018 18:03 25OCT18:02:20:00 28SEP18:18:03:00

SM1 · ‎12-13-2019

@mkeintz, Your approach did catch a few cases that were not flagged by @gamotte's syntax and that I missed when I was reviewing the earlier results. The event time variables are limited to HHMM, so thank you for suggesting syntax that will better transform the discrete date and time variables into an aggregate datetime var. I really appreciate your assistance. I will be able to reuse this syntax as I continue to work with our historic database. Thanks again!

SM1 · ‎12-13-2019

@gamotte, Thank you! The syntax worked perfectly and it's an elegant solution. I'm assuming there's really no other way to deal with missing date/time values. This will be a really helpful example going forward. Thanks again.

SM1 · ‎12-12-2019

Hello, I'm working with data that includes up to 5 event date & time variables per row. Missing data is an issue. The 5 events should have occurred in chronological order, (e.g., Event1 before Event2 before Event3 . . .). For each event, there may be a SAS date variable only, a SAS time variable only, both variables or missing data. I need to flag observations if the events are not in chronological order. For observations with complete information for each event, this is not a problem. I can't figure out how to work with rows that have missing data. I would like to generate 2 variables: (1) flag if there is good chronological order for the available date & time variables and (b) flag for good chronological order for the available SAS dates only. Any suggestions will be most appreciated. Sample data: ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time 1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300 2 15MAY11 1300 15MAY11 1523 15MAY11 1633 3 09FEB13 09FEB13 09FEB13 0800 09FEB13 1015 09FEB13 1145 4 18JUN17 18JUN17 18JUN17 data want; data have; length GoodDateTimeChron $8.; GoodDateTimeChron = 'NO'; DateTimeE1 = dhms(DateE1, 0, 0, TimeE1); /*create SAS datetime vars*/ DateTimeE2 = dhms(DateE2, 0, 0, TimeE2); DateTimeE3 = dhms(Date33, 0, 0, TimeE3); DateTimeE4 = dhms(DateE4, 0, 0, TimeD4); DateTimeE5 = dhms(DateE5, 0, 0, TimeE5); format DateTimeE1 DateTimeE2 DateTimeE3 DateTimeE4 DateTimeE5 datetime20.; /*this correctly flags row 1 by chronology, but none of the other rows due to missing data*/ if not missing(DateTimeE1) >=not missing(DateTimeE2) >=not missing(DateTimeE3) >=not missing(DateTimeE4) >=not missing(DateTimeE5) then GoodDateTimeChron = 'YES'; /*this doesn't directly flag for chronology, but it counts the number of distinct date values, including missing as a distinct value */ array event {5} DateE1 DateE2 DateE3 DateE4 DateE5 ; array new {5} $20 _temporary_; do _n_=1 to 5; new{_n_} = event{_n_}; end; call sortc(of new{*}); count = (new{1} > ' '); do _n_=2 to 5; if new{_n_} ne new{_n_-1} then count + 1; end; run;

SM1 · ‎12-21-2017

@s_lassen I ran the suggested syntax; no errors were reported. The PRXMATCH syntax properly flagged the observations (n=14) that met the criteria. The FINDW syntax didn't flag any of the observations (n=0), although the new variable FLAG was created. It seems like there should be a way to specify more than 1 parameter to match without having to specify the universe of acceptable temporal orders for the parameters.

SM1 · ‎12-21-2017

@s_lassen Why is the syntax for PRXMATCH so complicated if you want "match only if 'sub-stringA' and 'sub-stringB' ", rather than "match if 'sub-stringA' or 'sub-stringB' "? The syntax you provided suggests that if the flag criteria requires a match with 2 or more sub-strings, you have to actually specify the order of appearance for each sub-string. If you wanted to match only if wordA, wordB, wordC, and wordD appeared anywhere and in order in a variable, it would be really difficult to use PRXMATCH, right?

SM1 · ‎12-21-2017

@ballardw Some sample data: ID Narrative1 Narrative2 Narrative3 Narrative4 Narrative5 16012 AIRCRAFT CALLED TO ASSIST WITH OVERBOARD SAILOR FROM USS NORMANDY PARALLEL SEARCH PATTERN EMPLOYED TO LOCATE USS NORMANDY SAILOR REPORTED OVERBOARD OVERBOARD SAILOR FROM USS NORMANDY NOT LOCATED N/A AIRCRAFT RETURNED TO BASE Narrative1 and Narrative3 = OVERBOARD appears before NORMANDY Narrative2 = NORMANDY appears before OVERBOARD If I could relax the flag criteria to select if NORMANDY or OVERBOARD, then I know this syntax would work: if prxmatch ("/NORMANDY|OVERBOARD/i",findtxt(i)) then Flag = 1; But I don't know how to write the syntax so that it will match only if NORMANDY and OVERBOARD are in a variable.

SM1 · ‎12-21-2017

@novinosrin I don't think so because I'm trying to identify discrete words (i.e., 'normandy' and 'overboard') that can appear in any order instead of a string with multiple words (e.g., 'sailor fell overboard the USS Normandy'). But I'm happy to be wrong if I can just get the syntax to work properly!

SM1 · ‎12-21-2017

Hello, My data set has 5 narrative text fields that accept open-ended responses. I need to figure out a way to scan through these variables and flag observations that meet specific criteria. For example, I want to create a flag if the words "NORMANDY" and "OVERBOARD" both appear in any order in at least 1 of the 5 narrative text fields. Unsuccessful syntax: data want; set have; array findtxt (5) narrative1 narrative2 narrative3 narrative4 narrative5; do i = 1 to 5; if prxmatch ("/(NORMANDY)(OVERBOARD)/i",findtxt(i)) then Flag = 1; end; drop i; run; Sort of successful syntax: data want; set have; array findtxt (5) narrative1 narrative2 narrative3 narrative4 narrative5; do i = 1 to 5; if prxmatch ("/NORMANDY.*OVERBOARD/i",findtxt(i)) then Flag = 1; /*only flags if 'NORMANDY' comes before 'OVERBOARD'*/ end; drop i; run; Since I want to flag each observation that has 'NORMANDY' and 'OVERBOARD' in any of the 5 narrative fields, including those where 'OVERBOARD' appears before 'NORMANDY', this syntax won't work. Any suggestions for how to write the PRXMATCH function correctly? Thanks!

SM1 · ‎12-13-2017

@Reezaand @Astounding Thanks for your comments. Yes, I tried to simplify my syntax and typed it (rather than cutting and pasting). The variable names are not descriptive and quickly become confusing. My actual syntax (with additional comments) follows: data rr_rptsernum; set rr_srutype; length rptsernumA rptsernum1 $15.; /*added length statements as suggested*/ length rptsernum1a rptsernum1b rptsernum2 $10.; length rptsernum3a rptsernum3b RRnum3 $5.; rptsernum = compress(rptsernum,'-)','C'); rptsernumA = tranwrd(rptsernum,' (','/');/*replace ' (' with '/'*/ rptsernum1 = tranwrd(rptsernumA,'XX','999'); rptsernum1a = substr(rptsernum1,1,2); /*retains only initial 2 digits = year*/ rptsernum1b = substr(rptsernum1,3); /*retains everything else (rpt number and extra text)*/ rptsernum2 = scan(rptsernum1b,1,' '); /*deletes everything that follows a space*/ rptsernum2a = anyalpha(rptsernum2); rptsernum2b = anypunct(rptsernum2); rptsernum2c = anydigit(rptsernum2); if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3a = rptsernum2; else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2a-rptsernum2c); else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c); else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c); if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3b = ' '; else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3b = substr(rptsernum2,rptsernum2a,3); else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/ else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/ RRnum1 = input(rptsernum1a, 4.); RRnum2 = input(rptsernum3a, 4.); RRnum3 = rptsernum3b; run; Including the length statements and adding '+1' to account for the punctuation were both required. The syntax now properly divides the original variable into the required component parts. Thanks so much for your assistance! SM1

SM1 · ‎12-12-2017

Hello, I need to divide a string variable into several component parts. All new variables will be string variables. Sample data provided below: OriginalVar Component1 Component2 Component3 1702 17 02 1627M 16 27 M 1628SD 16 28 SD 18003 18 003 1730S3 17 30 S3 1704/1 17 04 1 I used the following syntax: data want; data have; Component1 = substr(originalvar,1,2); /*this variable is good*/ Rest=substr(originalvar,3); /*this variable contains everything from position 3 to end of OriginalVar*/ restAlpha=anyalpha(rest); /*identify position of 1st alpha in "REST"*/ restPunct=anypunct(rest); /*identify position of 1st punctuation in "REST"*/ restDigit=anydigit(rest); /*identify position of 1st digit in "REST"*/ if restAlpha = 0 and restPunct = 0 then Component2 = Rest;/*this syntax correctly creates Component2*/ else if restAlpha gt 0 and restPunct eq 0 then Component2 = substr(Rest,restDigit,restAlpha-restDigit); else if restAlpha eq 0 and restPunct gt 0 then Component2 = substr(Rest,restDigit,restPunct-restDigit); if restAlpha = 0 and restPunct = 0 thenComponent3 = ' '; /*Component3 is not properly generated*/ else if restAlpha gt 0 and restPunct eq 0 then Component3 = substr(Rest,restAlpha,3); else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct,3); end; This syntax generates the following: OriginalVar Component1 Component2 Component3 1702 17 02 1627M 16 27 M 1628SD 16 28 S /*this is wrong - Component3 should = 'SD'*/ 18003 18 003 1730S3 17 30 S /*this is wrong - Component3 should = 'S3'*/ 1704/1 17 04 / /*this is wrong - Component3 should = '1'*/ Any suggestions for how to fix my syntax? Thanks so much!

SM1 · ‎09-11-2017

@Reeza Your syntax required a slight tweak to work: minute1=INTCK('minute',dhms(event1_date, 0, 0, event1_time), dhms(event2_date, 0, 0, event2_time)); Thanks for the help!

SM1 · ‎09-11-2017

@Reeza I did check the docs and didn't find that interval listed. Subtracting and then dividing by 60 should work. I'll do that now. Thanks again!

Online Status	Offline
Date Last Visited	yesterday

SAS Support Communities

Re: Differentiate between potential errors versus good values in a lon...

Differentiate between potential errors versus good values in a long ta...

Re: Check chronology of multiple event dates within a row?

Re: Check chronology of multiple event dates within a row?

Re: Check chronology of multiple event dates within a row?

Check chronology of multiple event dates within a row?

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Check chronology of multiple event dates within a row?

Re: Check chronology of multiple event dates within a row?

Re: Calculate difference in minutes for events to confirm chronologica...

Impose additional constraints on PRXMATCH user-defined character class...

Re: Differentiate between potential errors versus good values in a lon...

Differentiate between potential errors versus good values in a long ta...

Re: Check chronology of multiple event dates within a row?

Re: Check chronology of multiple event dates within a row?

Re: Check chronology of multiple event dates within a row?

Check chronology of multiple event dates within a row?

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Re: Use PRXMATCH function to determine if multiple words are in a stri...

Use PRXMATCH function to determine if multiple words are in a string?

Re: Divide a string variable into sub-components

Divide a string variable into sub-components

Re: Calculate difference in minutes for events to confirm chronologica...

Re: Calculate difference in minutes for events to confirm chronologica...

Follow Us

What is...