yesterday
SM1
Obsidian | Level 7
Member since
12-16-2015
- 57 Posts
- 2 Likes Given
- 0 Solutions
- 5 Likes Received
-
Latest posts by SM1
Subject Views Posted 264 3 weeks ago 386 3 weeks ago 3485 12-13-2019 03:08 PM 3486 12-13-2019 02:36 PM 3496 12-13-2019 01:20 PM 3571 12-12-2019 04:32 PM 13792 12-21-2017 05:20 PM 13797 12-21-2017 04:59 PM 13809 12-21-2017 04:38 PM 13832 12-21-2017 03:47 PM -
Activity Feed for SM1
- Posted Re: Differentiate between potential errors versus good values in a long table? on SAS Programming. 3 weeks ago
- Posted Differentiate between potential errors versus good values in a long table? on SAS Programming. 3 weeks ago
- Posted Re: Check chronology of multiple event dates within a row? on SAS Programming. 12-13-2019 03:08 PM
- Posted Re: Check chronology of multiple event dates within a row? on SAS Programming. 12-13-2019 02:36 PM
- Liked Re: Check chronology of multiple event dates within a row? for mkeintz. 12-13-2019 02:36 PM
- Posted Re: Check chronology of multiple event dates within a row? on SAS Programming. 12-13-2019 01:20 PM
- Liked Re: Check chronology of multiple event dates within a row? for gamotte. 12-13-2019 01:20 PM
- Posted Check chronology of multiple event dates within a row? on SAS Programming. 12-12-2019 04:32 PM
- Posted Re: Use PRXMATCH function to determine if multiple words are in a string? on SAS Programming. 12-21-2017 05:20 PM
- Posted Re: Use PRXMATCH function to determine if multiple words are in a string? on SAS Programming. 12-21-2017 04:59 PM
- Posted Re: Use PRXMATCH function to determine if multiple words are in a string? on SAS Programming. 12-21-2017 04:38 PM
- Posted Re: Use PRXMATCH function to determine if multiple words are in a string? on SAS Programming. 12-21-2017 03:47 PM
- Posted Use PRXMATCH function to determine if multiple words are in a string? on SAS Programming. 12-21-2017 03:35 PM
- Posted Re: Divide a string variable into sub-components on SAS Programming. 12-13-2017 01:28 PM
- Posted Divide a string variable into sub-components on SAS Programming. 12-12-2017 07:24 PM
- Got a Like for Re: Calculate difference in minutes for events to confirm chronological order?. 09-11-2017 05:27 PM
- Posted Re: Calculate difference in minutes for events to confirm chronological order? on SAS Data Management. 09-11-2017 05:24 PM
- Posted Re: Calculate difference in minutes for events to confirm chronological order? on SAS Data Management. 09-11-2017 04:59 PM
- Posted Re: Calculate difference in minutes for events to confirm chronological order? on SAS Data Management. 09-11-2017 04:47 PM
- Posted Calculate difference in minutes for events to confirm chronological order? on SAS Data Management. 09-11-2017 04:20 PM
-
Posts I Liked
-
My Liked Posts
Subject Likes Posted 2 09-11-2017 05:24 PM 2 08-21-2017 08:37 PM
3 weeks ago
@ballardw - thanks for your reply. The ultimate objective is to correctly identify records that accurately document demotions across comprehensive employee administrative records as indicated by (1) the change in the pay grade variable (compared to the previous record) and (2) the continuation of the new pay grade variable in subsequent records. Some employees have 2+ demotions across their tenure and some records extend for 15+ years. I previously tried using syntax (see below) that used a character version of the pay grade variable ('pg') to concatenate the value across each encounter and then compress the consecutive occurrences of the same code to a single value, but I couldn't distinguish between employees with good data (like id=23456) and 1-off bad records (like id=12345). proc sort data=have; by id date; run;
data want;
retain id;
length comb $110;
do until(last.id);
set have;
by id pg notsorted;
if first.pg then comb=cats(comb, char(pg,1));
end;
run;
... View more
3 weeks ago
Hello, I am working with personnel records for 80,000+ individuals across about 15 years. A minimum of 1 record per month is generated for each employee, but multiple additional records can also be generated during any month. The records include a pay grade variable (coded 0 to 10, with 0=most junior and 10=most senior). I need to accurately identify records that document demotions. As with most administrative records, the data are mostly clean - except when they are not. I've included some sample datalines below that reflect the 3 most common data patterns: 1. a single pay grade change that reflects a demotion (see id=23456) 2. a single pay grade change that appears to reflect a random bad record, but not a demotion (see id=12345) 3. a flurry of records that bounce between 2 pay grades, but finally begin to reflect the same pay grade consistently, which appears to reflect a demotion (see id=34567) I am using a lag variable to identify records where paygrade < lag_paygrade, and the resulting new variable "demotion" currently acts like a general flag for records that require further QA checks. If anyone has a recommendation for how to better differentiate between the 3 patterns of pay grade values, I'd really appreciate the help. Thanks! data have;
input id date paygrade; format date date9.;
datalines;
23456 01JAN2021 7
23456 01FEB2021 7
23456 01MAR2021 7
23456 01APR2021 7
23456 05APR2021 7
23456 10APR2021 7
23456 15APR2021 6
23456 01MAY2021 6
23456 01JUN2021 6
12345 01JAN2021 3
12345 01FEB2021 3
12345 01MAR2021 3
12345 01APR2021 3
12345 05APR2021 3
12345 10APR2021 2
12345 15APR2021 3
12345 01MAY2021 3
12345 01JUN2021 3
34567 01JAN2021 5
34567 01FEB2021 5
34567 01MAR2021 5
34567 01APR2021 4
34567 05APR2021 5
34567 10APR2021 5
34567 15APR2021 4
34567 01MAY2021 4
34567 01JUN2021 4
run;
proc sort data=have; by id date; run;
data have1; set have;
by id date;
lag_paygrade=lag(paygrade); if first.ssn then lag_paygrade=.;
if paygrade<lag_paygrade then demotion='yes';
run;
... View more
12-13-2019
03:08 PM
@mkeintz, I couldn't figure out how I had missed so many chronology errors when I reviewed the results generated by @gamotte's syntax, so I re-ran everything. I'm not sure why, but your suggested syntax is generating an incorrect datetime variable (DTME1) while the syntax of datetim=dhms(date,0,0,time) is generating a good var (DateTimeE1). It would seem best to use a more precise approach to creating a datetime variable, but I can't figure out why DTME1 looks so different from DateTimeE1. RRForm RRID DateE1 TimeE1 DTME1 DateTimeE1 2018 1 03NOV2018 9:00 16NOV18:12:00:00 03NOV18:09:00:00 2018 2 18NOV2018 10:00 03DEC18:00:00:00 18NOV18:10:00:00 2018 3 29NOV2018 13:00 18DEC18:12:00:00 29NOV18:13:00:00 2018 4 08DEC2018 19:00 05JAN19:12:00:00 08DEC18:19:00:00 2018 6 09NOV2018 21:30 11DEC18:06:00:00 09NOV18:21:30:00 2018 9 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 10 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 11 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 12 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 13 22SEP2018 17:25 18OCT18:03:00:00 22SEP18:17:25:00 2018 14 22SEP2018 9:50 06OCT18:18:00:00 22SEP18:09:50:00 2018 15 06NOV2018 18:00 03DEC18:00:00:00 06NOV18:18:00:00 2018 16 06NOV2018 18:00 03DEC18:00:00:00 06NOV18:18:00:00 2018 17 16SEP2018 15:00 08OCT18:12:00:00 16SEP18:15:00:00 2018 18 17SEP2018 17:00 12OCT18:12:00:00 17SEP18:17:00:00 2018 19 25SEP2018 12:00 13OCT18:00:00:00 25SEP18:12:00:00 2018 20 27SEP2018 13:36 17OCT18:10:00:00 27SEP18:13:36:00 2018 22 28SEP2018 18:03 25OCT18:02:20:00 28SEP18:18:03:00 2018 23 28SEP2018 18:03 25OCT18:02:20:00 28SEP18:18:03:00
... View more
12-13-2019
02:36 PM
@mkeintz, Your approach did catch a few cases that were not flagged by @gamotte's syntax and that I missed when I was reviewing the earlier results. The event time variables are limited to HHMM, so thank you for suggesting syntax that will better transform the discrete date and time variables into an aggregate datetime var. I really appreciate your assistance. I will be able to reuse this syntax as I continue to work with our historic database. Thanks again!
... View more
12-13-2019
01:20 PM
@gamotte, Thank you! The syntax worked perfectly and it's an elegant solution. I'm assuming there's really no other way to deal with missing date/time values. This will be a really helpful example going forward. Thanks again.
... View more
12-12-2019
04:32 PM
Hello, I'm working with data that includes up to 5 event date & time variables per row. Missing data is an issue. The 5 events should have occurred in chronological order, (e.g., Event1 before Event2 before Event3 . . .). For each event, there may be a SAS date variable only, a SAS time variable only, both variables or missing data. I need to flag observations if the events are not in chronological order. For observations with complete information for each event, this is not a problem. I can't figure out how to work with rows that have missing data. I would like to generate 2 variables: (1) flag if there is good chronological order for the available date & time variables and (b) flag for good chronological order for the available SAS dates only. Any suggestions will be most appreciated. Sample data: ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time 1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300 2 15MAY11 1300 15MAY11 1523 15MAY11 1633 3 09FEB13 09FEB13 09FEB13 0800 09FEB13 1015 09FEB13 1145 4 18JUN17 18JUN17 18JUN17 data want;
data have;
length GoodDateTimeChron $8.;
GoodDateTimeChron = 'NO';
DateTimeE1 = dhms(DateE1, 0, 0, TimeE1); /*create SAS datetime vars*/
DateTimeE2 = dhms(DateE2, 0, 0, TimeE2);
DateTimeE3 = dhms(Date33, 0, 0, TimeE3);
DateTimeE4 = dhms(DateE4, 0, 0, TimeD4);
DateTimeE5 = dhms(DateE5, 0, 0, TimeE5);
format DateTimeE1 DateTimeE2 DateTimeE3 DateTimeE4 DateTimeE5 datetime20.;
/*this correctly flags row 1 by chronology, but none of the other rows due to missing data*/
if not missing(DateTimeE1) >=not missing(DateTimeE2) >=not missing(DateTimeE3) >=not missing(DateTimeE4) >=not missing(DateTimeE5) then GoodDateTimeChron = 'YES';
/*this doesn't directly flag for chronology, but it counts the number of distinct date values, including missing as a distinct value */
array event {5} DateE1 DateE2 DateE3 DateE4 DateE5 ;
array new {5} $20 _temporary_;
do _n_=1 to 5;
new{_n_} = event{_n_};
end;
call sortc(of new{*});
count = (new{1} > ' ');
do _n_=2 to 5;
if new{_n_} ne new{_n_-1} then count + 1;
end;
run;
... View more
12-21-2017
05:20 PM
@s_lassen I ran the suggested syntax; no errors were reported. The PRXMATCH syntax properly flagged the observations (n=14) that met the criteria. The FINDW syntax didn't flag any of the observations (n=0), although the new variable FLAG was created. It seems like there should be a way to specify more than 1 parameter to match without having to specify the universe of acceptable temporal orders for the parameters.
... View more
12-21-2017
04:59 PM
@s_lassen Why is the syntax for PRXMATCH so complicated if you want "match only if 'sub-stringA' and 'sub-stringB' ", rather than "match if 'sub-stringA' or 'sub-stringB' "? The syntax you provided suggests that if the flag criteria requires a match with 2 or more sub-strings, you have to actually specify the order of appearance for each sub-string. If you wanted to match only if wordA, wordB, wordC, and wordD appeared anywhere and in order in a variable, it would be really difficult to use PRXMATCH, right?
... View more
12-21-2017
04:38 PM
@ballardw Some sample data: ID Narrative1 Narrative2 Narrative3 Narrative4 Narrative5 16012 AIRCRAFT CALLED TO ASSIST WITH OVERBOARD SAILOR FROM USS NORMANDY PARALLEL SEARCH PATTERN EMPLOYED TO LOCATE USS NORMANDY SAILOR REPORTED OVERBOARD OVERBOARD SAILOR FROM USS NORMANDY NOT LOCATED N/A AIRCRAFT RETURNED TO BASE Narrative1 and Narrative3 = OVERBOARD appears before NORMANDY Narrative2 = NORMANDY appears before OVERBOARD If I could relax the flag criteria to select if NORMANDY or OVERBOARD, then I know this syntax would work: if prxmatch ("/NORMANDY|OVERBOARD/i",findtxt(i)) then Flag = 1; But I don't know how to write the syntax so that it will match only if NORMANDY and OVERBOARD are in a variable.
... View more
12-21-2017
03:47 PM
@novinosrin I don't think so because I'm trying to identify discrete words (i.e., 'normandy' and 'overboard') that can appear in any order instead of a string with multiple words (e.g., 'sailor fell overboard the USS Normandy'). But I'm happy to be wrong if I can just get the syntax to work properly!
... View more
12-21-2017
03:35 PM
Hello, My data set has 5 narrative text fields that accept open-ended responses. I need to figure out a way to scan through these variables and flag observations that meet specific criteria. For example, I want to create a flag if the words "NORMANDY" and "OVERBOARD" both appear in any order in at least 1 of the 5 narrative text fields. Unsuccessful syntax: data want; set have; array findtxt (5) narrative1 narrative2 narrative3 narrative4 narrative5; do i = 1 to 5; if prxmatch ("/(NORMANDY)(OVERBOARD)/i",findtxt(i)) then Flag = 1; end; drop i; run; Sort of successful syntax: data want; set have; array findtxt (5) narrative1 narrative2 narrative3 narrative4 narrative5; do i = 1 to 5; if prxmatch ("/NORMANDY.*OVERBOARD/i",findtxt(i)) then Flag = 1; /*only flags if 'NORMANDY' comes before 'OVERBOARD'*/ end; drop i; run; Since I want to flag each observation that has 'NORMANDY' and 'OVERBOARD' in any of the 5 narrative fields, including those where 'OVERBOARD' appears before 'NORMANDY', this syntax won't work. Any suggestions for how to write the PRXMATCH function correctly? Thanks!
... View more
12-13-2017
01:28 PM
@Reezaand @Astounding Thanks for your comments. Yes, I tried to simplify my syntax and typed it (rather than cutting and pasting). The variable names are not descriptive and quickly become confusing. My actual syntax (with additional comments) follows: data rr_rptsernum; set rr_srutype; length rptsernumA rptsernum1 $15.; /*added length statements as suggested*/ length rptsernum1a rptsernum1b rptsernum2 $10.; length rptsernum3a rptsernum3b RRnum3 $5.; rptsernum = compress(rptsernum,'-)','C'); rptsernumA = tranwrd(rptsernum,' (','/');/*replace ' (' with '/'*/ rptsernum1 = tranwrd(rptsernumA,'XX','999'); rptsernum1a = substr(rptsernum1,1,2); /*retains only initial 2 digits = year*/ rptsernum1b = substr(rptsernum1,3); /*retains everything else (rpt number and extra text)*/ rptsernum2 = scan(rptsernum1b,1,' '); /*deletes everything that follows a space*/ rptsernum2a = anyalpha(rptsernum2); rptsernum2b = anypunct(rptsernum2); rptsernum2c = anydigit(rptsernum2); if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3a = rptsernum2; else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2a-rptsernum2c); else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c); else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c); if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3b = ' '; else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3b = substr(rptsernum2,rptsernum2a,3); else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/ else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/ RRnum1 = input(rptsernum1a, 4.); RRnum2 = input(rptsernum3a, 4.); RRnum3 = rptsernum3b; run; Including the length statements and adding '+1' to account for the punctuation were both required. The syntax now properly divides the original variable into the required component parts. Thanks so much for your assistance! SM1
... View more
12-12-2017
07:24 PM
Hello, I need to divide a string variable into several component parts. All new variables will be string variables. Sample data provided below: OriginalVar Component1 Component2 Component3 1702 17 02 1627M 16 27 M 1628SD 16 28 SD 18003 18 003 1730S3 17 30 S3 1704/1 17 04 1 I used the following syntax: data want; data have; Component1 = substr(originalvar,1,2); /*this variable is good*/ Rest=substr(originalvar,3); /*this variable contains everything from position 3 to end of OriginalVar*/ restAlpha=anyalpha(rest); /*identify position of 1st alpha in "REST"*/ restPunct=anypunct(rest); /*identify position of 1st punctuation in "REST"*/ restDigit=anydigit(rest); /*identify position of 1st digit in "REST"*/ if restAlpha = 0 and restPunct = 0 then Component2 = Rest;/*this syntax correctly creates Component2*/ else if restAlpha gt 0 and restPunct eq 0 then Component2 = substr(Rest,restDigit,restAlpha-restDigit); else if restAlpha eq 0 and restPunct gt 0 then Component2 = substr(Rest,restDigit,restPunct-restDigit); if restAlpha = 0 and restPunct = 0 thenComponent3 = ' '; /*Component3 is not properly generated*/ else if restAlpha gt 0 and restPunct eq 0 then Component3 = substr(Rest,restAlpha,3); else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct,3); end; This syntax generates the following: OriginalVar Component1 Component2 Component3 1702 17 02 1627M 16 27 M 1628SD 16 28 S /*this is wrong - Component3 should = 'SD'*/ 18003 18 003 1730S3 17 30 S /*this is wrong - Component3 should = 'S3'*/ 1704/1 17 04 / /*this is wrong - Component3 should = '1'*/ Any suggestions for how to fix my syntax? Thanks so much!
... View more
09-11-2017
05:24 PM
2 Likes
@Reeza Your syntax required a slight tweak to work: minute1=INTCK('minute',dhms(event1_date, 0, 0, event1_time), dhms(event2_date, 0, 0, event2_time)); Thanks for the help!
... View more
09-11-2017
04:59 PM
@Reeza I did check the docs and didn't find that interval listed. Subtracting and then dividing by 60 should work. I'll do that now. Thanks again!
... View more