About legends1337

legends1337 · ‎05-03-2022

I have the following dataset:

data have;
    infile datalines delimiter="|";
    input attrib :$30. multiple_attr :$1. id :$30. attrib_id :8. member_value :$100. type :$5. dt_event :datetime18.;
    format dt_event datetime20.;
    datalines;
TYPE|N|ABC123|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|ABC123|111|MEDIUM|End|18APR2021:00:00:00
TYPE|N|ABC123|111|BIG|Start|19APR2021:00:00:00
TYPE|N|ABC123|111|BIG|End|31DEC2030:00:00:00
POSITION|N|ABC123|222|TOP|Start|01DEC2014:00:00:00
POSITION|N|ABC123|222|TOP|End|31DEC2030:00:00:00
IS_ACTIVE|N|ABC123|333|YES|Start|01DEC2014:00:00:00
IS_ACTIVE|N|ABC123|333|YES|End|31DEC2030:00:00:00
LEVELS|Y|ABC123|1|ALONE|Start|01DEC2014:00:00:00
LEVELS|Y|ABC123|1|BOTH|Start|01DEC2014:00:00:00
LEVELS|Y|ABC123|1|BOTH|End|18APR2021:00:00:00
LEVELS|Y|ABC123|1|ALONE|End|31DEC2030:00:00:00
TYPE|N|DEF456|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|DEF456|111|MEDIUM|End|31DEC2030:00:00:00
POSITION|N|DEF456|222|MID|Start|01DEC2014:00:00:00
POSITION|N|DEF456|222|MID|End|31DEC2030:00:00:00
IS_ACTIVE|N|DEF456|333|YES|Start|01MAR2014:00:00:00
IS_ACTIVE|N|DEF456|333|YES|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|ALONE|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31MAR2018:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|20AUG2018:00:00:00
LEVELS|Y|DEF456|1|ALONE|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31DEC2030:00:00:00
;

This is an event-based table holding all the attributes of an ID.
I would like to be able to "stack" attributes that can take multiple values at once (e.g. stack the values of attrib_id 1 together) so that I end up with the following dataset:

+---------------+--------+-----------+--------------------+--------------------+--------------+
| multiple_attr | id     | attrib_id | start_date         | end_date           | member_value |
+---------------+--------+-----------+--------------------+--------------------+--------------+
| Y             | ABC123 | 1         | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | ALONE; BOTH  |
| Y             | ABC123 | 1         | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | ALONE        |
| N             | ABC123 | 111       | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | MEDIUM       |
| N             | ABC123 | 111       | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BIG          |
| N             | ABC123 | 222       | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | TOP          |
| N             | ABC123 | 333       | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | YES          |
| Y             | DEF456 | 1         | 01MAR2014:00:00:00 | 31MAR2018:00:00:00 | ALONE; BOTH  |
| Y             | DEF456 | 1         | 01APR2018:00:00:00 | 19AUG2018:00:00:00 | ALONE        |
| Y             | DEF456 | 1         | 20AUG2018:00:00:00 | 31DEC2030:00:00:00 | ALONE; BOTH  |
| N             | DEF456 | 111       | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MEDIUM       |
| N             | DEF456 | 222       | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MID          |
| N             | DEF456 | 333       | 01MAR2014:00:00:00 | 31DEC2030:00:00:00 | YES          |
+---------------+--------+-----------+--------------------+--------------------+--------------+

Is there a way to do this?
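One possible way to get there, given here only as a rough sketch: pair each Start event with the following End event of the same member_value, cut the timeline at every start and at every end plus one day, then concatenate the values that are active in each slice. The dataset names (have_subset, events, value_ranges, breakpoints, intervals, active, want_sketch) and the column member_values are made up for illustration. The sketch runs on a subset of the rows above and assumes that Start and End events strictly alternate for a given id, attrib_id and member_value.

```sas
/* Subset of the rows from the post, enough to show both cases */
data have_subset;
    infile datalines delimiter="|";
    input attrib :$30. multiple_attr :$1. id :$30. attrib_id :8.
          member_value :$100. type :$5. dt_event :datetime18.;
    format dt_event datetime20.;
    datalines;
TYPE|N|ABC123|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|ABC123|111|MEDIUM|End|18APR2021:00:00:00
TYPE|N|ABC123|111|BIG|Start|19APR2021:00:00:00
TYPE|N|ABC123|111|BIG|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|ALONE|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31MAR2018:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|20AUG2018:00:00:00
LEVELS|Y|DEF456|1|ALONE|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31DEC2030:00:00:00
;

/* Step 1: pair each Start with the next End of the same value.
   Assumption: Start/End strictly alternate within a value. */
proc sort data=have_subset out=events;
    by id attrib_id member_value dt_event;
run;

data value_ranges;
    set events;
    retain start_dt;
    if type = 'Start' then start_dt = dt_event;
    else if type = 'End' then do;
        end_dt = dt_event;
        output;
    end;
    format start_dt end_dt datetime20.;
    keep multiple_attr id attrib_id member_value start_dt end_dt;
run;

/* Step 2: every start, and every end + 1 day (86400 s), is a cut point */
data breakpoints;
    set value_ranges;
    bp = start_dt;       output;
    bp = end_dt + 86400; output;
    keep id attrib_id bp;
run;

proc sort data=breakpoints nodupkey;
    by id attrib_id bp;
run;

/* Look-ahead merge: each cut point paired with the next one in its group;
   the last cut point of a group opens no interval and is dropped */
data intervals;
    merge breakpoints
          breakpoints(firstobs=2
                      rename=(id=next_id attrib_id=next_attrib_id bp=next_bp));
    if id = next_id and attrib_id = next_attrib_id;
    start_date = bp;
    end_date   = next_bp - 86400;
    format start_date end_date datetime20.;
    keep id attrib_id start_date end_date;
run;

/* Step 3: values whose range covers the slice, then stack them */
proc sql;
    create table active as
    select v.multiple_attr, i.id, i.attrib_id, i.start_date, i.end_date,
           v.member_value
    from intervals i inner join value_ranges v
      on i.id = v.id and i.attrib_id = v.attrib_id
     and v.start_dt <= i.start_date and i.end_date <= v.end_dt
    order by i.id, i.attrib_id, i.start_date, v.member_value;
quit;

data want_sketch;
    length member_values $200;
    do until (last.start_date);
        set active;
        by id attrib_id start_date;
        member_values = catx('; ', member_values, member_value);
    end;
    drop member_value;
run;
```

Because each slice lies between two consecutive cut points, a value range either covers a slice entirely or does not touch it, which is why the join condition can be a simple containment test.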

legends1337 · ‎04-04-2022

@mkeintz Correct - this is exactly what I am searching for.

legends1337 · ‎04-04-2022

Thank you for the swift reply, @ballardw. Sorry if I was not clear enough.

Basically, the idea is that I have multiple records with different validity ranges for each ID. In each of those observations, some attributes are populated (at least one). The problem is that the validity ranges are not continuous. Example:

+-----+------------+-----------+-------------+-------------+-------------+-------------+
| id  | start_date | end_date  | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| ID2 | 01SEP2015  | 30NOV2020 |             |             | TWO         |             | -----> Attribute 3 is valid from 01SEP2015 to 30NOV2020
| ID2 | 01DEC2020  | 31JUL2021 | SMALL       |             |             |             | -----> Attribute 1 is valid from 01SEP2015 to 31DEC9999
+-----+------------+-----------+-------------+-------------+-------------+-------------+

What I would like to end up with is a table that contains the "status" (that is, the attribute values) of an ID at any point in time (from 01JAN2014 to 31DEC9999). So instead of overlapping dates, I would have continuous validity ranges:

+-----+------------+-----------+-------------+-------------+-------------+-------------+
| id  | start_date | end_date  | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| ID2 | 01SEP2015  | 30NOV2020 | SMALL       |             | TWO         |             | ----> until 30NOV2020
| ID2 | 01DEC2020  | 31JUL2021 | SMALL       |             |             |             | ----> Start 30NOV2020 + 1 day
| ID2 | 01AUG2021  | 31DEC9999 | SMALL       |             | TWO         |             | ----> Start 31JUL2021 + 1 day
+-----+------------+-----------+-------------+-------------+-------------+-------------+

Having this table as output will help when I need to find the exact status of an ID at a given point in time (the lookup will return only a single observation).
Example:

where id="ID2" and start_date <= "28SEP2020"d <= end_date

should output only one observation, which is:

+-----+------------+-----------+-------------+-------------+-------------+-------------+
| id  | start_date | end_date  | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| ID2 | 01SEP2015  | 30NOV2020 | SMALL       |             | TWO         |             |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
----> Attribute_1 = "SMALL" and attribute_3 = "TWO" at that specific time.

instead of

+-----+------------+-----------+-------------+-------------+-------------+-------------+
| id  | start_date | end_date  | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| ID2 | 01SEP2015  | 30NOV2020 |             |             | TWO         |             |
| ID2 | 01SEP2015  | 31DEC9999 | SMALL       |             |             |             |
+-----+------------+-----------+-------------+-------------+-------------+-------------+

Also, your code does not produce the expected output for the example: the validity ranges are not continuous (they remain unchanged):

+-----+-------------+-----------+--------------+--------------+--------------+--------------+
| id  | start_date  | end_date  | attribute_1  | attribute_2  | attribute_3  | attribute_4  |
+-----+-------------+-----------+--------------+--------------+--------------+--------------+
| ID1 | 01MAR2014   | 31DEC9999 | BIG          | YES          |              |              |
| ID2 | 01SEP2015   | 30NOV2020 | SMALL        |              | TWO          |              |
| ID2 | 01SEP2015   | 31DEC9999 | SMALL        |              |              |              |
| ID2 | 01AUG2021   | 31DEC9999 | SMALL        |              | TWO          |              |
| ID3 | 01DEC2014   | 31MAY2016 | MEDIUM       | YES          |              |              |
| ID3 | 01DEC2014   | 29JUN2017 | MEDIUM       |              |              | OK           |
| ID3 | 01DEC2014   | 31DEC9999 | MEDIUM       |              |              |              |
| ID3 | 31MAR2015   | 29SEP2017 | MEDIUM       |              | ONE          |              |
| ID3 | 30JUN2017   | 31DEC9999 | MEDIUM       | YES          |              | TBD          |
| ID3 | 30SEP2017   | 31DEC9999 | MEDIUM       |              | ONE, TWO     |              |
+-----+-------------+-----------+--------------+--------------+--------------+--------------+

Please let me know if that is still unclear.
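To make the point-in-time lookup concrete, here is a small sketch that loads the three continuous ID2 rows from the desired table and applies the WHERE condition quoted above. The dataset name continuous is hypothetical. Since the ranges no longer overlap, the filter should match only the row running from 01SEP2015 to 30NOV2020.

```sas
/* "continuous" is a made-up name for the desired, non-overlapping table */
data continuous;
    infile datalines missover delimiter="|" dsd;
    input id :$20. (start_date end_date) (:date9.)
          (attribute_1 attribute_2 attribute_3 attribute_4) (:$8.);
    format start_date end_date date9.;
    datalines;
ID2|01SEP2015|30NOV2020|SMALL||TWO|
ID2|01DEC2020|31JUL2021|SMALL|||
ID2|01AUG2021|31DEC9999|SMALL||TWO|
;

/* Status of ID2 on 28SEP2020: one covering range, hence one observation */
proc print data=continuous noobs;
    where id = "ID2" and start_date <= "28SEP2020"d <= end_date;
run;
```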

legends1337 · ‎04-04-2022

Given

data have;
    infile datalines missover delimiter="|" dsd;
    input id :$20. (start_date end_date) (:date9.) (attribute_1 attribute_2 attribute_3 attribute_4) ($);
    format start_date end_date date9.;
    datalines;
ID1|01MAR2014|31DEC9999|BIG|YES||
ID2|01SEP2015|30NOV2020|||TWO|
ID2|01SEP2015|31DEC9999|SMALL|||
ID2|01AUG2021|31DEC9999|||TWO|
ID3|01DEC2014|31MAY2016||YES||
ID3|01DEC2014|29JUN2017||||OK
ID3|01DEC2014|31DEC9999|MEDIUM|||
ID3|31MAR2015|29SEP2017|||ONE|
ID3|30JUN2017|31DEC9999||YES||TBD
ID3|30SEP2017|31DEC9999|||ONE, TWO|
;

I would like to get continuous validity ranges for each id, with the correct attributes in each range. The desired output would look like this:

+-----+------------+-----------+-------------+-------------+-------------+-------------+
| id  | start_date | end_date  | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| ID1 | 01MAR2014  | 31DEC9999 | BIG         | YES         |             |             |
| ID2 | 01SEP2015  | 30NOV2020 | SMALL       |             | TWO         |             |
| ID2 | 01DEC2020  | 31JUL2021 | SMALL       |             |             |             |
| ID2 | 01AUG2021  | 31DEC9999 | SMALL       |             | TWO         |             |
| ID3 | 01DEC2014  | 30MAR2015 | MEDIUM      | YES         |             | OK          |
| ID3 | 31MAR2015  | 31MAY2016 | MEDIUM      | YES         | ONE         | OK          |
| ID3 | 01JUN2016  | 29JUN2016 | MEDIUM      |             | ONE         | OK          |
| ID3 | 30JUN2016  | 29SEP2017 | MEDIUM      | YES         | ONE         | TBD         |
| ID3 | 30SEP2017  | 31DEC9999 | MEDIUM      | YES         | ONE, TWO    | TBD         |
+-----+------------+-----------+-------------+-------------+-------------+-------------+

I found a way of doing it using joins, but would like to know if there is a better way of doing the following:

/* Collect every candidate interval boundary (start or end) per id */
data all_intervals;
    set have(keep= id start_date end_date);
    _start = start_date; output;
    _end = start_date-1; output;
    _end = end_date; output;
    if end_date < '31DEC9999'd then do;
        _start = end_date+1; output;
    end;
run;

/* Pair each start with every later end of the same id */
proc sql;
    create table all_intervals as
    select distinct t1.id, t1._start, t2._end
    from all_intervals t1, all_intervals t2
    where t1.id = t2.id
      and t2._end > t1._start
    ;
quit;

/* Keep only the first (nearest) end for each start */
data all_intervals;
    set all_intervals;
    by id _start;
    if first.id or first._start;
run;

/* Build "max(t1.col) as col" for every attribute column of HAVE
   (the key and date columns are excluded) */
proc sql noprint;
    select 'max(t1.'||NAME||') as '||NAME into :attributes separated by ','
    from sashelp.vcolumn
    where libname = "WORK"
      and memname = "HAVE"
      and upcase(name) not in ('ID', 'START_DATE', 'END_DATE')
    ;
quit;

/* Attach to each interval the attributes of every overlapping record */
proc sql;
    create table merge as
    select t2.id, t2._start as start_date, t2._end as end_date, &attributes.
    from have t1 right join all_intervals t2
      on t1.id = t2.id
     and ((t2._start <= t1.start_date <= t2._end)
       or (t2._start <= t1.end_date <= t2._end)
       or (t2._start >= t1.start_date and t2._end <= t1.end_date))
    group by t2.id, t2._start
    order by t2.id, t2._start
    ;
quit;

proc sort data=merge out=want nodupkey;
    by id start_date end_date;
run;

The above produces the expected output.
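For comparison, a possibly simpler variant is sketched below. It is not a tested replacement for the code above. It sorts the distinct cut points (each start_date and each end_date + 1), uses a look-ahead merge to turn consecutive cut points into intervals, and then joins back with a plain containment test. The names breakpoints, intervals, want_sketch, bp, next_id and next_bp are made up for illustration, and the four attribute columns are hard-coded instead of being read from sashelp.vcolumn.

```sas
data have;
    infile datalines missover delimiter="|" dsd;
    input id :$20. (start_date end_date) (:date9.)
          (attribute_1 attribute_2 attribute_3 attribute_4) (:$8.);
    format start_date end_date date9.;
    datalines;
ID1|01MAR2014|31DEC9999|BIG|YES||
ID2|01SEP2015|30NOV2020|||TWO|
ID2|01SEP2015|31DEC9999|SMALL|||
ID2|01AUG2021|31DEC9999|||TWO|
ID3|01DEC2014|31MAY2016||YES||
ID3|01DEC2014|29JUN2017||||OK
ID3|01DEC2014|31DEC9999|MEDIUM|||
ID3|31MAR2015|29SEP2017|||ONE|
ID3|30JUN2017|31DEC9999||YES||TBD
ID3|30SEP2017|31DEC9999|||ONE, TWO|
;

/* Cut points: a new interval can begin at any start or the day after any end */
data breakpoints;
    set have(keep=id start_date end_date);
    bp = start_date;   output;
    bp = end_date + 1; output;
    keep id bp;
run;

proc sort data=breakpoints nodupkey;
    by id bp;
run;

/* Look-ahead merge: pair each cut point with the next one of the same id.
   The last cut point of an id is always an end + 1, so it opens no interval. */
data intervals;
    merge breakpoints
          breakpoints(firstobs=2 rename=(id=next_id bp=next_bp));
    if id = next_id;
    start_date = bp;
    end_date   = next_bp - 1;
    format start_date end_date date9.;
    keep id start_date end_date;
run;

/* A source record either covers an interval entirely or not at all */
proc sql;
    create table want_sketch as
    select i.id, i.start_date, i.end_date,
           max(h.attribute_1) as attribute_1,
           max(h.attribute_2) as attribute_2,
           max(h.attribute_3) as attribute_3,
           max(h.attribute_4) as attribute_4
    from intervals i inner join have h
      on i.id = h.id
     and h.start_date <= i.start_date and i.end_date <= h.end_date
    group by i.id, i.start_date, i.end_date
    order by i.id, i.start_date;
quit;
```

The inner join drops any interval that no source record covers. Grouping by all three key columns avoids the remerge and the final nodupkey sort, at the cost of listing the attribute columns by hand.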


Stack multiple modalities attributes based on multiple events

Re: Construct continuous intervals from non-contiguous validity ranges

Re: Construct continuous intervals from non-contiguous validity ranges

Construct continuous intervals from non-contiguous validity ranges
