SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Lee_wan
Obsidian | Level 7

what I have as below:

data vs;
   input id $1-4 vstestcd $6-7 vsdtc $9-18;
   datalines;
 M01 AA 2019-09-26
 M01 BB 2019-09-26
 M01 AA 2019-10-15
 M01 BB 2019-10-15
 M01 AA 2019-10-23
 M02 AA 2019-09-12
 M02 BB 2019-09-12
 M02 AA 2019-09-23
 M02 BB 2019-09-23
 M02 AA 2019-09-30
;
run;

data sv;
   input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
   datalines;
 M01 01 2019-09-26 2019-09-26
 M01 02 2019-10-15 2019-10-15
 M01 03 2019-10-19 2019-10-19
 M01 04 2019-10-23 2019-10-23
 M01 05 2019-11-03 2019-11-03
 M02 01 2019-09-12 2019-09-12
 M02 02 2019-09-23 2019-09-23
 M02 03 2019-09-26 2019-09-26
 M02 04 2019-09-30 2019-09-30
 M02 05 2019-10-08 2019-10-08
;
run;
  

It's just a small part of the data and I want to use hash by down to up to traversal SV dataset to connect VS dataset.

 

Because there some records have multiple date for different visit in sv dataset,and I want to map last visit in sv data when SVSTDTC<=vsdt<=SVENDTC.

 

My code is as below:

 

data test;
  if _n_=1 then do;
    if 0 then set sv;
    declare hash h1(dataset:'sv', multidata: "Y");
    dcl hiter iter('h1');
    h1.definekey('id');
    h1.definedata('SVENDTC','SVSTDTC','VISIT');
    h1.definedone();
  end;
  
  do until(eof);
    set vs end=eof;
   
    do RC = iter.LAST() by 0 while (RC = 0) ;
      if svstdtc<=vsdtc<=svendtc then goto edit;
      rc=iter.PREV();
    end;
    edit:
    output;
  end;
run;

I use iter.LAST and iter.PREV to traversal from bottom to top, but the key is not map.

The second ID's visit is connected to the first ID.

11.png

Please help me and thanks.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
  1. Is this picture the entire desired output from your sample data?

  2. Your use of the iterator does not guarantee an ID match.  It only satisfies the date condition.  In fact you don't need the iterator, since you really want only to search within tied id items.

  3. Even if you find the right ID, please remember that (for multidata:'Y') when there are more that one hash dataitem for a key, they are stored in the order encountered, so you must be sure that your data set is already in the desired order.  But if your SV dataset is already in the desired order, then you can:
data vs;
   input id $1-4 vstestcd $6-7 vsdtc $9-18;
   datalines;
 M01 AA 2019-09-26
 M01 BB 2019-09-26
 M01 AA 2019-10-15
 M01 BB 2019-10-15
 M01 AA 2019-10-23
 M02 AA 2019-09-12
 M02 BB 2019-09-12
 M02 AA 2019-09-23
 M02 BB 2019-09-23
 M02 AA 2019-09-30
;
run;

data sv;
   input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
   datalines;
 M01 01 2019-09-26 2019-09-26
 M01 02 2019-10-15 2019-10-15
 M01 03 2019-10-19 2019-10-19
 M01 04 2019-10-23 2019-10-23
 M01 05 2019-11-03 2019-11-03
 M02 01 2019-09-12 2019-09-12
 M02 02 2019-09-23 2019-09-23
 M02 03 2019-09-26 2019-09-26
 M02 04 2019-09-30 2019-09-30
 M02 05 2019-10-08 2019-10-08
;
run;

data test (drop=_:);
  if _n_=1 then do;
    if 0 then set sv;
    declare hash h1(dataset:'sv', multidata: "Y");
    h1.definekey('id');
    h1.definedata('SVENDTC','SVSTDTC','VISIT');
    h1.definedone();
  end;
  
  set vs;
  _rc=h1.find();
  if _rc=0;
  do _i=1 by 1 until (_rc^=0);
    if svstdtc<=vsdtc<=svendtc then _latest_success=_i;
    _rc=h1.find_next();
  end;
  if _latest_success^=.;
  _rc=h1.find();
  if _latest_success>1 then do _i=2 to _latest_success;
    _rc=h1.find_next();
  end;
run;
  1. The FIND method always matches to the first matching data item when there are ties.
  2. The FIND_NEXT method advances within the tied data items.  It returns a non-zero when all the ties  are exhausted.  So this program uses FIND_NEXT to step through all the matching ID's,  And it updates the _latest_found index only when the date condition is satisfied.  At the end of the loop _latest_found will specify the most recent satisfactory date for a matching id.
  3. Then use the FIND again to go to the beginning of the matching id's.  And use a  counter to re-apply FIND_NEXT up  to _latest_found to get and keep the most recent matching data item in  the hash object.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

I'm not sure I entirely understand what you want.  Could you show us what the desired output would be for your sample data?

 

thanks

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Lee_wan
Obsidian | Level 7

Hi, thanks for your response. My English is not good and I uploaded a new picture, hoping to help you understand。

 

Thanks.

PeterClemmensen
Tourmaline | Level 20

Don't see a picture?

Lee_wan
Obsidian | Level 7

11.png

 

And now?

mkeintz
PROC Star
  1. Is this picture the entire desired output from your sample data?

  2. Your use of the iterator does not guarantee an ID match.  It only satisfies the date condition.  In fact you don't need the iterator, since you really want only to search within tied id items.

  3. Even if you find the right ID, please remember that (for multidata:'Y') when there are more that one hash dataitem for a key, they are stored in the order encountered, so you must be sure that your data set is already in the desired order.  But if your SV dataset is already in the desired order, then you can:
data vs;
   input id $1-4 vstestcd $6-7 vsdtc $9-18;
   datalines;
 M01 AA 2019-09-26
 M01 BB 2019-09-26
 M01 AA 2019-10-15
 M01 BB 2019-10-15
 M01 AA 2019-10-23
 M02 AA 2019-09-12
 M02 BB 2019-09-12
 M02 AA 2019-09-23
 M02 BB 2019-09-23
 M02 AA 2019-09-30
;
run;

data sv;
   input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
   datalines;
 M01 01 2019-09-26 2019-09-26
 M01 02 2019-10-15 2019-10-15
 M01 03 2019-10-19 2019-10-19
 M01 04 2019-10-23 2019-10-23
 M01 05 2019-11-03 2019-11-03
 M02 01 2019-09-12 2019-09-12
 M02 02 2019-09-23 2019-09-23
 M02 03 2019-09-26 2019-09-26
 M02 04 2019-09-30 2019-09-30
 M02 05 2019-10-08 2019-10-08
;
run;

data test (drop=_:);
  if _n_=1 then do;
    if 0 then set sv;
    declare hash h1(dataset:'sv', multidata: "Y");
    h1.definekey('id');
    h1.definedata('SVENDTC','SVSTDTC','VISIT');
    h1.definedone();
  end;
  
  set vs;
  _rc=h1.find();
  if _rc=0;
  do _i=1 by 1 until (_rc^=0);
    if svstdtc<=vsdtc<=svendtc then _latest_success=_i;
    _rc=h1.find_next();
  end;
  if _latest_success^=.;
  _rc=h1.find();
  if _latest_success>1 then do _i=2 to _latest_success;
    _rc=h1.find_next();
  end;
run;
  1. The FIND method always matches to the first matching data item when there are ties.
  2. The FIND_NEXT method advances within the tied data items.  It returns a non-zero when all the ties  are exhausted.  So this program uses FIND_NEXT to step through all the matching ID's,  And it updates the _latest_found index only when the date condition is satisfied.  At the end of the loop _latest_found will specify the most recent satisfactory date for a matching id.
  3. Then use the FIND again to go to the beginning of the matching id's.  And use a  counter to re-apply FIND_NEXT up  to _latest_found to get and keep the most recent matching data item in  the hash object.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Lee_wan
Obsidian | Level 7

Thank you for your response!!

 

Using the FIND again that makes me becoming enlightened at once!!

 

Kurt_Bremser
Super User

Why don't you use a simple SQL?

data vs;
length
  id $3
  vstestcd $2
  vsdt 4
;
format vsdt yymmddd10.;
input id vstestcd vsdt :yymmdd10.;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;

data sv;
length
  id $3
  visit $2
  svstdt
  svendt
    4
;
format
  svstdt
  svendt
    yymmddd10.
;
input id visit (svstdt svendt) (:yymmdd10.);
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;

proc sql;
create table want as
select
  sv.*,
  vs.vstestcd,
  vs.vsdt
from sv left join vs
on sv.id = vs.id and sv.svstdt <= vs.vsdt <= sv.svendt;
quit;

proc print data=want noobs;
run;

Result:

id     visit        svstdt        svendt    vstestcd          vsdt

M01     02      2019-10-15    2019-10-15       AA       2019-10-15
M01     04      2019-10-23    2019-10-23       AA       2019-10-23
M01     01      2019-09-26    2019-09-26       BB       2019-09-26
M01     01      2019-09-26    2019-09-26       AA       2019-09-26
M01     02      2019-10-15    2019-10-15       BB       2019-10-15
M01     03      2019-10-19    2019-10-19                         .
M01     05      2019-11-03    2019-11-03                         .
M02     01      2019-09-12    2019-09-12       BB       2019-09-12
M02     01      2019-09-12    2019-09-12       AA       2019-09-12
M02     04      2019-09-30    2019-09-30       AA       2019-09-30
M02     02      2019-09-23    2019-09-23       BB       2019-09-23
M02     02      2019-09-23    2019-09-23       AA       2019-09-23
M02     05      2019-10-08    2019-10-08                         .
M02     03      2019-09-26    2019-09-26                         .

You can see there is no cross-mapping.

Lee_wan
Obsidian | Level 7

Hi thanks for your response.

 

Because there are maybe multiple records with the same svstdtc\svstdtc but different visit existing in SV data.

 

Just like “

M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26

 

When I use proc sql,the records will be increase.

 

 

Kurt_Bremser
Super User

@Lee_wan wrote:

Hi thanks for your response.

 

Because there are maybe multiple records with the same svstdtc\svstdtc but different visit existing in SV data.

 

Just like “

M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26

 

When I use proc sql,the records will be increase.

 

 


Then use a subselect to reduce the entries in sv:

data vs;
length
  id $3
  vstestcd $2
  vsdt 4
;
format vsdt yymmddd10.;
input id vstestcd vsdt :yymmdd10.;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;

data sv;
length
  id $3
  visit 3
  svstdt
  svendt
    4
;
format
  visit z2.
  svstdt
  svendt
    yymmddd10.
;
input id visit (svstdt svendt) (:yymmdd10.);
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;

proc sql;
create table want as
select
  a.*,
  vs.vstestcd,
  vs.vsdt
from (
  select
    id,
    max(visit) as visit format=z2.,
    svstdt,
    svendt
  from sv
  group by id, svstdt, svendt
  having sv.visit = calculated visit
) a left join vs
on a.id = vs.id and a.svstdt <= vs.vsdt <= a.svendt;
quit;

proc print data=want noobs;
run;

(I added an additional observation in sv to create your "double" scenario, and changed visit to numeric, to allow use of the max() summary function)

Result:

id     visit        svstdt        svendt    vstestcd          vsdt

M01     02      2019-10-15    2019-10-15       AA       2019-10-15
M01     04      2019-10-23    2019-10-23       AA       2019-10-23
M01     02      2019-09-26    2019-09-26       BB       2019-09-26
M01     02      2019-09-26    2019-09-26       AA       2019-09-26
M01     02      2019-10-15    2019-10-15       BB       2019-10-15
M01     03      2019-10-19    2019-10-19                         .
M01     05      2019-11-03    2019-11-03                         .
M02     01      2019-09-12    2019-09-12       BB       2019-09-12
M02     01      2019-09-12    2019-09-12       AA       2019-09-12
M02     04      2019-09-30    2019-09-30       AA       2019-09-30
M02     02      2019-09-23    2019-09-23       BB       2019-09-23
M02     02      2019-09-23    2019-09-23       AA       2019-09-23
M02     03      2019-09-26    2019-09-26                         .
M02     05      2019-10-08    2019-10-08                         .

As you can see, on 2019-09-26 only visit 02 is taken.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1339 views
  • 4 likes
  • 4 in conversation