SAS Programming

Lee_wan · Posted 11-18-2019 08:48 PM

what I have as below:

data vs;
   input id $1-4 vstestcd $6-7 vsdtc $9-18;
   datalines;
 M01 AA 2019-09-26
 M01 BB 2019-09-26
 M01 AA 2019-10-15
 M01 BB 2019-10-15
 M01 AA 2019-10-23
 M02 AA 2019-09-12
 M02 BB 2019-09-12
 M02 AA 2019-09-23
 M02 BB 2019-09-23
 M02 AA 2019-09-30
;
run;

data sv;
   input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
   datalines;
 M01 01 2019-09-26 2019-09-26
 M01 02 2019-10-15 2019-10-15
 M01 03 2019-10-19 2019-10-19
 M01 04 2019-10-23 2019-10-23
 M01 05 2019-11-03 2019-11-03
 M02 01 2019-09-12 2019-09-12
 M02 02 2019-09-23 2019-09-23
 M02 03 2019-09-26 2019-09-26
 M02 04 2019-09-30 2019-09-30
 M02 05 2019-10-08 2019-10-08
;
run;

It's just a small part of the data and I want to use hash by down to up to traversal SV dataset to connect VS dataset.

Because there some records have multiple date for different visit in sv dataset，and I want to map last visit in sv data when SVSTDTC<=vsdt<=SVENDTC.

My code is as below:

data test;
  if _n_=1 then do;
    if 0 then set sv;
    declare hash h1(dataset:'sv', multidata: "Y");
    dcl hiter iter('h1');
    h1.definekey('id');
    h1.definedata('SVENDTC','SVSTDTC','VISIT');
    h1.definedone();
  end;
  
  do until(eof);
    set vs end=eof;
   
    do RC = iter.LAST() by 0 while (RC = 0) ;
      if svstdtc<=vsdtc<=svendtc then goto edit;
      rc=iter.PREV();
    end;
    edit:
    output;
  end;
run;

I use iter.LAST and iter.PREV to traversal from bottom to top, but the key is not map.

The second ID's visit is connected to the first ID.

Please help me and thanks.

mkeintz · Posted 11-19-2019 02:39 AM

Is this picture the entire desired output from your sample data?
Your use of the iterator does not guarantee an ID match. It only satisfies the date condition. In fact you don't need the iterator, since you really want only to search within tied id items.
Even if you find the right ID, please remember that (for multidata:'Y') when there are more that one hash dataitem for a key, they are stored in the order encountered, so you must be sure that your data set is already in the desired order. But if your SV dataset is already in the desired order, then you can:

data vs;
   input id $1-4 vstestcd $6-7 vsdtc $9-18;
   datalines;
 M01 AA 2019-09-26
 M01 BB 2019-09-26
 M01 AA 2019-10-15
 M01 BB 2019-10-15
 M01 AA 2019-10-23
 M02 AA 2019-09-12
 M02 BB 2019-09-12
 M02 AA 2019-09-23
 M02 BB 2019-09-23
 M02 AA 2019-09-30
;
run;

data sv;
   input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
   datalines;
 M01 01 2019-09-26 2019-09-26
 M01 02 2019-10-15 2019-10-15
 M01 03 2019-10-19 2019-10-19
 M01 04 2019-10-23 2019-10-23
 M01 05 2019-11-03 2019-11-03
 M02 01 2019-09-12 2019-09-12
 M02 02 2019-09-23 2019-09-23
 M02 03 2019-09-26 2019-09-26
 M02 04 2019-09-30 2019-09-30
 M02 05 2019-10-08 2019-10-08
;
run;

data test (drop=_:);
  if _n_=1 then do;
    if 0 then set sv;
    declare hash h1(dataset:'sv', multidata: "Y");
    h1.definekey('id');
    h1.definedata('SVENDTC','SVSTDTC','VISIT');
    h1.definedone();
  end;
  
  set vs;
  _rc=h1.find();
  if _rc=0;
  do _i=1 by 1 until (_rc^=0);
    if svstdtc<=vsdtc<=svendtc then _latest_success=_i;
    _rc=h1.find_next();
  end;
  if _latest_success^=.;
  _rc=h1.find();
  if _latest_success>1 then do _i=2 to _latest_success;
    _rc=h1.find_next();
  end;
run;

The FIND method always matches to the first matching data item when there are ties.
The FIND_NEXT method advances within the tied data items. It returns a non-zero when all the ties are exhausted. So this program uses FIND_NEXT to step through all the matching ID's, And it updates the _latest_found index only when the date condition is satisfied. At the end of the loop _latest_found will specify the most recent satisfactory date for a matching id.
Then use the FIND again to go to the beginning of the matching id's. And use a counter to re-apply FIND_NEXT up to _latest_found to get and keep the most recent matching data item in the hash object.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 11-18-2019 11:15 PM

I'm not sure I entirely understand what you want. Could you show us what the desired output would be for your sample data?

thanks

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Lee_wan · Posted 11-19-2019 12:22 AM

Hi, thanks for your response. My English is not good and I uploaded a new picture, hoping to help you understand。

Thanks.

PeterClemmensen · Posted 11-19-2019 01:10 AM

Don't see a picture?

The DATA to DATA Step Macro
Blog: SASnrd

Lee_wan · Posted 11-19-2019 01:26 AM

And now？

mkeintz · Posted 11-19-2019 02:39 AM

Is this picture the entire desired output from your sample data?
Your use of the iterator does not guarantee an ID match. It only satisfies the date condition. In fact you don't need the iterator, since you really want only to search within tied id items.
Even if you find the right ID, please remember that (for multidata:'Y') when there are more that one hash dataitem for a key, they are stored in the order encountered, so you must be sure that your data set is already in the desired order. But if your SV dataset is already in the desired order, then you can:

data vs;
   input id $1-4 vstestcd $6-7 vsdtc $9-18;
   datalines;
 M01 AA 2019-09-26
 M01 BB 2019-09-26
 M01 AA 2019-10-15
 M01 BB 2019-10-15
 M01 AA 2019-10-23
 M02 AA 2019-09-12
 M02 BB 2019-09-12
 M02 AA 2019-09-23
 M02 BB 2019-09-23
 M02 AA 2019-09-30
;
run;

data sv;
   input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
   datalines;
 M01 01 2019-09-26 2019-09-26
 M01 02 2019-10-15 2019-10-15
 M01 03 2019-10-19 2019-10-19
 M01 04 2019-10-23 2019-10-23
 M01 05 2019-11-03 2019-11-03
 M02 01 2019-09-12 2019-09-12
 M02 02 2019-09-23 2019-09-23
 M02 03 2019-09-26 2019-09-26
 M02 04 2019-09-30 2019-09-30
 M02 05 2019-10-08 2019-10-08
;
run;

data test (drop=_:);
  if _n_=1 then do;
    if 0 then set sv;
    declare hash h1(dataset:'sv', multidata: "Y");
    h1.definekey('id');
    h1.definedata('SVENDTC','SVSTDTC','VISIT');
    h1.definedone();
  end;
  
  set vs;
  _rc=h1.find();
  if _rc=0;
  do _i=1 by 1 until (_rc^=0);
    if svstdtc<=vsdtc<=svendtc then _latest_success=_i;
    _rc=h1.find_next();
  end;
  if _latest_success^=.;
  _rc=h1.find();
  if _latest_success>1 then do _i=2 to _latest_success;
    _rc=h1.find_next();
  end;
run;

The FIND method always matches to the first matching data item when there are ties.
The FIND_NEXT method advances within the tied data items. It returns a non-zero when all the ties are exhausted. So this program uses FIND_NEXT to step through all the matching ID's, And it updates the _latest_found index only when the date condition is satisfied. At the end of the loop _latest_found will specify the most recent satisfactory date for a matching id.
Then use the FIND again to go to the beginning of the matching id's. And use a counter to re-apply FIND_NEXT up to _latest_found to get and keep the most recent matching data item in the hash object.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Lee_wan · Posted 11-19-2019 03:18 AM

Thank you for your response!!

Using the FIND again that makes me becoming enlightened at once!!

Kurt_Bremser · Posted 11-19-2019 01:50 AM

Why don't you use a simple SQL?

data vs;
length
  id $3
  vstestcd $2
  vsdt 4
;
format vsdt yymmddd10.;
input id vstestcd vsdt :yymmdd10.;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;

data sv;
length
  id $3
  visit $2
  svstdt
  svendt
    4
;
format
  svstdt
  svendt
    yymmddd10.
;
input id visit (svstdt svendt) (:yymmdd10.);
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;

proc sql;
create table want as
select
  sv.*,
  vs.vstestcd,
  vs.vsdt
from sv left join vs
on sv.id = vs.id and sv.svstdt <= vs.vsdt <= sv.svendt;
quit;

proc print data=want noobs;
run;

Result:

id     visit        svstdt        svendt    vstestcd          vsdt

M01     02      2019-10-15    2019-10-15       AA       2019-10-15
M01     04      2019-10-23    2019-10-23       AA       2019-10-23
M01     01      2019-09-26    2019-09-26       BB       2019-09-26
M01     01      2019-09-26    2019-09-26       AA       2019-09-26
M01     02      2019-10-15    2019-10-15       BB       2019-10-15
M01     03      2019-10-19    2019-10-19                         .
M01     05      2019-11-03    2019-11-03                         .
M02     01      2019-09-12    2019-09-12       BB       2019-09-12
M02     01      2019-09-12    2019-09-12       AA       2019-09-12
M02     04      2019-09-30    2019-09-30       AA       2019-09-30
M02     02      2019-09-23    2019-09-23       BB       2019-09-23
M02     02      2019-09-23    2019-09-23       AA       2019-09-23
M02     05      2019-10-08    2019-10-08                         .
M02     03      2019-09-26    2019-09-26                         .

You can see there is no cross-mapping.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Lee_wan · Posted 11-19-2019 02:25 AM

Hi thanks for your response.

Because there are maybe multiple records with the same svstdtc\svstdtc but different visit existing in SV data.

Just like “

M01 01 2019-09-26 2019-09-26

M01 02 2019-09-26 2019-09-26

”

When I use proc sql，the records will be increase.

Kurt_Bremser · Posted 11-19-2019 04:49 AM

@Lee_wan wrote:

Hi thanks for your response.

Because there are maybe multiple records with the same svstdtc\svstdtc but different visit existing in SV data.

Just like “
M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26
”

When I use proc sql，the records will be increase.

Then use a subselect to reduce the entries in sv:

data vs;
length
  id $3
  vstestcd $2
  vsdt 4
;
format vsdt yymmddd10.;
input id vstestcd vsdt :yymmdd10.;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;

data sv;
length
  id $3
  visit 3
  svstdt
  svendt
    4
;
format
  visit z2.
  svstdt
  svendt
    yymmddd10.
;
input id visit (svstdt svendt) (:yymmdd10.);
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;

proc sql;
create table want as
select
  a.*,
  vs.vstestcd,
  vs.vsdt
from (
  select
    id,
    max(visit) as visit format=z2.,
    svstdt,
    svendt
  from sv
  group by id, svstdt, svendt
  having sv.visit = calculated visit
) a left join vs
on a.id = vs.id and a.svstdt <= vs.vsdt <= a.svendt;
quit;

proc print data=want noobs;
run;

(I added an additional observation in sv to create your "double" scenario, and changed visit to numeric, to allow use of the max() summary function)

Result:

id     visit        svstdt        svendt    vstestcd          vsdt

M01     02      2019-10-15    2019-10-15       AA       2019-10-15
M01     04      2019-10-23    2019-10-23       AA       2019-10-23
M01     02      2019-09-26    2019-09-26       BB       2019-09-26
M01     02      2019-09-26    2019-09-26       AA       2019-09-26
M01     02      2019-10-15    2019-10-15       BB       2019-10-15
M01     03      2019-10-19    2019-10-19                         .
M01     05      2019-11-03    2019-11-03                         .
M02     01      2019-09-12    2019-09-12       BB       2019-09-12
M02     01      2019-09-12    2019-09-12       AA       2019-09-12
M02     04      2019-09-30    2019-09-30       AA       2019-09-30
M02     02      2019-09-23    2019-09-23       BB       2019-09-23
M02     02      2019-09-23    2019-09-23       AA       2019-09-23
M02     03      2019-09-26    2019-09-26                         .
M02     05      2019-10-08    2019-10-08                         .

As you can see, on 2019-09-26 only visit 02 is taken.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Programming

Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

Re: Why hash(hiter) do not map by key?

평가지표 - MAP

Resilience: A Key to Developing Future Skills

Propagating Event Attributes in CI360 Activity Maps

Mapping SPD Server Features to SingleStore

KEY NOT FOUND error

Follow Us

What is...

SAS Programming

Register Today!

SAS Training: Just a Click Away

Follow Us

What is...