- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
what I have as below:
data vs;
input id $1-4 vstestcd $6-7 vsdtc $9-18;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;
run;
data sv;
input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;
run;
It's just a small part of the data and I want to use hash by down to up to traversal SV dataset to connect VS dataset.
Because there some records have multiple date for different visit in sv dataset,and I want to map last visit in sv data when SVSTDTC<=vsdt<=SVENDTC.
My code is as below:
data test;
if _n_=1 then do;
if 0 then set sv;
declare hash h1(dataset:'sv', multidata: "Y");
dcl hiter iter('h1');
h1.definekey('id');
h1.definedata('SVENDTC','SVSTDTC','VISIT');
h1.definedone();
end;
do until(eof);
set vs end=eof;
do RC = iter.LAST() by 0 while (RC = 0) ;
if svstdtc<=vsdtc<=svendtc then goto edit;
rc=iter.PREV();
end;
edit:
output;
end;
run;
I use iter.LAST and iter.PREV to traversal from bottom to top, but the key is not map.
The second ID's visit is connected to the first ID.
Please help me and thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Is this picture the entire desired output from your sample data?
- Your use of the iterator does not guarantee an ID match. It only satisfies the date condition. In fact you don't need the iterator, since you really want only to search within tied id items.
- Even if you find the right ID, please remember that (for multidata:'Y') when there are more that one hash dataitem for a key, they are stored in the order encountered, so you must be sure that your data set is already in the desired order. But if your SV dataset is already in the desired order, then you can:
data vs;
input id $1-4 vstestcd $6-7 vsdtc $9-18;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;
run;
data sv;
input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;
run;
data test (drop=_:);
if _n_=1 then do;
if 0 then set sv;
declare hash h1(dataset:'sv', multidata: "Y");
h1.definekey('id');
h1.definedata('SVENDTC','SVSTDTC','VISIT');
h1.definedone();
end;
set vs;
_rc=h1.find();
if _rc=0;
do _i=1 by 1 until (_rc^=0);
if svstdtc<=vsdtc<=svendtc then _latest_success=_i;
_rc=h1.find_next();
end;
if _latest_success^=.;
_rc=h1.find();
if _latest_success>1 then do _i=2 to _latest_success;
_rc=h1.find_next();
end;
run;
- The FIND method always matches to the first matching data item when there are ties.
- The FIND_NEXT method advances within the tied data items. It returns a non-zero when all the ties are exhausted. So this program uses FIND_NEXT to step through all the matching ID's, And it updates the _latest_found index only when the date condition is satisfied. At the end of the loop _latest_found will specify the most recent satisfactory date for a matching id.
- Then use the FIND again to go to the beginning of the matching id's. And use a counter to re-apply FIND_NEXT up to _latest_found to get and keep the most recent matching data item in the hash object.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not sure I entirely understand what you want. Could you show us what the desired output would be for your sample data?
thanks
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, thanks for your response. My English is not good and I uploaded a new picture, hoping to help you understand。
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Don't see a picture?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And now?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Is this picture the entire desired output from your sample data?
- Your use of the iterator does not guarantee an ID match. It only satisfies the date condition. In fact you don't need the iterator, since you really want only to search within tied id items.
- Even if you find the right ID, please remember that (for multidata:'Y') when there are more that one hash dataitem for a key, they are stored in the order encountered, so you must be sure that your data set is already in the desired order. But if your SV dataset is already in the desired order, then you can:
data vs;
input id $1-4 vstestcd $6-7 vsdtc $9-18;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;
run;
data sv;
input id $1-4 visit $6-7 svstdtc $9-18 svendtc $19-29;
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;
run;
data test (drop=_:);
if _n_=1 then do;
if 0 then set sv;
declare hash h1(dataset:'sv', multidata: "Y");
h1.definekey('id');
h1.definedata('SVENDTC','SVSTDTC','VISIT');
h1.definedone();
end;
set vs;
_rc=h1.find();
if _rc=0;
do _i=1 by 1 until (_rc^=0);
if svstdtc<=vsdtc<=svendtc then _latest_success=_i;
_rc=h1.find_next();
end;
if _latest_success^=.;
_rc=h1.find();
if _latest_success>1 then do _i=2 to _latest_success;
_rc=h1.find_next();
end;
run;
- The FIND method always matches to the first matching data item when there are ties.
- The FIND_NEXT method advances within the tied data items. It returns a non-zero when all the ties are exhausted. So this program uses FIND_NEXT to step through all the matching ID's, And it updates the _latest_found index only when the date condition is satisfied. At the end of the loop _latest_found will specify the most recent satisfactory date for a matching id.
- Then use the FIND again to go to the beginning of the matching id's. And use a counter to re-apply FIND_NEXT up to _latest_found to get and keep the most recent matching data item in the hash object.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your response!!
Using the FIND again that makes me becoming enlightened at once!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why don't you use a simple SQL?
data vs;
length
id $3
vstestcd $2
vsdt 4
;
format vsdt yymmddd10.;
input id vstestcd vsdt :yymmdd10.;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;
data sv;
length
id $3
visit $2
svstdt
svendt
4
;
format
svstdt
svendt
yymmddd10.
;
input id visit (svstdt svendt) (:yymmdd10.);
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;
proc sql;
create table want as
select
sv.*,
vs.vstestcd,
vs.vsdt
from sv left join vs
on sv.id = vs.id and sv.svstdt <= vs.vsdt <= sv.svendt;
quit;
proc print data=want noobs;
run;
Result:
id visit svstdt svendt vstestcd vsdt M01 02 2019-10-15 2019-10-15 AA 2019-10-15 M01 04 2019-10-23 2019-10-23 AA 2019-10-23 M01 01 2019-09-26 2019-09-26 BB 2019-09-26 M01 01 2019-09-26 2019-09-26 AA 2019-09-26 M01 02 2019-10-15 2019-10-15 BB 2019-10-15 M01 03 2019-10-19 2019-10-19 . M01 05 2019-11-03 2019-11-03 . M02 01 2019-09-12 2019-09-12 BB 2019-09-12 M02 01 2019-09-12 2019-09-12 AA 2019-09-12 M02 04 2019-09-30 2019-09-30 AA 2019-09-30 M02 02 2019-09-23 2019-09-23 BB 2019-09-23 M02 02 2019-09-23 2019-09-23 AA 2019-09-23 M02 05 2019-10-08 2019-10-08 . M02 03 2019-09-26 2019-09-26 .
You can see there is no cross-mapping.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi thanks for your response.
Because there are maybe multiple records with the same svstdtc\svstdtc but different visit existing in SV data.
Just like “
M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26
”
When I use proc sql,the records will be increase.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Lee_wan wrote:
Hi thanks for your response.
Because there are maybe multiple records with the same svstdtc\svstdtc but different visit existing in SV data.
Just like “
M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26
”
When I use proc sql,the records will be increase.
Then use a subselect to reduce the entries in sv:
data vs;
length
id $3
vstestcd $2
vsdt 4
;
format vsdt yymmddd10.;
input id vstestcd vsdt :yymmdd10.;
datalines;
M01 AA 2019-09-26
M01 BB 2019-09-26
M01 AA 2019-10-15
M01 BB 2019-10-15
M01 AA 2019-10-23
M02 AA 2019-09-12
M02 BB 2019-09-12
M02 AA 2019-09-23
M02 BB 2019-09-23
M02 AA 2019-09-30
;
data sv;
length
id $3
visit 3
svstdt
svendt
4
;
format
visit z2.
svstdt
svendt
yymmddd10.
;
input id visit (svstdt svendt) (:yymmdd10.);
datalines;
M01 01 2019-09-26 2019-09-26
M01 02 2019-09-26 2019-09-26
M01 02 2019-10-15 2019-10-15
M01 03 2019-10-19 2019-10-19
M01 04 2019-10-23 2019-10-23
M01 05 2019-11-03 2019-11-03
M02 01 2019-09-12 2019-09-12
M02 02 2019-09-23 2019-09-23
M02 03 2019-09-26 2019-09-26
M02 04 2019-09-30 2019-09-30
M02 05 2019-10-08 2019-10-08
;
proc sql;
create table want as
select
a.*,
vs.vstestcd,
vs.vsdt
from (
select
id,
max(visit) as visit format=z2.,
svstdt,
svendt
from sv
group by id, svstdt, svendt
having sv.visit = calculated visit
) a left join vs
on a.id = vs.id and a.svstdt <= vs.vsdt <= a.svendt;
quit;
proc print data=want noobs;
run;
(I added an additional observation in sv to create your "double" scenario, and changed visit to numeric, to allow use of the max() summary function)
Result:
id visit svstdt svendt vstestcd vsdt M01 02 2019-10-15 2019-10-15 AA 2019-10-15 M01 04 2019-10-23 2019-10-23 AA 2019-10-23 M01 02 2019-09-26 2019-09-26 BB 2019-09-26 M01 02 2019-09-26 2019-09-26 AA 2019-09-26 M01 02 2019-10-15 2019-10-15 BB 2019-10-15 M01 03 2019-10-19 2019-10-19 . M01 05 2019-11-03 2019-11-03 . M02 01 2019-09-12 2019-09-12 BB 2019-09-12 M02 01 2019-09-12 2019-09-12 AA 2019-09-12 M02 04 2019-09-30 2019-09-30 AA 2019-09-30 M02 02 2019-09-23 2019-09-23 BB 2019-09-23 M02 02 2019-09-23 2019-09-23 AA 2019-09-23 M02 03 2019-09-26 2019-09-26 . M02 05 2019-10-08 2019-10-08 .
As you can see, on 2019-09-26 only visit 02 is taken.