Hi all,
I'm working on a data management query report. I've a dataset which stores information about variables where query was placed. Based on that information, I've to find current value of the quaried variable.
data query;
input ptid visit source_data $ var $;
cards;
1 10 PE HEIGHT
1 10 AE AEPT
1 999 DM RACE
2 20 PE WEIGHT
;
run;
data PE;
input ptid visit HEIGHT WEIGHT;
cards;
1 10 5 60
1 20 6 80
1 30 4 40
2 10 5 60
2 20 6 80
2 30 4 40
;
data AE;
input ptid visit AEPT $ AESEV;
cards;
1 10 COUGH 1
1 20 DIZZY 2
1 30 FATIGUE 3
2 10 FEVER 4
2 20 NAUSEA 3
2 30 VOMIT 1
;
data DM;
input ptid visit RACE $;
cards;
1 999 ASIAN
2 999 AMERICAN
;
Here is desired output data:
data desired;
input ptid visit source_data $ var $ current $;
cards;
1 10 PE HEIGHT 5
1 10 AE AEPT COUGH
1 999 DM RACE ASIAN
2 20 PE WEIGHT 80
;
;
run;
In reality, I've hundreads of records and ~ 50 variables in each dataset. I would like to know efficient way of solving this problem!
Thanks in advance!
So let's see how we could find the records you want from the PE dataset. Something like this might do the trick.
data next ;
  length ptid visit 8 source_data var $32 current $32 ;
  keep ptid -- current ;
  set pe ;
  retain source_data 'PE';
  if ptid=1 and visit=10 then do;
    var = 'HEIGHT';
    current=vvalue(HEIGHT);
    output;
  end;
  if ptid=2 and visit=20 then do;
    var = 'WEIGHT';
    current=vvalue(WEIGHT);
    output;
  end;
run;So then it should be easy to generate code like that for each of the source datasets.
So then just generate one of those for each value of SOURCE_DATA in the metadata file. Along with some PROC APPEND code to aggregate them.
proc sort data=query ;
  by source_data ptid visit var ;
run;
filename code temp;
data _null_;
  set query;
  by source_data ;
  file code ;
  if _n_=1 then put
 'proc delete data=want;'
/'run;'
  ;
  if first.source_data then put 
 'data next ;'
/'  length ptid visit 8 source_data var $32 current $32 ;'
/'  keep ptid -- current ;'
/'  set ' source_data ';'
/'  retain source_data ' source_data :$quote. ';'
  ;
  put
 '  if ' ptid= 'and  ' visit= 'then do;'
/'    var = ' var :$quote. ';'
/'    current=vvalue(' var ');'
/'    output;'
/'  end;'
  ;
  if last.source_data then put
 'run;'
/'proc append base=want data=next force;'
/'run;'
  ;
run;
%include code / source2 ;
proc print; run;source_ Obs ptid visit data var current 1 1 10 AE AEPT COUGH 2 1 999 DM RACE ASIAN 3 1 10 PE HEIGHT 5 4 2 20 PE WEIGHT 80
Not sure what you mean by CURRENT value, but it looks like you have simple code generation problem.
Figure out what code you would need to get your desired result and then figure out how to generate that code from your input metadata table. Since your queries will need to analyze many input tables you might want to do it either one record of your metadata table at a time. But you could also possible group the metadata by the source data table and try to generate the results for all queries against that table at the same time.
Thanks for reply. CURRENT value is nothing but value of that variable for given ptid at that visit.
So let's see how we could find the records you want from the PE dataset. Something like this might do the trick.
data next ;
  length ptid visit 8 source_data var $32 current $32 ;
  keep ptid -- current ;
  set pe ;
  retain source_data 'PE';
  if ptid=1 and visit=10 then do;
    var = 'HEIGHT';
    current=vvalue(HEIGHT);
    output;
  end;
  if ptid=2 and visit=20 then do;
    var = 'WEIGHT';
    current=vvalue(WEIGHT);
    output;
  end;
run;So then it should be easy to generate code like that for each of the source datasets.
So then just generate one of those for each value of SOURCE_DATA in the metadata file. Along with some PROC APPEND code to aggregate them.
proc sort data=query ;
  by source_data ptid visit var ;
run;
filename code temp;
data _null_;
  set query;
  by source_data ;
  file code ;
  if _n_=1 then put
 'proc delete data=want;'
/'run;'
  ;
  if first.source_data then put 
 'data next ;'
/'  length ptid visit 8 source_data var $32 current $32 ;'
/'  keep ptid -- current ;'
/'  set ' source_data ';'
/'  retain source_data ' source_data :$quote. ';'
  ;
  put
 '  if ' ptid= 'and  ' visit= 'then do;'
/'    var = ' var :$quote. ';'
/'    current=vvalue(' var ');'
/'    output;'
/'  end;'
  ;
  if last.source_data then put
 'run;'
/'proc append base=want data=next force;'
/'run;'
  ;
run;
%include code / source2 ;
proc print; run;source_ Obs ptid visit data var current 1 1 10 AE AEPT COUGH 2 1 999 DM RACE ASIAN 3 1 10 PE HEIGHT 5 4 2 20 PE WEIGHT 80
I don't see anything here besides the need for a join/merge.
Do you need help with understanding how to do that?
https://stats.idre.ucla.edu/sas/modules/match-merging-data-files-in-sas/
Though to be honest, usually most people who can't join can't prep sample data so is this homework or work?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
