Selecting a value from metadata based on each variable value in dataset

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

Selecting a value from metadata based on each variable value in dataset

Hi,

Could someone give me an idea about how to do the following? Attached is the sample data in excel containing sample data and specifications for rank lookup. Here goes the problem:

for each ID in the dataset, look for all possible ranks in specifics sheet based on each variable value in dataset, sort the ranks in ascending order and output the top 2 ranks to final dataset. if a var is null then it shud also have rank. low and upvalues ('(' ']') signifies > and <=.

Thanks and appreciate for ur help in advance.

RK


Accepted Solutions
Solution
‎01-30-2013 01:47 PM
Super Contributor
Posts: 578

Re: Selecting a value from metadata based on each variable value in dataset

seems a bit kludgy...but....I had to add a few columns to your input data.

data have;

input ID type var110 var220 var330 var440;

v110_rank=0;

v220_rank=0;

v330_rank=0;

v440_rank=0;

cards;

9001 1 40 1 52 10

9002 1 20 0 23 25

9003 1 10 1 22 5

9003 2 20 0 33 12

9004 1 . -2 10 25

run;

data specifics;

input varname $ min_val max_val rank_val;

cards;

var110 0 10 25

var110 10 50 18

var110 . . 120

var220 . 0 12

var220 0 1 55

var330 0 20 44

var330 20 40 24

var330 40 . 29

var440 0 10 8

var440 10 20 22

var440 20 30 126

var440 . . 88

run;

proc sql;

update have t1

set

    v110_rank = (select rank_val from specifics where varname='var110' and t1.var110 > min_val and t1.var110 <= max_val)

    ,v220_rank = (select rank_val from specifics where varname='var220' and t1.var220 > min_val and t1.var220 <= max_val)

    ,v330_rank = (select rank_val from specifics where varname='var330' and t1.var330 > min_val and t1.var330 <= max_val)

    ,v440_rank = (select rank_val from specifics where varname='var440' and t1.var440 > min_val and t1.var440 <= max_val)

;

update have t1

set v110_rank=(select rank_val from specifics where varname='var110' and (min_val =. or max_val=.))

where v110_rank =.;

update have t1

set v220_rank=(select rank_val from specifics where varname='var220' and (min_val =. or max_val=.))

where v220_rank =.;

update have t1

set v330_rank=(select rank_val from specifics where varname='var330' and (min_val =. or max_val=.))

where v330_rank =.;

update have t1

set v440_rank=(select rank_val from specifics where varname='var440' and (min_val =. or max_val=.))

where v440_rank =.;

quit;

data have_2 (keep= ID type var110 var220 var330 var440 varname varrank);

set have;

varname='var110';

varrank=v110_rank;

output;

varname='var220';

varrank=v220_rank;

output;

varname='var330';

varrank=v330_rank;

output;

varname='var440';

varrank=v440_rank;

output;

run;

proc sort in=have_2 out=have_2;

by id varrank;

run;

data have_3;

set have_2;

by id;

retain rnk 0;

if first.id then rnk=0;

rnk+1;

if rnk le 2;

run;

data have_final(drop=rnk varrank varname);

set have_3;

by id;

length

    rnk1_var $10.

    rnk1 8.

    rnk2_var $10.

    rnk2 8.;

retain

    rnk1_var ''

    rnk1 0

    rnk2_var ''

    rnk2 0;

if first.id then do;

    rnk1=varrank;

    rnk1_var=varname;

    rnk2=.;

    end;

else do;

    rnk2 = varrank;

    rnk2_var=varname;

    output;

end;

run;

View solution in original post


All Replies
Super Contributor
Posts: 578

Re: Selecting a value from metadata based on each variable value in dataset

Does this seem to accomplish what you want?

proc sort in=have out=have_srt;

by var rank;

run;

data want(drop=rnk);

set have_srt;

by var;

retain rnk 0;

if first.var then rnk=0;

rnk+1;

if rnk le 2;

run;

Contributor
Posts: 25

Re: Selecting a value from metadata based on each variable value in dataset

Bailey,

Thanks for the response. two sheets in the sample excel file i uploaded before are two seperate datasets. for each id, I have to look for rank of each variable value based on min and max bucket in specifics sheet. And then sort the ranks. final dataset should contain two ranks(rank1 rank2) in two columns for each id.

Let me know if you need further information.

Thanks,

RK

Respected Advisor
Posts: 3,799

Re: Selecting a value from metadata based on each variable value in dataset

It would have been nice if you had included a WANT sheet.

What does 4th row in the specifics represent OTHER.

I think you can just make a FORMAT for each VAR in the specifics sheet.  You will need to add a character to the end of the value of VAR so that it will be a proper FORMAT name.

Contributor
Posts: 25

Re: Selecting a value from metadata based on each variable value in dataset

Posted in reply to data_null__

Hi,

thanks for you response. I have updated excel sheet with want sheet added to it. I want the output should be the same as the one in want sheet. Could you give me clear idea about how to proceed? do you think hash objects will work?

Rk


Respected Advisor
Posts: 3,799

Re: Selecting a value from metadata based on each variable value in dataset

This is what I was thinking about.  The meta data is used to create formats for each VAR (a look up table for each).  Then the formats are applied to find the rank.  The ranks are sorted to find the two smallest and then CODE is looked up from the CODE format, you only need one CODE format).  I made some assumptions about what your META data means with regards to missing min or max and how it relates to PROC FORMAT.  There is one difference for obs 2 that may be due to my assumptions about the meta data.

There is one place in the code where you have to know how many VARnnn variables there are.  This could be calculated or you could transpose the data and never have to know.

proc import datafile='~/sample.xlsx' out=have dbms=xlsx replace;
  
sheet='data';
  
run;
proc import datafile='~/sample.xlsx' out=meta dbms=xlsx replace;
  
sheet='specifics';
  
run;
proc import datafile='~/sample.xlsx' out=want dbms=xlsx replace;
  
sheet='want';
  
run;


data cntl1;
   set meta;
   length fmtname $32;
  
retain type 'N';
   fmtname = cats(var,
'_');
   start = min;
  
end   = max;
   if missing(min) and not missing(max) then hlo='L';
  
if missing(max) and not missing(min) then hlo='H';
  
if low eq '(' then sexcl='Y';
  
if up  eq ']' then eexcl='N';
  
label = rank;
   run;
proc format cntlin=cntl1 cntlout=cntlout;
   select var:;
   run;
proc print;
  
run;

data cntl2;
   set meta;
   retain fmtname 'code' type 'N';
   start = rank;
  
label = code;
   run;
proc format cntlin=cntl2 cntlout=cntlout2;
   select @$code:;
   run;
proc print;
  
run;

data want2;
   set have;
   array v
  • var:;
  •    array r[4] 8 _temporary_; *you have to know how may VARnnn variables;
      
    call missing(of r
  • );
  •    do i = 1 to dim(v);
          r = input(putn(v,cats(vname(v),'_')),F8.);
          end;
      
    call sortN(of r
  • );
  •    rank1 = r[1];
       code1 = put(r[1],code.);
       rank2 = r[2];
       code2 = put(r[2],code.);
       run;
    proc print;
      
    run;
    proc print data=want;
       run;
    Solution
    ‎01-30-2013 01:47 PM
    Super Contributor
    Posts: 578

    Re: Selecting a value from metadata based on each variable value in dataset

    seems a bit kludgy...but....I had to add a few columns to your input data.

    data have;

    input ID type var110 var220 var330 var440;

    v110_rank=0;

    v220_rank=0;

    v330_rank=0;

    v440_rank=0;

    cards;

    9001 1 40 1 52 10

    9002 1 20 0 23 25

    9003 1 10 1 22 5

    9003 2 20 0 33 12

    9004 1 . -2 10 25

    run;

    data specifics;

    input varname $ min_val max_val rank_val;

    cards;

    var110 0 10 25

    var110 10 50 18

    var110 . . 120

    var220 . 0 12

    var220 0 1 55

    var330 0 20 44

    var330 20 40 24

    var330 40 . 29

    var440 0 10 8

    var440 10 20 22

    var440 20 30 126

    var440 . . 88

    run;

    proc sql;

    update have t1

    set

        v110_rank = (select rank_val from specifics where varname='var110' and t1.var110 > min_val and t1.var110 <= max_val)

        ,v220_rank = (select rank_val from specifics where varname='var220' and t1.var220 > min_val and t1.var220 <= max_val)

        ,v330_rank = (select rank_val from specifics where varname='var330' and t1.var330 > min_val and t1.var330 <= max_val)

        ,v440_rank = (select rank_val from specifics where varname='var440' and t1.var440 > min_val and t1.var440 <= max_val)

    ;

    update have t1

    set v110_rank=(select rank_val from specifics where varname='var110' and (min_val =. or max_val=.))

    where v110_rank =.;

    update have t1

    set v220_rank=(select rank_val from specifics where varname='var220' and (min_val =. or max_val=.))

    where v220_rank =.;

    update have t1

    set v330_rank=(select rank_val from specifics where varname='var330' and (min_val =. or max_val=.))

    where v330_rank =.;

    update have t1

    set v440_rank=(select rank_val from specifics where varname='var440' and (min_val =. or max_val=.))

    where v440_rank =.;

    quit;

    data have_2 (keep= ID type var110 var220 var330 var440 varname varrank);

    set have;

    varname='var110';

    varrank=v110_rank;

    output;

    varname='var220';

    varrank=v220_rank;

    output;

    varname='var330';

    varrank=v330_rank;

    output;

    varname='var440';

    varrank=v440_rank;

    output;

    run;

    proc sort in=have_2 out=have_2;

    by id varrank;

    run;

    data have_3;

    set have_2;

    by id;

    retain rnk 0;

    if first.id then rnk=0;

    rnk+1;

    if rnk le 2;

    run;

    data have_final(drop=rnk varrank varname);

    set have_3;

    by id;

    length

        rnk1_var $10.

        rnk1 8.

        rnk2_var $10.

        rnk2 8.;

    retain

        rnk1_var ''

        rnk1 0

        rnk2_var ''

        rnk2 0;

    if first.id then do;

        rnk1=varrank;

        rnk1_var=varname;

        rnk2=.;

        end;

    else do;

        rnk2 = varrank;

        rnk2_var=varname;

        output;

    end;

    run;

    Contributor
    Posts: 25

    Re: Selecting a value from metadata based on each variable value in dataset

    Thank you dbailey and data _null_. Both of your code worked.

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 7 replies
    • 560 views
    • 6 likes
    • 3 in conversation