BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JinboZhao
Calcite | Level 5

Hi SAS,

 

I have a data set which contains over 30 character and numeric variables. Based on this dataset, I generate a scorecard. Now, I want to replace each variable's value with the score I got. 

Here is part of my dataset: 

IDDAYSGRADEREMAINING_PERIOD
152A5.23
262C10.95
382D20.22
431C.
542B8.66

 

Here is part of my scorecard:

VariableGroupLableScorecard
Points
DAYS22<= DAYS< 512
DAYS35<= DAYS< 6-4
DAYS46<= DAYS< 97
GRADE12A, 2C, 3A6
GRADE22D55
GRADE31C, 2B, 3B, _MISSING_, _UNKNOWN_6
GRADE40, 48
REMAINING_PERIOD34.66<= REMAINING_PERIOD< 9.2813
REMAINING_PERIOD49.28<= REMAINING_PERIOD< 11.6518
REMAINING_PERIOD511.65<= REMAINING_PERIOD< 23.32, _MISSING_9

 

The below one is the result I want to generate:

SCORE TABLE  
IDDAYSGRADEREMAINING_PERIOD
1-4613
27618
37559
41269
512613

 

I can do it using DATA, IF, ELSE IF, ELSE, RUN. But since I have more than 150 groups in my scorecard, it is really a time-consuming work. 

 

Could any expert suggest me an easier and time-consuming way? I am thing maybe I can use some REPLACE procedure, but not familiar with it. Could you help me?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It looks to me like you have code generation problem.

Do you really want to generate variables that use the same names as the input data?  Because it looks like all of your scores are numeric and your input variables are mixed.

 

You probably just need to beef up your metadata (scorecard) so that it contains all of the information you need to generate the code to compute the score.  For example you might want to either include a variable to tell if the source variable is character or numeric.  Or at least populate the values of the character variables with quotes so that the generated code will be valid SAS syntax. So perhaps something like this:

data score_meta ;
  length var $32 order 8 condition $200 points 8;
  infile cards dsd dlm='|' ; 
  input var order condition points;
cards;
DAYS|2|2<= DAYS< 5|12
DAYS|3|5<= DAYS< 6|-4
DAYS|4|6<= DAYS< 9|7
GRADE|1|grade in ('2A' '2C' '3A')|6
GRADE|2|grade='2D'|55
GRADE|3|grade in ('1C' '2B' '3B' '_MISSING_' '_UNKNOWN_')|6
GRADE|4|grade in ('0' '4')|8
REMAINING_PERIOD|3|4.66<= REMAINING_PERIOD< 9.28|13
REMAINING_PERIOD|4|9.28<= REMAINING_PERIOD< 11.65|18
REMAINING_PERIOD|5|11.65<= REMAINING_PERIOD< 23.32|9
REMAINING_PERIOD|5|REMAINING_PERIOD is null|9
;

Then you could use that to generate an SQL statement use a simple data _null_ step.

filename code temp;
data _null_;
  set score_meta end=eof;
  by var order ;
  file code ;
  if _n_=1 then put
/ 'create table want as'
/ '  select id'
  ;
  if first.var then put '      , case' ;
  put '        when ('  condition ') then ' points ;
  if last.var  then put '        end as ' var ;
  if eof then put
  '  from have'
/ ';'
  ;
run;

Which you could then execute using %INCLUDE.

proc sql;
%include code / source2 ;
quit;

So for the metadata I posted above the SQL code would look like this:

create table want as
  select id
      , case
        when (2<= DAYS< 5 ) then 12
        when (5<= DAYS< 6 ) then -4
        when (6<= DAYS< 9 ) then 7
        end as DAYS
      , case
        when (grade in ('2A' '2C' '3A') ) then 6
        when (grade='2D' ) then 55
        when (grade in ('1C' '2B' '3B' '_MISSING_' '_UNKNOWN_') ) then 6
        when (grade in ('0' '4') ) then 8
        end as GRADE
      , case
        when (4.66<= REMAINING_PERIOD< 9.28 ) then 13
        when (9.28<= REMAINING_PERIOD< 11.65 ) then 18
        when (11.65<= REMAINING_PERIOD< 23.32 ) then 9
        when (REMAINING_PERIOD is null ) then 9
        end as REMAINING_PERIOD
  from have
;

 

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

 

Unfortunately I don't see a simpler way.  Your logical conditions are quite variaed and use multiple variables, if it was just one variable, then you could apply a format, however that is not the case.  You could perhaps save a bit of typing by using select syntax:

data want;
  set have;
  select;
    when(2 <= input(days,best.) < 5) then scorecard="12";
    ...
    when(grade in ("2A","2C","3A") then scorecard=6;
...
    otherwise;
  end;
run;
Astounding
PROC Star

You could conceivably create a set of formats based on your SCORECARD data.  That's work, but perhaps not as much work.  You would begin by creating two SAS data sets (one for numeric values, one for character values).  For numeric variables, LABEL would be numeric:

 

Variable                  Start   End   Label

DAYS                        2      4     12

DAYS                        5      5     -4

DAYS                        6      8      7

REMAINING_PERIOD          4.66    9.28   13

REMAINING_PERIOD          9.28   11.65   18

 

I'm not sure if that would be simpler than what you have now.  It's actually more complex than this, since a fifth variable needs to be added.  SAS needs to be told that 9.28 exactly belongs in the "18" category, not the "13" category.  And when it comes to the character fields, separate lines are required for each value (2A, 2C, and 3A for example all need to go on separate lines as the value of START).

 

If this is simpler than what you have now (difficult to tell from here), the remaining programming would be relatively easy.  But at least y ou have a choice to think about.

TomKari
Onyx | Level 15

Now THAT's an interesting one!

 

Here's an option, very complicated, but might be both reasonably time-saving and able to be changed.

 

First, assume we input your data, I'm naming the dataset "Have":

 

data Have;

length ID 8 DAYS 8 GRADE $5 REMAINING_PERIOD 8;

input ID DAYS GRADE REMAINING_PERIOD;

cards;

1 5 2A 5.23

2 6 2C 10.95

3 8 2D 20.22

4 3 1C .

5 4 2B 8.66

run;

 

Next, we input your specs for transformation:

 

data ScorecardSpecs;

length Variable $32 Group 8 Label $128 ScorecardPoints 8;

input Variable Group Label && ScorecardPoints;

cards;

DAYS 2 2<= DAYS< 5 12

DAYS 3 5<= DAYS< 6 -4

DAYS 4 6<= DAYS< 9 7

GRADE 1 2A, 2C, 3A 6

GRADE 2 2D 55

GRADE 3 1C, 2B, 3B, _MISSING_, _UNKNOWN_ 6

GRADE 4 0, 4 8

REMAINING_PERIOD 3 4.66<= REMAINING_PERIOD< 9.28 13

REMAINING_PERIOD 4 9.28<= REMAINING_PERIOD< 11.65 18

REMAINING_PERIOD 5 11.65<= REMAINING_PERIOD< 23.32 9

REMAINING_PERIOD 5 REMAINING_PERIOD = _MISSING_ 9

run;

 

Note the two spaces after each value of "Label".

 

Now this is where things get weird. I'm just doing a first-level implementation, as a proof of concept. In this, the macro Days writes out the transformations needed for variable "DAYS". Then the macro is run, and generates the code. Finally, the data step creates dataset Want by including in the transformation. (I'm creating a variable NewDays, so you can see the original and the derived, but this is easy to change).

 

%let SpecVar = DAYS;

%macro Days;

data _null_;

file 'C:...\GenDays.sas';

length CodeLine $512;

set ScoreCardSpecs;

if variable = "&SpecVar."

then do;

CodeLine = catx(" ", "if", Label, "then NewDays =", ScoreCardPoints, ";");

put CodeLine;

end;

run;

%mend;

%Days;

data want;

set have;

%include 'C:...\GenDays.sas';

run;

 

It seems to work for this variable.

 

To make this reasonable efficient to code, I was thinking the second-level implementation would be to have a macro that would generate all of the transformations of one type, another to do the transformations of another type, etc. This is where I don't know how many transformation "types" you have. In your post, it looks like DAYS and REMAINING_PERIOD are one type, and GRADE is another.

 

Second question, are you comfortable with this level of complexity?

 

Before I do any more, let me know if this is going in a reasonable direction.

 

Tom

Tom
Super User Tom
Super User

It looks to me like you have code generation problem.

Do you really want to generate variables that use the same names as the input data?  Because it looks like all of your scores are numeric and your input variables are mixed.

 

You probably just need to beef up your metadata (scorecard) so that it contains all of the information you need to generate the code to compute the score.  For example you might want to either include a variable to tell if the source variable is character or numeric.  Or at least populate the values of the character variables with quotes so that the generated code will be valid SAS syntax. So perhaps something like this:

data score_meta ;
  length var $32 order 8 condition $200 points 8;
  infile cards dsd dlm='|' ; 
  input var order condition points;
cards;
DAYS|2|2<= DAYS< 5|12
DAYS|3|5<= DAYS< 6|-4
DAYS|4|6<= DAYS< 9|7
GRADE|1|grade in ('2A' '2C' '3A')|6
GRADE|2|grade='2D'|55
GRADE|3|grade in ('1C' '2B' '3B' '_MISSING_' '_UNKNOWN_')|6
GRADE|4|grade in ('0' '4')|8
REMAINING_PERIOD|3|4.66<= REMAINING_PERIOD< 9.28|13
REMAINING_PERIOD|4|9.28<= REMAINING_PERIOD< 11.65|18
REMAINING_PERIOD|5|11.65<= REMAINING_PERIOD< 23.32|9
REMAINING_PERIOD|5|REMAINING_PERIOD is null|9
;

Then you could use that to generate an SQL statement use a simple data _null_ step.

filename code temp;
data _null_;
  set score_meta end=eof;
  by var order ;
  file code ;
  if _n_=1 then put
/ 'create table want as'
/ '  select id'
  ;
  if first.var then put '      , case' ;
  put '        when ('  condition ') then ' points ;
  if last.var  then put '        end as ' var ;
  if eof then put
  '  from have'
/ ';'
  ;
run;

Which you could then execute using %INCLUDE.

proc sql;
%include code / source2 ;
quit;

So for the metadata I posted above the SQL code would look like this:

create table want as
  select id
      , case
        when (2<= DAYS< 5 ) then 12
        when (5<= DAYS< 6 ) then -4
        when (6<= DAYS< 9 ) then 7
        end as DAYS
      , case
        when (grade in ('2A' '2C' '3A') ) then 6
        when (grade='2D' ) then 55
        when (grade in ('1C' '2B' '3B' '_MISSING_' '_UNKNOWN_') ) then 6
        when (grade in ('0' '4') ) then 8
        end as GRADE
      , case
        when (4.66<= REMAINING_PERIOD< 9.28 ) then 13
        when (9.28<= REMAINING_PERIOD< 11.65 ) then 18
        when (11.65<= REMAINING_PERIOD< 23.32 ) then 9
        when (REMAINING_PERIOD is null ) then 9
        end as REMAINING_PERIOD
  from have
;

 

JinboZhao
Calcite | Level 5

Hi, sorry to reply you so late.

This code does not work well on my SAS, it always say, the FILE WORK.HAVE.DATA does not exist. 

I could not find the reason, could you help me?

Thank you. 

Tom
Super User Tom
Super User

If you have an error you need to post the log so that the error can be seen in context.

 

Most likely you have copied example code from the discussion and not replaced the generic dataset name that was used to show the structure of the code you need with the name of your actual dataset.

 

If is common to post code in the form.

data want;
  set have ;
   ...

since we are not running SAS on your machine and so do not have access to your datasets. You just need to adapt the code to work for your situation.

 

 

JinboZhao
Calcite | Level 5

Thank you very much. The exercise code works well in my SAS. It should work on my data. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1140 views
  • 0 likes
  • 5 in conversation