Re: Break down character variable and/or extract only particular infor...

_MVB_ · Posted 02-04-2021 01:51 AM

I am looking for a way to create a couple/few step code to do following:

I have two variables, ID and DECISION (screenshot attached), and I have another variable in a different dataset (variable called Var1) that can be empty or any number from 0 to infinite (with decimals), for example first row in the Var1 is 5, and second row is Var1 is 15.75

What do I need is to create a new variable that would assign ID number based on where it falls in a combination of two variables (ID and DECISION) - meaning for value 5, it should return 1 in a newly created variable, and for 15.75, return 2, etc for other variations.

Ideally, i would prefer to break down (ID and DECISION variables first) and maybe put the in rows so i can merge to my main dataset with variable that i want to run against ID and DECISION.

screenshot of variables in my first dataset (my input variable) attached below.

andreas_lds · Posted 02-04-2021 02:20 AM

Sorry, but i don't get it. Please post what you have in usable form and show the expected result. Maybe you can solve the problem by defining a format using "decision" as start/end and ID as label, but that would require some changes to the values of "decision" and i am not sure, if those changes can be done automatically.

ballardw · Posted 02-04-2021 04:03 AM

This sort of look up is often done using a format.

This example creates a format to display numeric values as characters 1 , 2 ,3, 4, a data set with some values and two ways to use the format. One creating a new variable with the formatted value and the other using the format directly with the variable.

Proc format;
value mylookup
low  -  <10.150 = '1'
10.150 -<20.180 = '2'
20.180 - high   = '3'
.               = '4'
;

data example;
   input x;
datalines;
-43
0.0001
9
10.14999999
10.150
15
20.17999
20.180
123456789
.
;

data use;
   set example;
   Id = put(x,mylookup.);
run;

proc print data=example;
   format x mylookup.;
run;

Id values are typically character, i.e. you don't add, subtract, multiply or similar with them. If you really need a numeric value you could do:

Proc format;
value mylookup
low  -  <10.150 = '1'
10.150 -<20.180 = '2'
20.180 - high   = '3'
.               = '4'
;

data example;
   input x;
datalines;
-43
0.0001
9
10.14999999
10.150
15
20.17999
20.180
123456789
.
;

data use;
   set example;
   Id = put(x,mylookup.);
run;

proc print data=example;
   format x mylookup.;
run;

For a great many purposes use the Format as needed as in the Proc Print example is often better than adding lots of new variables. You can create a different format to use with the same variable to create different groups of values and these groups are honored by the reporting procedures like Proc Report and Tabulate and almost all of the analysis and graphing programs.

Some caveats:

The format will only work with single values, you can't use two variables together.

The format needs to be made available to each session of SAS you want to use it with. Either rerun the Proc Format code or store the format in a permanent library (check documentation for the procedure) and add that location to system options.

You can create character to character formats, such as "Robert" = "Bob". However ranges of character values seldom work the way you need as you run into issues with 1) sort order and 2) upper and lower case issues (which affect sort order as well)

You can create formats from values in a data set but certain variables are needed by name and there are many options as to how specific values may be used.

If you did not intend to treat negative values in your range replace the special value Low with 0.

_MVB_ · Posted 02-05-2021 10:07 AM

Thank you for a detailed comment.

Am looking to perform slightly different actions here - i need to create a logic that automated in a way, instead of manually defining proc format, i need that logic to take a 'DECISION' (in this example 'DECISION' variable represented in 4 lines (4 possible IDs) variable and apply it somehow to my main data with values.

As i have two datasets, first with actual values (i added in a separate screenshot below), while second dataset has criteria ("Decision"), and as decision come out from proc hpsplit procedure in a way you see it on the first picture attached, i need somehow to use it when applying to my main dataset and outcome a decision, such as 0, 1, 2, 3, 4.

I added another screenshot for clarity, where 'values' numeric variable is my data in my main dataset, and 'assigned group' is a desired outcome based on 'DECISION' criteria and 'ID' variable in a group my values have to be assigned to.

The reason i need to automate is, because i have couple hundred of this decision variable with different criteria in it, and i can not every time manually look at it an create proc format, just because of the time constraints - i need to put together a code to breaks down DECISION variable and applies it to my 'values' variable and just gives an outcome whether my value belongs to 0,1, 2,3, 4.

Tom · Posted 02-05-2021 10:18 AM

If you want actual code in response to your question please post actual examples of your data. As text, not photographs.

Sounds like you want to convert your text variable into code. Either as actual IF/THEN or SELECT/WHEN code. Or as code to create a format. Your first example of the rules looks like you would want to do the former.

So let's say you have a rule dataset with a character string with the condition to test (let's call it RULE) and numeric variable with the code to generate when that rule is true. So you could then use a simple data step to generate code like:

if (rule1) then variable=value1;
else if (rule2) then variable=value2;
...
else variable=.;

So a data step like this might do that.

filename code temp;
data _null_;
  set rules end=eof;
  file code;
  if _n_=1 then put 'else ' @;
  put 'if (' rule ') then vairable=' value ';' ;
  if eof then put 'else variable=.;' ;
run;

So now if you have a dataset with values to be coded, let's say it is called HAVE, you could then use a data step like this to run that generated code on the data in HAVE to produce a new dataset named WANT.

data want;
  set have;
%include code / source2;
run;

_MVB_ · Posted 02-05-2021 11:35 AM

Thank you for your comment.

I do not have a code for my condition table where i have variables "DECISION" and "ID" - it comes as an output from hpsplit procedure.

As I run hpsplit procedure multiple times with different condition, every time i would get different setup of DECISION and ID, such as ID might go up to 5, or 4, or 2 (representing number of lines), while decision also would be various one from another.

i simply want to evaluate my 'main data' dataset where I have variable 'value' against the condition dataset and create/return one new variable (beside my 'value' variable), which group this value belongs two - as i provided on the second screenshot.

Tom · Posted 02-05-2021 12:35 PM

I am not sure I follow what you are trying to do. Doesn't PROC HPSPLIT have an OUTPUT option to generate the coded variable automatically?

_MVB_ · Posted 02-05-2021 12:43 PM

Yes, by defining NODESTATS=nstat_result, you have an output that contain list of variables, amongst them is DECISION and ID. and what i try to do is to apply these outcomes to my main data to assign ID in a new field in my main dataset.

Tom · Posted 02-05-2021 05:59 PM

I am not sure what PROC HPSPLIT does, but why not just give it all of the data upfront? That is what you would do with a traditional regression. So any observations with missing values of the dependent variable wouldn't contribute to fitting the model but would be available to have the predicted value generated in the output dataset.

Please post a simple example that shows what you are doing.

Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Re: Break down character variable and/or extract only particular information out of it

Classroom Training Available!