turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Creating a dummy Sequence

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-28-2017 07:48 AM

Hi ,

I want to create a column C in my SAS data set based on column A and B values as below.

Please suggest.

Column A | Column B | Column C |

XXX | A | 1 |

B | 2 | |

C | 3 | |

D | 4 | |

E | 5 | |

XXX | B | 2 |

C | 3 | |

XXX | A | 1 |

E | 5 |

Accepted Solutions

Solution

03-29-2017
05:00 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi2017

03-29-2017 04:46 AM

3000 values should easily fit into memory with a format, so my solution should work..

You can even create reverse format in the same step that maps the number to the original value. That allows you to drop B, reducing space consumption and speeding up your computations.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi2017

03-28-2017 08:00 AM - edited 03-28-2017 08:01 AM

Abhi2017 wrote:

Hi ,

I want to create a column C in my SAS data set based on column A and B values as below.

Please suggest.

Column A Column B Column C XXX A 1 B 2 C 3 D 4 E 5 XXX B 2 C 3 XXX A 1 E 5

Can you explain further? it's not clear what the desired result is. It's not clear what is the use of Column A.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PaigeMiller

03-28-2017 08:08 AM

I have Column A and B in my input data set,I want to create Column C based on A and B

Say i have the data as below ,I want to create a sequences as column C which should have unique value for each column A and B.

ANd the Column C value will repeat when identifies value in Column B.

Column A | Column B | Column C |

100 | A | 1 |

B | 2 | |

C | 3 | |

D | 4 | |

E | 5 | |

101 | B | 2 |

C | 3 | |

102 | A | 1 |

E | 5 |

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi2017

03-28-2017 08:39 AM

Abhi2017 wrote:

I have Column A and B in my input data set,I want to create Column C based on A and B

Say i have the data as below ,I want to create a sequences as column C which should have unique value for each column A and B.

ANd the Column C value will repeat when identifies value in Column B.

Column A Column B Column C 100 A 1 B 2 C 3 D 4 E 5 101 B 2 C 3 102 A 1 E 5

I'm sorry, I still don't understand. It sounds like the statement that you want "as column C which should have unique value for each column A and B" is contradicted by the next statement in which you say "ANd the Column C value will repeat when identifies value in Column B" meaning that column C does not depend on A.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PaigeMiller

03-29-2017 02:20 AM

Thanks for your response.Let me try to reframe .

In my input data set, I have column A and Column B.My requirement is to create a new variable i.e. Column C based on Column B.Eample:100(A) has 5 (B)values 1 to 5.I want to create 5 unique sequences for each 5 rows in Column C.If the data in B is repeated in subsequent rows it will hold the initial value not new value.Now say I have assigned a value for column B ,value B as 2 and column B ,value C as 3, then the program to hold the value 2 and 3.

Now for Column B value B and Column B value c we already assigned 2 and 3 each ,so when 101(Column A) is validated it should automatically output 2 and 3 like wise 102 (Column A) it should validate 1 and 5 each rather than assigning new number.

Column C is dependent on value at column B.Hope this helps to explain my query.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi2017

03-29-2017 03:05 AM

Don't think so complicated. Since C would be directly dependent on only B, C is unnecessary. You can use B for conditions just as you would use C. **C = f(B)**, so **if C** would become **if f(B)**.

The only use I can see for a new (numeric) variable would be to have a reference for creating an order that can't be achieved with using B alone.

In any case, my suggested solution does this.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-29-2017 04:41 AM

Thanks for your response Kurt,however my requirement is to create a new variable C .Because in input data set there are around 3000+ records for Column B and length for the field is greater than 200.Validation and review for numeric value column C will help much better compared to character field column B.Let me know your suggestion.

Solution

03-29-2017
05:00 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi2017

03-29-2017 04:46 AM

3000 values should easily fit into memory with a format, so my solution should work..

You can even create reverse format in the same step that maps the number to the original value. That allows you to drop B, reducing space consumption and speeding up your computations.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-29-2017 05:01 AM

Thanks for the suggestion Kurt.Worked great!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi2017

03-28-2017 08:11 AM

```
/* using a custom format */
proc sort
data=have (keep=column_b)
out=lookup (rename=(column_b=start))
nodupkey
;
by column_b;
run;
data lookup;
set lookup;
fmtname = 'column_b';
type = 'C';
label = put(_n_,best.);
run;
proc format lib=work cntlin=lookup;
run;
data want;
set have;
column_c = put(column_b,$column_b.);
run;
proc print data=want noobs;
run;
```

Result:

column_a column_b column_c XXX A 1 B 2 C 3 D 4 E 5 XXX B 2 C 3 XXX A 1 E 5

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code