Help using Base SAS procedures

Joining and comparing few datasets which names contain format prefixes and suffixes

Reply
Occasional Contributor
Posts: 6

Joining and comparing few datasets which names contain format prefixes and suffixes

Hello

I have a rather annoying problem I need to deal with, namely - I have a few datasets, some of them are template dataset with only variable names and no actual data. I need to compare the structure of these templates with the actual datasets. I know how to do the comparison itself but I have a problem with something else. These templates have some suffixes and prefixes, which are dates but they aren't written like 201412 but YYYYMM so the names are for example YYYYMMCARD1. Real dataset have names with dates in a normal format - 201412CARD1. So there's a problem when comparing them because in order to compare variables the tables need to have the same names. I've tried using substr but templates have lots of different prefixes and suffixes, some are shorter like YYYY or YYMM and some of these 'date formats' aren't even at the beginng or at the end of a name. I'm trying to figure out how to do that without doing dozens of different comparisons and substrings for every single possibility. Does anyone have any idea? Maybe there's a way to make SAS treat these parts of a name like numbers (some kind of format maybe? or a join with LIKE condition?) so that a normal join could be possible?


Thank you in advance
Super User
Posts: 10,483

Re: Joining and comparing few datasets which names contain format prefixes and suffixes

Are these supposedly SAS datasets or external files, perhaps in another DBMS?

I ask as 2014CARD1 is not typically a valid SAS variable name.

If you are concerned about the data set structures i.e. variable type and size by position that may be possible.

Provide two small examples of what your have for structure and what you want as a report.

Occasional Contributor
Posts: 6

Re: Joining and comparing few datasets which names contain format prefixes and suffixes

201412CARD1 isn't a variable name, it's a table name. Every single table is created as a new version of a previous dataset and these versions are presented as dates like 201412. Im other words, I have many different dataset:

- customerdata

- products

- etc.

And each of these dataset have different versions, so I he sth like:

- 201411customerdata

- 201412customerdata

- products201309

- etc.

I also have a list of these tables with variables that are associated with them but the names of these tables are written like:

- YYYYMMcustomerdata

- productsYYYYMM

So it's a template name that shows that each tables needs to have a version. And what I want to do is to check whether some variables were added, deleted or maybe the strucute itself wasn't even affected when making a change.

My way of doing that was to make a dataset that would list each version and it's variables in a similiar way to the list of templates I was writing about above.

After that the comparison itself didn't seem like a problem and

as I said before, comparing the structure isn't really what I have problem with. The main issue is that in odrder for this to work I need to join this two dataset (one with table templates and the other one with actual tables) and it's not that straightforward because the names are different. I thought of joinig them on table names using substr but since I have lots of possibilities to cover it makes my code look terribly. Maybe there's a way to treat these version templates as numbers? Or maybe there's a smart way to create a join?

Super User
Posts: 17,784

Re: Joining and comparing few datasets which names contain format prefixes and suffixes

A Valid table name in sas starts with a character or _.

Assuming your data is actually the way you say it is, consider looking as dictionary.column. Parse out the name, so that you have two fields - YYYYMM and table name. You can look at the compress function for this with the ka/kd options to keep alpha/numeric.

Then  transpose the data based on table name/ so the fields of interest become variables and you'll have your comparison/results.

Ask a Question
Discussion stats
  • 3 replies
  • 233 views
  • 0 likes
  • 3 in conversation