This is one of my favorite quotes:
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
IT is my daily life, and this quote is so true. Lately I’ve been thinking much more than usual about naming and about how much names really matter. This led to some refactoring activities.
When a native German speaker hears the word “eagle”, he or she may automatically associate it with a hedgehog, simply because the German word for hedgehog (“Igel”) is pronounced exactly the same. Of course, language skills and the concrete context play a role. The point is that a wrong association is likely. When we name a thing, we want to avoid such false associations. At best they are unhelpful. At worst they lead to rejection, as the next example shows.
In 1982 Mitsubishi Motors launched an SUV named “Pajero”. The name had to be changed in some regions, because “pajero” means “wanker” in Spanish. This example also shows that what others think about a name matters more than what we think about it.
In IT we have to name many things: databases, schemas, tables, columns, views, packages, triggers, variables, fields, methods, classes, modules, components, products and so on. Using an established name with a known and accepted definition helps others to understand it better.
When we use a name, it is associated with a definition and properties, whether we like it or not. When a name has a common and widely accepted meaning, it simplifies communication. Take “banana”, for example. Everybody knows what it means. Merriam-Webster’s definition is:
An elongated usually tapering tropical fruit with soft pulpy flesh enclosed in a soft usually yellow rind.
And I am sure that each of us could add a few characteristics to this definition.
A name must fulfill many characteristics. For example
Depending on the context, some of these goals conflict. However, even without a major conflict, it is difficult to name something adequately in the early stages, because we do not yet know enough about the thing we want to name. Hence, we use an iterative approach. We name something (e.g. an entity, package or class), and while working on it we find out that the name does not fit (anymore), so we change it. Maybe we split the thing and now have to name two things, and so on.
Finding a fitting name means doing some research. How have others named that thing? What is the definition for it? Does it fit 100 percent? This is interesting and instructive work, but it takes time. And at the moment we need a new name, we want it now (e.g. when a wizard asks for a name). We can always rename it later, right? Technically yes. And often we do. But the longer we wait, the less likely we are to rename it.
Yes. The more visible a name is the more important it is.
For example, the names behind an API are very easy to change. We do not have to ask anyone before changing it. It’s no problem as long as the API provides the same results. That’s one of the reasons we strive for tight APIs, right? To get some leeway.
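A minimal sketch of that leeway, in Python. All names here (`order_total`, `_sum_line_items`, the sample data) are made up for illustration. The public function is the API; the helper behind it can be renamed at any time without asking anyone, as long as the result stays the same.

```python
# Internal helper: callers never see this name, so it can be renamed freely.
def _sum_line_items(items):
    """Sum quantity * unit price (in cents) over all line items."""
    return sum(quantity * unit_price for quantity, unit_price in items)


# Public API: this name is visible to others and is the one that is hard to change.
def order_total(items):
    """Return the order total in cents for a list of (quantity, unit_price) pairs."""
    return _sum_line_items(items)


if __name__ == "__main__":
    # Two items at 250 cents plus one item at 1000 cents.
    print(order_total([(2, 250), (1, 1000)]))
```

Renaming `_sum_line_items` to something more fitting only touches this module. Renaming `order_total` would touch every caller.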
As soon as others are involved, we are no longer fully in control of the change. For example, when I change a name in one of my blog posts, the change is immediately visible to everyone visiting my blog. But I cannot control the caches of others, like search engines, blog mirrors and other services that copy web content to third-party storage. Remember, cache invalidation is the other hard thing in IT.
As a consequence, before we release an artifact that becomes visible to others, we should take some time to verify the names we used. We cannot take back what we’ve said (at least not completely). However, we are in control of what we say in the future.
Some terms (names) were discussed recently (again) due to a series of sad events. I used these terms as well. I never really thought of them as “bad”. However, I’ve changed my mind. I’m part of the problem, and I do not like it. One thing I can do is to stop using terms that a large group of people associate with slavery and racism. No big deal, right?
This is another quote I like very much:
One cannot not communicate
— Paul Watzlawick
It is difficult to draw a line for certain terms. However, I believe that “you cannot not decide”. You decide either explicitly or implicitly. Of course, things are very seldom purely black or white. Much more often they are a shade of grey. Some decisions take some time, and that’s okay. But it is impossible to postpone a decision forever. At a certain point, not deciding becomes a decision.
So, I decided to decommission some terms on this blog and introduce new ones. Here’s the list:
| Current Term | Decommissioned Term | Context |
|---|---|---|
| accessible | | PL/SQL accessible_by clause |
| exclusion list | | PL/SQL Cop, PL/SQL accessible_by clause |
| inclusion list | | PL/SQL Cop, PL/SQL accessible_by clause |
| transaction structure data + enterprise structure data | master data | Data modeling |
| worker | | Oracle DB background process |
Finding alternative names was surprisingly easy, because others had already done the work and defined them. They have existed for years.
However, finding an alternative for master data was harder. I reached out to my friends on Twitter and got some helpful feedback. Finally, Robert Marti suggested having a look at Malcolm Chisholm’s book Managing Reference Data in Enterprise Databases. The different data classes are defined and explained on page 258ff. The book is from 2000. In the meantime Malcolm Chisholm has published revised definitions here and here.
In the next subchapters I repeat the definitions of the data groups given by Malcolm Chisholm on slide 5 in this deck. I like these definitions and plan to use them in the future.
Metadata: The data that describes all aspects of an enterprise’s information assets, and enables the enterprise to effectively use and manage these assets.
Here it is confined to the structure of databases. Found in a database’s system catalog. Sometimes included in database tables.
Reference data: Any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.
Codes and descriptions. Tables containing this data usually have just a few rows and columns.
Transaction structure data: Data that represents the direct participants in a transaction, and which must be present before a transaction fires.
The parties to the transactions of the enterprise. E.g. Customer, Product.
Enterprise structure data: Data that permits business activity to be reported and/or analyzed by business responsibility.
Typically, data that describes the structure of the enterprise. E.g. organizational or financial structure.
Transaction activity data: Data that represents the operations an enterprise carries out.
Traditional focus of IT, in many enterprises the only focus.
Transaction audit data: Data that tracks the life cycle of individual transactions.
Includes application logs, database logs, web server logs.
You use a name to simplify communication. A name is a proxy for a longer definition and meaning. If the meaning is badly received by others and especially by the target community, this does not simplify communication. Using a different name sounds like a simple solution. Why not, if changing a name is simple enough?
In this case I only had to edit a few blog posts. I handled them like typos. This means that I did not add any update information. I also had to register new URL redirects. That was straightforward. However, changing the branch name in 26 GitHub repositories was a bit more work than anticipated, because I also had to change URLs in several related files. For certain GitHub pages I had to keep a non-default master branch. I suppose that sooner or later GitHub will allow me to get rid of them as well. If I had to change more repositories, I would probably automate this task.
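If the URL changes in related files ever had to be automated, a first sketch could look like the following. The new branch name `main`, the function name `rewrite_branch_urls` and the sample URL are assumptions for illustration only; the post does not name the new branch or the affected files. The sketch only rewrites text and does not rename any branch on GitHub.

```python
import re

# Assumed branch names; adjust to whatever the repositories actually use.
OLD_BRANCH = "master"
NEW_BRANCH = "main"

# Matches GitHub URLs that address content on a branch, for example
# https://github.com/<owner>/<repo>/blob/<branch>/README.md
_BRANCH_URL = re.compile(
    r"(https://github\.com/[\w.-]+/[\w.-]+/(?:blob|tree|raw)/)"  # URL prefix
    r"(?P<branch>[\w.-]+)"                                       # branch name
    r"(?=[/\s)\"']|$)"                                           # end of branch segment
)


def rewrite_branch_urls(text, old=OLD_BRANCH, new=NEW_BRANCH):
    """Replace the branch segment in GitHub URLs when it equals `old`."""

    def swap(match):
        if match.group("branch") != old:
            return match.group(0)  # a different branch: leave the URL untouched
        return match.group(1) + new

    return _BRANCH_URL.sub(swap, text)


if __name__ == "__main__":
    sample = "Docs: https://github.com/acme/demo/blob/master/README.md"
    print(rewrite_branch_urls(sample))
```

Matching the whole branch segment, rather than replacing the bare word, keeps branches such as `masterpiece` and ordinary prose that happens to contain the old term untouched.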
I spent most of the time finding an alternative name for “master data”. In the end I learned something new and found good names and definitions. That will help me in the future.