Numeric vs String Attributes – which is more efficient?

Introduction

An interesting question was posted on the TM1Forum about the most efficient way to craft a rule. This raised the subsequent question of how TM1 attributes are stored behind the scenes, and the impact of that on performance.

The documentation is vague on this point, and, although the arguments presented in the TM1forum post made sense, there wasn’t a definitive answer.

I thought I’d put it to the test.

The Method

I set up a cube with a large item dimension (4,000 elements) and a measures dimension containing an element for each calculation technique I wanted to test. I then wrote a rule for each measure, using one of the methods suggested in the TM1Forum post, as follows:

  • Measure 1: Used an IF statement to test if the category name ends with a ‘1’ or a ‘2’, and returned 1 if it does.
  • Measure 2: Added a numeric attribute to the category dimension which would contain 1 if the rule should return 1.
  • Measure 3: Added a string attribute to the category dimension which would contain ‘1’ if the rule should return 1.
  • Measure 4: Used the string attribute, but converted it to a numeric value before checking if it was 1 or 0.

I crafted a cube view that contained roll-ups of all the dimensions except the measures. I then calculated the cube view for each measure and timed how long it took to get a result.

I repeated the test three times for each measure, arbitrarily altering the rule and re-saving to ensure TM1 wouldn’t use cached values.

The Results

Interestingly, the test seems to have confirmed the assertion on the TM1 forum post that numeric values are actually stored as strings and converted. Here are the numbers, in seconds and %, for the detail-oriented:

Measure 1

Measure 2

Measure 3

Measure 4

Total

                     39

                     91

                     88

                   108

           326

                     43

                     99

                     97

                   125

           364

                     43

                   100

                     97

                   125

           365

                     42

                     97

                     94

                   119

           352

12%

27%

27%

34%

100%

Note:  the discrepancy between the first test and the subsequent ones can be explained by the fact that my laptop was running on battery for the latter ones.

It is clear that using a simple IF statement, even one containing two SUBST calls in its conditional, is more than double as efficient as using an attribute to determine behaviour. Of course, you get many benefits from using the attribute approach, so in many cases, it is still worth the overhead.

Converting the attribute to its numeric equivalent using NUMBR within your rule is markedly slower than comparing the string without conversion, but there is very little difference between using a string attribute over a numeric one. One is roughly as efficient as the other when reading from the attribute cube.

This means that numeric attributes are indeed most likely stored as strings, but the overhead of the conversion is negligible, and does not operate in the same way as the NUMBR function.

I would guess that internally TM1 skips validation of the string during the conversion, as the UI makes it impossible to enter a non-numeric value into a numeric attribute. The NUMBR function can’t do this, as it can accept any string as input.

I found these results very interesting, and they give us a framework to determine which method is best practice.

Conclusion

So what can we to take away from this small test? Here are a few tips:

  1. Using hard-coded logic is likely to be more efficient. However, hard-coding always goes against maintainability, so only do this if you’re sure the logic will never change, nor will the elements involved! This is a big assumption to make, so be very careful with it! Even when you can be sure, only choose this method if speed is the #1 priority in the model.
  2. When flexibility is needed, use a numeric attribute as a flag to determine rule behaviour. Since there is no significant overhead to using a numeric attribute over a string, the best choice is to go with the numeric. TM1 could one day change the way it stores numeric attributes, which could result in this approach being more efficient than storing at a string. However, it is unlikely that any changes in TM1 will lead to a string attribute becoming more efficient to read than a numeric one.
  3. Never convert your attributes manually using the NUMBR function. There is no benefit to this method, and it is significantly slower, so steer well clear of it.
I hope this article helps you make the right call when writing dynamic rules in the future! Happy modelling!
I’ve attached the model I used for testing, and the spreadsheet containing the results I obtained.

TST Model Data.zip (69.00 kb)

TM1 Attribute Tests.xlsx (10.09 kb)

Flow’s Date Lookup cube – Back to the Future!

Introduction

Firstly, sorry there hasn’t been a blog post for a while. Sometimes work commitments don’t play nicely with good intentions.

I thought I’d spend a bit of time talking about the advantages of using the SYS Date Lookup cube bundled with the Flow Server Manager. This cube has been invaluable to me during past implementations and is worth considering as a starting point for any project.

Time Travel for TM1 Rules?

Sometimes your rules and TI processes need to time travel. There are many scenarios where this is necessary. but I’m sure if you’ve got a few projects under your belt, you won’t need me to provide an example.

TM1 provides some basic calendar and time functions to deal with calculations that need to factor in the fourth dimension of our physical reality. However, they fall well short of those offered in Excel and other programming environments, and often you just can’t find an efficient way to do what you want.

How many times have you used the expression:

SUBST(
    'JanFebMarAprMayJunJulAugSepOctNovDec',
    (
        MONTH(
            DATE(['Date'] - 21916, 1)
        ) - 1
    ) * 3 + 1,
    3
);

?

Never? Oh, ok, maybe it’s just me! Anyhow, experience and logic will show you that it’s almost always better to look up such a value in a cube or attribute than it is to write a convoluted date/string combo like this.

The SYS Date Lookup cube won’t save you for the above example (you’d need a dimension with month numbers and a month name attribute for that), but is instead designed specifically to help with looking backward and forward in time.

Developers need to do this very often when creating rules that accumulate values over time, such as depreciation calculations, and, although you can use rules that look backward one index in the dimension, this approach is tricky when you need to look to a previous year (especially if Year is a separate dimension), and falls apart completely if someone changes the element order in the period dimension.

The Flow “SYS Date Lookup” cube

A seasoned developer with spend a fair bit of time considering their period dimension structure at the beginning of the project. With this, I think most of us probably also consider a date lookup cube, even if we don’t design it explicitly.

When starting on a project a few years back, I decided to create the ultimate date lookup cube, one which I could re-use for new projects. What I came up with formed the basis of the cube you find in the Flow Server Manager.

The first thing to note is that the cube is calendar based. If you use a fiscal year dimension, such as “Plan Year 1”, “Plan Year 2”, like we do in Propel Planning, you need to convert that to the correct calendar year.

Once you have a calendar year and month, you can use the SYS Date Lookup cube to look forward or backward any number of months up to 20 years from the current date.

Overkill? Probably, but at least you’ll never have to build one from scratch again!

How to get it

Simply fire up the Flow Server Manager (or if you don’t have it, get it here).

  1. Click the “Create New Data Folder” button.
  2. A dialog will appear asking you what options you would like to enable for the new TM1 data folder. Make sure you have “Date Lookup” checkbox checked.
  3. Type a server name and select the folder you would like to create the data folder in, then click the “Create” button.
This will create a new TM1 data folder with the SYS Date Lookup cube already included. If you want to include it in an existing model, you can take the files from this data folder and copy them manually into the target data folder.

How to use it

To look up a date using the SYS Date Lookup cube, you need to provide 5 parameters.

  1. Source Year
  2. Source Month
  3. Number of Years
  4. Number of Months
  5. Measure (forward or backward year or month)
For example, if you wanted to look 12 years and six months into the past, you’d use the following formulae:
[‘Year in the Past’] = N: DB(‘SYS Date Lookup’, !Year, !Month, ’12’, ‘6’, ‘Backward Year’);
[‘Month in the Past’] = N: DB(‘SYS Date Lookup’, !Year, !Month, ’12’, ‘6’, ‘Backward Month’);
This may look a little complex at first, but it is far simpler and more efficient than trying to write a rule to calculate it on the fly.

Conclusion

Using a date lookup cube is often necessary for efficient rule and TI development.

It’s definitely worth thinking about this aspect early in the project, and if you have matching year and month dimension, why not give yourself a little shortcut by using the Flow Date Lookup cube?

It’ll get you right into the future and back again safely.

An Item-based Approach to TM1 Model Design

Introduction

James and I have been working together for some time now, and we have evolved toward what I would class as a non-typical design approach in TM1. For the sake of simplicity, I refer to this as an “item-based approach”.

It is very prominent in our implementation of Propel Planning, and underpins a lot of the power of that particular product.

I thought it worthy of some discussion, as it has many advantages, and a few gotchas, that are worth considering in any implementation you might be involved with.

The Approach

The item-based design approach has the goal of allowing data input in a flat, tabular format without giving up the browsability and analytic capability of highly dimensional cubes. It also separates input from reporting in a very elegant fashion.

You begin with an input cube, which should have only the basic dimensions, plus a measures dimension to represent the columns of the input table, and item dimension to represent an arbitrary number of rows.

The measures dimension will include many string elements which map to other dimension elements. Thanks to the picklist feature in TM1 9.5+, these lists can even be restricted to ensure invalid entry does not occur.

A separate reporting cube is then created that maps the string elements to actual dimensions, for reporting and analysis, usually via rules. This cube has no data entry and populates itself entirely from data in the input cube. You could also use TI to populate such a cube without too much trouble, for implementations that have higher data volumes.

I call it item-based, because this approach naturally requires an item dimension, with arbitrary names. Most of the time we just call the elements “Item 1”, “Item 2” etc, up to a maximum amount. Because this maximum is imposed, it is important to keep the efficiency of the model from being affected by the number of elements in the item dimension. More about that below.

Advantages

There are many advantages to such an approach.

Data-entry simplicity

New users of TM1 are often uninitiated, and, dare I say it, sometimes under-prepared by standard TM1 training courses, to understand the full advantages of a multi-dimensional database model. It doesn’t matter what you do, some users have spent way too much time in Excel or Access and simply think in terms of tables.

And why should they bother? Many of these users are simply data contributors, and do not have any interest in performing extensive analysis on their data.

The flat input approach allows such users to contribute their data in a way that makes sense to them.

It also allows them to adjust manual inputs and correct errors without cutting the data from one intersection and pasting it in another, an operation which can be error prone and, let’s face it, slightly buggy in the TM1 Perspectives cube viewer, and difficult in the TM1 Contributor front-end.

Maintainability & Agility

TM1 implementations are naturally agile and flexible. Developer with an IT background like myself might fight against it, and try to impose strict, inflexible Business Requirements and a rigid change request process to protect against scope creep, but that really undermines one of TM1’s key advantages in the market place: agility.

Imagine a retail sales model, which has Region, Sales Rep and Distributor as data points of interest. Sales reps and other users contribute data from the field using their laptops.

In a typical TM1 design, you’d create a cube with Region, Distributor and Product as dimensions. The input form would ask the user to select elements from each of those 3 dimensions and would write the sales/inventory data to the intersection of the elements chosen.

All is good, and the managers and finance staff can browse the cube and get the insight they need.

However, imagine, after months of data has been collected, someone in head office decides they would like to also track the data by Customer Type. The data already exists in the point of sale system, as each customer is tracked by credit cards and loyalty cards they use when making a purchase.

With your typical design, you don’t have a lot of choice, but to redesign from scratch and create a new cube with the additional dimension. You might choose to keep the existing cube for backward compatibility, in which case you’d have two sources of the same data, which could lead to synchronization issues since the original data is manually contributed from Sales Reps in the field.

It’s your basic nightmare, and if you were halfway through the implementation, you’d probably tell your customer that it’s a change in scope and that it would have to be left to phase 2.

With an item-based approach, you don’t have these issues. You can take the new data from the POS systems, import the Customer Type field via TI (while creating the new Customer Type dimension on the fly), then update your reporting cube and rules.

Yes, you still have to do some basic redesign, but there is no requirement for a complex and error-prone data migration.

Contributor & Insight-friendly

TM1 Contributor (or “Applications” as it’s now known) and Cognos Insight, are great front end tools for data contribution. They are a little weak, however, when it comes to customizing views to be friendly for the end-user. A highly dimensional input cube forces the view designer to choose between unworkably large grids or many laborious title element selectors which make cutting and pasting data difficult.

A flat, item-based input cube is much simpler to work with, supports multi-level cut and paste, and presents itself in a more logical fashion for quick data input. String values can be typed in as well as selected from a list, then copied down as necessary.

Downsides and Gotchas

Performance

If you’re not careful, this design approach can tempt you into inefficient rules and over-feeding. Performance can suffer with large data volumes.

However, with better design and a clean rule-based approach this can be avoided. Over-feeding is not necessary and rules can be structured logically and efficiently.

As always, TM1 has it’s quirks, but once you understand the gotchas associated with this design approach, they are easy to avoid or work around.

I’m planning several follow-up articles that will go through these issues in detail, and how to make sure they don’t have you pulling your hair out.

Complexity

Rules in this design approach can appear more complex and be harder for another developer to understand. I have first hand experience of handing over such a design to very capable developers and having them screw up their noses and replace my cubes with a more standard TM1 design.

I believe this is partially a cultural issue, as TM1 is taught in a particular way, and that has become accepted as “correct”. Once a developer adjusts to this kind of thinking, it’s actually very difficult to go back!

Obviously well-formatted rules and code comments can go a long way to alleviating this issue also.

Limitations

There is a natural limitation imposed by the item-based approach, and that is the number of elements in the item dimension forms a maximum number of “slots” for data input.

To avoid the situation where a user does not have enough “slots” to input their data, a developer might be tempted to include a large number of elements in their item dimension, and, if the rules and feeders are designed poorly, this could lead to poor performance.

However, a well designed cube won’t need a lot of input slots, as the average person is not able to navigate, or even usefully perceive, thousands of rows of data!

In our retail sales example above, there may be many sales items entered, and at first glance, it may appear to require a form with thousands of visible items shown. With a bit of thought, it usually possible to group input tasks meaningfully so that only a useful number of items need to be shown for the current input task. For instance, the Sales Rep could enter only the data for the particular store they are visiting, as they most likely wouldn’t be entering data for several stores simultaneously.

And, either way, a more dimensional approach does not mitigate this problem either!

Conclusion

With a bit of planning and thought, an item-based approach to TM1 development offers many advantages and rewards.

My follow up articles will be based on a simplified example, which is attached to this post for you to download and examine. The example is built and tested in TM1 v10, with the ForceReevaluationOfFeedersForFedCellsOnDataChange set to “T”.

Feel free to leave comment, criticisms, or unfettered praise in the comments section below!

And yes, that is the name of the property!

StringFeederSample.zip (2.98 mb)