A Conversation for OLAP - Online Analytical Processing

Cubes?

Post 1

Yoz

I want to do some OLAP with h2g2 log data... but I don't understand what cubes are, as the SQL Server manual doesn't explain it very well.

Anyone?

-- Yoz


Cubes?

Post 2

alicat (Patron Saint of Good Taste)

ice cubes?smiley - winkeye


Cubes?

Post 3

Researcher 93445

A cube is the basic structure of OLAP. Hm, let's see, do you understand Access crosstab queries? If so, you can think of a cube as an n-dimensional analog of the crosstab. If not, that won't make much sense, and I'll take another stab at it.


Cubes?

Post 4

Scarp

Let's start at the beginning. A Fact is what you think it is, it's a statement of something true. For ffmike's example, it would be that quantity n of product id X were bought by customers of age A and sex S (they have a loyalty card scheme) at store id Y on day D at a cost of m. So the fact will be (n,X,A,S,Y,D,m). How many, what, who, where, when, and how much.

A fact table is just a list of such facts.

A Cube is a way of looking at a Fact Table. It looks at it as a multidimensional array. To make a cube from a fact table, you treat each part of a fact as either one of two things. It's either a position in a particular dimension, or a measure. In our example, we haven't got many measures: just n (how many) and m (how much). So each fact becomes a statement that at coordinate (X,A,S,Y,D) there are two values, n and m. That is, we have a 5 dimensional cube (but don't try to imagine it as that!) and at each coordinate there are two values, the measures.

To make it a little more useful, we can structure each Dimension by classifying its values in a Hierarchy of Levels. The values for these hierarchies live (conventionally) in separate tables called Dimension Tables. For example, our stores might be grouped in Counties, States and Regions (er, they're American stores). The days could be grouped by Months, Quarters and Years. Sex won't have any hierarchy (unless you have a very unusual database). Age might be, well, like it is in all those loyalty card questionnaires. How products are broken down will depend on what you're interested in analyzing.

I'm tired now. Can someone take over and write about Aggregation, Slices, Drill-down the necessity of additive measures, etc.? smiley - smiley


Cubes?

Post 5

Gibbous Hindsight

Thank you, Scarp. Yes, someone just as knowledgeable please do take over. Or better yet, Scarp, please take a nap and come back and finish. Your concise descriptions are just what I've needed to help me translate even the most basic portions of this jargon-filled, all-too-abstract book I've been trying to understand on the subject. (I'm a bit dim when it comes to this stuff . . . and other stuff.) And don't forget those useful examples/analogies, please, please.


Aggregation

Post 6

manolan


Well, I'll have a go, but more familiar with Relational OLAP, where the motivations and practicalities may be different.... Of course, apart from the physical storage of the data, there's little to choose except the maturity of the tools (anyone care to comment on that view?).

Anyway, consider the manager who wants to know how many of each product (X) were bought in quarter Q=1.

One way to answer this is to find all the facts where the day (D) falls within the quarter (Q) - perhaps by finding all the values of D from the dimension table for D where the value of Q is 1 and then finding all the facts with those values of D. If there are 90 business days in the quarter (Sunday trading being popular), you can imagine that there might be a lot of facts which meet the criteria. If you then consider that once you've found those facts, you have to add up all the values of n from each of those facts for each value of X so you can show a neat table of values, phew!

Wouldn't it be more convenient if you had already worked out how many of each product were bought for each value of D and perhaps for each month and Q? As this is all historic data, you're not likely to be changing it all the time, so you could calculate it once and store it. All you then need is a way of indicating that this aggregate applies to this intersection of coordinates, or to this point in a hierarchy. Some tools do this for you (well, some do the whole thing for you, apparently) and some don't.

Of course, with the really recent data (last few days, perhaps) you may find a lot of updates as some stores update the data faster than others. This means that you may have some data which _is_ being updated. You now have a dilemma. How do you update your pre-calculated aggregates? Different tools handle this differently. In some cases, there is nothing for it but re-calculate from scratch and in others, you can be more intelligent.

Aggregation is not without cost, so which aggregates to calculate has to be carefully considered.


Cubes?

Post 7

Boadicea

To do OLAP on your data they have to be organised in an OLAP database, that is a multi dimensional data base modelled so that it describe the semantics in the data, not the structure (as in ER modelling)
You build a data warehouse in order to overview your legacy data.
A cube is a set of dimensions in data warehouse. It may be a researcher dimension containing information on all researchers, it may be a entry dimension containing all entries of a certain kind.
Confirmed dimensions is dimensions that has the same meaning all trough your system. Fact tables tie this dimensions together into a cube and enable you to put all sorts of data together and do real OLAP on your data.
You can have a closer look at data warehouses and dimensions on this site http://www.businessanalysisbooks.com/index.html?0471255475Excerpt


Cubes?

Post 8

morehastelessspeed

The reason these data-structures are called cubes is because, in their most fundamental and original form, they "resemble" a cube.
Unsurprisingly, one of the most significant types of data recorded by companies, stored in a database and then summarised and analysed is sales-information.
The three basic, most direct properties of a sale are:
1) the product that was sold
2) the person (or possibly outlet) that sold it
3) when it was sold.
These properties are often referred to as (aggregate) dimensions.
Although academicaly the most simple 3-dimensional shape is a sphere, most people would (I believe) suggest it was a cube. In any case, the (mathematical) cube of a number is that number multiplied by itself 3 times (x to the power of 3). So, at least colloquially, 3 dimensions -> cube.


Key: Complain about this post

More Conversations for OLAP - Online Analytical Processing

Write an Entry

"The Hitchhiker's Guide to the Galaxy is a wholly remarkable book. It has been compiled and recompiled many times and under many different editorships. It contains contributions from countless numbers of travellers and researchers."

Write an entry
Read more