100MM Rows in Excel

The BI industry is in the middle of a drastic change in terms of tools, technologies, and data.  The Big Data movement is causing companies to re-think their approach to analytics, as its now much easier to produce a solution which derives insight from petabytes of data.

But what about the rest of the world?  Those who need to analyze historical sales, but don't have the resources to spin up a large Hadoop cluster or a data warehouse on a dedicated appliance?

At Anexinet, we're working with a company who has that exact problem.  Plus a few issues:

We settled on a solution which combines timely access to data, some traditional BI/DW concepts, and a self-service approach for end users.  The traditional approach comprises a SQL Server database, nightly ETL, and a multidimensional model.  In this approach, we've laid the groundwork for the business to consume data no less than 24 hours old, greatly improving on the status quo (month ends, analyst begins running queries, compiles data, performs a ton of VLOOKUPS, delivers a report somewhere around the 15th of the month). 

Now we're at the stage where we need to provide access to this data, but without creating a library of reports (that will come later).  For this, we turned to PowerPivot.

Using PowerPivot (Excel 2010 add-in) connected to the warehouse, we pulled in 5 years of sales history.  In one Excel file (3.5GB), anyone can look at any sale or group of sales, sliced by product, division, sales person, time...or any other attribute they choose.  100MM rows, all in one document:

Response times in the pivot tables are...faster than I can time.  Slicing and dicing is nearly immediate.

In summary, we went from a manager receiving reports on the 15th of the month for the previous month's activity to near instantaneous access to the last 5 years (including previous day).  Powerful stuff!




Labels: , , ,