Category Archives: Uncategorized

Maven manufacturing downtime

January 26, 2026 Ismo Heiskanen

This blog post is based on a free sample dataset from Maven Analytics, Manufacturing Downtime. Thanks to Maven Analytics for publishing the data set.

Four questions were asked, and I will answer the questions later in this blog post.

What’s the current line efficiency? (total time / min time)
Are any operators underperforming?
What are the leading factors for downtime?
Do any operators struggle with particular types of operator error?

Let’s first check the data we got.

Line productivity is a fact table about produced batches.

The product table includes product data and is a dimension table. We are dealing with six products.

The downtime factor table is a dimension table about the reasons why downtimes happened. It lists the factors and whether each factor was an error caused by an operator.

Line downtime stores the downtimes for each batch together with the downtime factor. It is a fact table.

The issue with this table is that it is in matrix format, not in pivot format (one row per batch and factor). Before analyzing the data, we need to convert the table into pivot format. I have written a separate blog post about that.

In brief, load the data into the data model. In Power Query, select the first column from the left, right-click it and select “Unpivot Other Columns”.
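
As a rough illustration of what the unpivot step does, here is a minimal Python sketch. The column names and values are hypothetical, not taken from the Maven data, and the function `unpivot` is my own helper rather than anything Power Query exposes. Empty cells are skipped, which is also what “Unpivot Other Columns” does with null values.

```python
# Hypothetical matrix layout: one row per batch, one column per downtime factor.
wide_rows = [
    {"Batch": 1001, "1": 15, "2": None, "3": 20},
    {"Batch": 1002, "1": None, "2": 30, "3": None},
]


def unpivot(rows, id_column):
    """Turn every non-empty factor column into its own (id, factor, minutes) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or value is None:
                continue  # skip the key column and empty cells
            long_rows.append({id_column: row[id_column], "Factor": column, "Mins": value})
    return long_rows


long_rows = unpivot(wide_rows, "Batch")
for r in long_rows:
    print(r)
```

The result has three rows, one for each filled cell of the matrix, which is the shape a fact table needs.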

I will paste the values into a separate Excel sheet, as it is easier for me to manage them there.

Just press Ctrl + V in the Excel sheet.

I have changed the column names to be more descriptive. Factor is a foreign key that refers to the primary key of the factor dimension table.

Then I loaded the data into MS Access, as usual. I renamed the tables to start with either D or F, depending on whether the table is a dimension or a fact table. I also removed the spaces from the field names. It is easier for me when there are no spaces in table or field names, because then I don’t have to quote the names. SQL queries can be used with Access, too.

The data was loaded from Access into Excel data model.

This is the data model. The product dimension table D_prod is joined to the line productivity fact table F_lineprod, as both tables hold product information. The relationship is one-to-many: a product appears only once in the dimension table but may appear several times in the fact table. The factor dimension table D_dfactors is joined to the downtime fact table F_linedown on the factor field. The relationship is again one-to-many. The two fact tables are joined on the batch number, as each batch number appears only once in the F_lineprod table, while one batch may have several downtimes.

One remark I must make about the data: in the last row of the F_lineprod table, the batch starts on one day and ends on the following day. As you can see, I have rounded the times to full minutes, leaving the seconds out.

I will adjust the times so that the batch still takes 2 hours 10 minutes but starts and ends on the same day.
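
An alternative to adjusting the times by hand would be to treat an end time that is earlier than the start time as belonging to the next day. This is not what I did in Excel, just a small Python sketch of the idea. The clock times are hypothetical and the function name `batch_duration` is my own.

```python
from datetime import datetime, timedelta


def batch_duration(start, end):
    """Return the batch duration; an end before the start is assumed to be on the next day."""
    fmt = "%H:%M"
    start_t = datetime.strptime(start, fmt)
    end_t = datetime.strptime(end, fmt)
    if end_t < start_t:
        end_t += timedelta(days=1)  # the batch ran past midnight
    return end_t - start_t


duration = batch_duration("23:10", "01:20")  # hypothetical clock times
print(duration)  # 2:10:00
```

The sketch assumes that no batch runs longer than 24 hours.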

What’s the current line efficiency? (total time / min time)

The first question is to calculate line efficiency. I interpret this as downtime divided by total time, and I treat each product as a line.

First, I calculated the total time.

This is the DAX:

=SUMX( F_lineprod;(F_lineprod[EndTime] - F_lineprod[StartTime]) * 24)

The result is the total time in hours.

This can be double-checked in Access with:

SELECT

ROUND(SUM(EndTime - StartTime) * 24, 2),

Product

FROM

F_lineprod

GROUP BY

Product;

The downtime is calculated by simply summing the Mins column in the F_linedown table. As the values are given in minutes, the result is divided by 60 to get hours.

=sum(F_linedown[Mins])/60

The values are crosschecked in Access as follows:

SELECT

fd.product AS Expr1,

round(Sum(fn.Mins) / 60, 2) AS Expr2

FROM

F_linedown AS fn,

F_lineprod AS fd

WHERE

fn.batch = fd.batch

GROUP BY

fd.product;

The total time and down time are calculated here:

OR-600 has the highest downtime ratio. The other products have roughly equal ratios. Whether one third of production time is a reasonable level of downtime depends on the business.

If efficiency is calculated as total time minus downtime, divided by total time, the efficiency is 64 %. Whether 64 % is a good or a bad number is another topic.
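
The relation between the downtime ratio and the efficiency can be sketched in a few lines of Python. The hour totals below are hypothetical round numbers chosen to reproduce the 64 % figure, not the actual totals from the data model, and the function name `line_efficiency` is my own.

```python
def line_efficiency(total_hours, downtime_hours):
    """Return (downtime ratio, efficiency) as fractions of the total time."""
    downtime_ratio = downtime_hours / total_hours
    efficiency = (total_hours - downtime_hours) / total_hours
    return downtime_ratio, efficiency


# Hypothetical totals in hours.
downtime_ratio, efficiency = line_efficiency(total_hours=100.0, downtime_hours=36.0)
print(f"downtime ratio: {downtime_ratio:.0%}, efficiency: {efficiency:.0%}")
```

The two figures always add up to 100 %, so either one can be reported.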

Are any operators underperforming?

Mac has the highest downtime ratio, but the differences among operators are minor. Charlie, who has the lowest downtime ratio, has the highest absolute downtime. This is because total times vary between the operators: Charlie’s total time is 36 % higher than Mac’s.

We can double-check the total time in Access as follows:

SELECT

Operator,

ROUND(SUM(EndTime - StartTime) * 24, 2) AS TotalHours

FROM

F_lineprod

GROUP BY

Operator;

And down time:

SELECT

fd.operator AS Expr1,

round(Sum(fn.Mins) / 60, 2) AS Expr2

FROM

F_linedown AS fn,

F_lineprod AS fd

WHERE

fn.batch = fd.batch

GROUP BY

fd.operator;

What are the leading factors for downtime?

Downtime has been calculated simply with:

=sum(F_linedown[Mins])/60

We take the minutes from the fact table F_linedown and divide by 60 to get hours.

Downtime varies a lot between factors. The most common factors for downtime are machine adjustment and machine failure. Together they represent more than 40 % of all downtime. No emergency stops took place. Conveyor belt jams are very uncommon.

The same can be checked with Access:

SELECT

f.factor,

d.Description,

ROUND(Sum(f.Mins) / 60, 2) AS SumOfMins,

ROUND(

SUM(f.mins) / (

SELECT

SUM(mins)

FROM

F_linedown

), 2) AS ShareOfTotal

FROM

F_linedown AS f,

D_dfactors AS d

WHERE

(((f.factor) = [d].[factor]))

GROUP BY

f.factor,

d.Description

ORDER BY

Sum(f.Mins) DESC;
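
The same ranking logic (hours per factor, share of total, sorted in descending order) can be sketched in Python. The rows below are hypothetical and only borrow the factor names mentioned in this post. The real figures come from F_linedown.

```python
from collections import defaultdict

# Hypothetical (factor description, minutes) rows standing in for F_linedown joined to D_dfactors.
downtimes = [
    ("Machine adjustment", 30),
    ("Machine failure", 20),
    ("Batch change", 10),
    ("Machine adjustment", 20),
    ("Conveyor belt jam", 20),
]

minutes_by_factor = defaultdict(int)
for description, mins in downtimes:
    minutes_by_factor[description] += mins

total = sum(minutes_by_factor.values())
ranking = sorted(minutes_by_factor.items(), key=lambda item: item[1], reverse=True)

# (description, hours, share of total downtime), largest first
shares = [(d, m / 60, m / total) for d, m in ranking]
for description, hours, share in shares:
    print(f"{description:20s} {hours:5.2f} h {share:6.1%}")
```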

Do any operators struggle with particular types of operator error?

The only formula I used here is the same as in the previous question: summing up the minutes of downtime.

Charlie and Dee have various types of operator errors. Dennis and Mac have fewer types of operator errors. Machine adjustment is the typical error for Dennis and batch change for Mac.

If you want to check the same in SQL, this is the query I used.

SELECT

fd.Operator,

fn.factor,

ds.Description,

SUM(fn.mins)

FROM

F_lineprod AS fd,

F_linedown AS fn,

D_dfactors AS ds

WHERE

fd.batch = fn.Batch

AND fn.Factor = ds.Factor

AND ds.OperatorError = 'YES'

GROUP BY

fd.Operator,

fn.factor,

ds.Description;
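
If Access is not at hand, the shape of this query can be tried out with Python’s built-in sqlite3 module. The sketch below is an assumption-laden stand-in: the table and field names follow this post, but the rows are hypothetical, the tables are cut down to the fields the query needs, and SQLite is not Access, so details such as data types may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE F_lineprod (Batch INTEGER PRIMARY KEY, Operator TEXT);
CREATE TABLE D_dfactors (Factor INTEGER PRIMARY KEY, Description TEXT, OperatorError TEXT);
CREATE TABLE F_linedown (Batch INTEGER, Factor INTEGER, Mins INTEGER);
-- Hypothetical sample rows, not the Maven data
INSERT INTO F_lineprod VALUES (1, 'Dennis'), (2, 'Mac');
INSERT INTO D_dfactors VALUES (1, 'Machine adjustment', 'YES'),
                              (2, 'Batch change', 'YES'),
                              (3, 'Machine failure', 'NO');
INSERT INTO F_linedown VALUES (1, 1, 25), (1, 3, 40), (2, 2, 15), (2, 2, 10);
""")

# Same joins and filter as the Access query: only factors flagged as operator errors.
rows = con.execute("""
SELECT fd.Operator, ds.Description, SUM(fn.Mins) AS TotalMins
FROM F_lineprod AS fd, F_linedown AS fn, D_dfactors AS ds
WHERE fd.Batch = fn.Batch
  AND fn.Factor = ds.Factor
  AND ds.OperatorError = 'YES'
GROUP BY fd.Operator, ds.Description
ORDER BY fd.Operator
""").fetchall()
print(rows)
```

The machine failure row is left out of the result because it is not flagged as an operator error.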

Summary:

What’s the current line efficiency? (total time / min time)

64 %.

Are any operators underperforming?

Mac has highest downtime ratio, but differences are minor among operators.

What are the leading factors for downtime?

The most common factors for downtime are machine adjustment and machine failure.

Do any operators struggle with particular types of operator error?

Machine adjustment is typical error for Dennis and batch change for Mac.

Uncategorized

Maven Toy Store E-Commerce Database

January 5, 2026 Ismo Heiskanen

The data for this blog post was taken from the Maven Analytics data playground: the Toy Store E-Commerce Database. Thanks to Maven Analytics for publishing the datasets.

Four questions were presented:

What is the trend in website sessions and order volume?
What is the session-to-order conversion rate? How has it trended?
Which marketing channels have been most successful?
How has the revenue per order evolved? What about revenue per session?

I will answer those questions in this blog based on the sample data.

First, we need to get familiar with the tables.

The orders table includes the order id, website session, user id, primary product in case of a bundle order, items purchased, total price of the order and cost of goods sold (COGS).

The order item table holds item-level information about the order, such as price and COGS.

Refunds related to orders are stored in the order item refund table. It holds the order item id and the refund amount.

The product table includes four records with product id and product name.

I have downloaded the data to Access.

In addition to this model, I created a date table.

I created the table in Excel, where I calculated the year, quarter, month, weekday and year-quarter combination for each date.
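
For comparison, a similar date table could be generated outside Excel. This is a minimal Python sketch, not the workbook I used. The field names, the date range and the year-quarter format (for example 2015/Q1) are assumptions made for the illustration.

```python
from datetime import date, timedelta


def build_date_table(start, end):
    """Return one row per day with year, quarter, month, weekday and year-quarter."""
    rows = []
    day = start
    while day <= end:
        quarter = (day.month - 1) // 3 + 1
        rows.append({
            "Date": day,
            "Year": day.year,
            "Quarter": quarter,
            "Month": day.month,
            "Weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "YearQuarter": f"{day.year}/Q{quarter}",
        })
        day += timedelta(days=1)
    return rows


date_table = build_date_table(date(2015, 1, 1), date(2015, 3, 31))
print(len(date_table), date_table[0]["YearQuarter"])
```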

The data model looks like this.

These are the relations in the data model.

What is the trend in website sessions and order volume?

The number of web sessions increases every quarter apart from the last reported quarter.

Order volume measured in money has increased. Only 2015/Q1 was lower than the previous quarter.

What is the session-to-order conversion rate? How has it trended?

I understand this question as asking how many sessions turn into an order: when a customer browses the website, how often does the customer buy something?

This is the share of web sessions that lead to an order.

Sessions_to_order

=count(Orders[order_id])/count(Website_sessions[website_session_id])
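
To make the ratio concrete, here is a small Python sketch that calculates the same orders-per-session rate per quarter. The session ids, quarters and orders are hypothetical. In the data model, the quarter comes from the date table and the pivot table does the grouping.

```python
from collections import Counter

# Hypothetical data: (session id, year-quarter) and the session ids that led to an order.
sessions = [
    (1, "2014/Q4"), (2, "2014/Q4"), (3, "2014/Q4"), (4, "2014/Q4"),
    (5, "2015/Q1"), (6, "2015/Q1"),
]
order_sessions = {2, 6}

sessions_per_quarter = Counter(quarter for _, quarter in sessions)
orders_per_quarter = Counter(quarter for sid, quarter in sessions if sid in order_sessions)

# Orders divided by sessions, per quarter
conversion = {q: orders_per_quarter[q] / n for q, n in sessions_per_quarter.items()}
print(conversion)
```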

How has the revenue per order evolved? What about revenue per session?

The number of web sessions is increasing apart from Q1 in 2015.

The session-to-order rate is steadily increasing apart from the last reported quarter, Q1/15.

Which marketing channels have been most successful?

I interpret a marketing channel as a campaign in the website_sessions table.

To measure success, I take the sales minus COGS minus refunds. That gives the net sales of each campaign.

=SUM ( Orders[price_usd] )

- SUM ( Orders[cogs_usd] )

- SUM ( order_item_refunds[refund_amount_usd] )

Nonbrand has the highest sales, pilot the lowest.

The table above is calculated with DAX below:

=AVERAGEX (

Orders;

Orders[price_usd]

- Orders[cogs_usd]

- CALCULATE (

SUM ( order_item_refunds[refund_amount_usd] );

FILTER ( order_item_refunds; order_item_refunds[order_id] = Orders[order_id] )

)

)

For each order, we take the price minus costs minus refunds, and then we calculate the average over all orders.

Here we have divided the net sales by the number of web sessions.

The DAX measures are as follows:

Sales

=SUM ( Orders[price_usd] )

- SUM ( Orders[cogs_usd] )

- SUM ( order_item_refunds[refund_amount_usd] )

CWeb

=count(Website_sessions[website_session_id])

S_CWeb

=DIVIDE([Sales];[CWeb])
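
The three measures can be mirrored in plain Python to see how they fit together. The orders, refunds and session count below are hypothetical, and the helper names `net_sales` and `safe_divide` are my own. `safe_divide` imitates the way DIVIDE returns a blank instead of an error when the denominator is zero.

```python
def net_sales(orders, refunds):
    """Price minus COGS for all orders, minus all refunds (mirrors the Sales measure)."""
    return (
        sum(o["price_usd"] for o in orders)
        - sum(o["cogs_usd"] for o in orders)
        - sum(refunds)
    )


def safe_divide(numerator, denominator):
    """Return None (a blank) instead of failing when the denominator is zero."""
    return numerator / denominator if denominator else None


# Hypothetical figures
orders = [{"price_usd": 50.0, "cogs_usd": 20.0}, {"price_usd": 60.0, "cogs_usd": 22.0}]
refunds = [8.0]
session_count = 20

sales = net_sales(orders, refunds)
sales_per_session = safe_divide(sales, session_count)
print(sales, sales_per_session)  # 60.0 3.0
```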

The average sales per websession has steadily increased.

Summary:

What is the trend in website sessions and order volume?

The number of website sessions and the order volume have increased steadily from quarter to quarter, apart from the last reported quarter.

What is the session-to-order conversion rate? How has it trended?

The session-to-order rate has increased steadily. The average is about 0.07, that is, 7 % of all sessions turn into an order.

Which marketing channels have been most successful?

I understood market channel as campaigns. Nonbrand is the best campaign.

How has the revenue per order evolved? What about revenue per session?

Both revenue per order and revenue per session have increased.

Uncategorized

Correlation matrix

December 23, 2025 Ismo Heiskanen

This blog post is an addition to my earlier post about correlation. Here I present how to calculate several correlations in one go.

This is a sales report on four products.

A sales campaign was run for product P1 in P10/23.

How did the campaign affect the other products? One way to find out is a correlation matrix, which presents the correlation between each pair of products.

Select File – Options – Add-ins. Check that the Analysis ToolPak is active.

Select Data – Data Analysis.

Select Correlation.

Select the sales data as the input range and include the headers. Set Grouped By to Columns, as we have the data per column.

Then we have the results. A correlation is presented for each combination of products.

The strongest correlation is between P1 and P4. P2 and P4 also correlate with each other, as do P1 and P2. P3 lives its own life: the correlation between P1 and P3 is slightly negative.

The sales campaign on P1 appears to affect P4 and P2 positively. The correlation between P1 and P4 is a bit higher than between P1 and P2. The campaign hardly affects the sales of P3. If anything, it slightly decreases them.

Correlation matrix makes analysis fast. You don’t have to calculate each correlation separately. Of course, you can do that with CORREL function.
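
For readers who want to see what the tool calculates, here is a minimal Python sketch that builds a correlation matrix from pairwise Pearson correlations, the same coefficient that CORREL returns. The sales figures are hypothetical and cover only three products. They are not the figures from my sales report.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical sales per period for three products
sales = {
    "P1": [10, 12, 11, 20, 13],
    "P2": [5, 6, 5, 9, 6],
    "P3": [7, 7, 8, 7, 8],
}

# One correlation for every pair of products
matrix = {a: {b: pearson(sales[a], sales[b]) for b in sales} for a in sales}
for a in sales:
    print(a, [round(matrix[a][b], 2) for b in sales])
```

The diagonal is always 1, as every product correlates perfectly with itself, and the matrix is symmetric.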

Uncategorized

D-functions in Excel

December 5, 2025 Ismo Heiskanen

You have data for the year, product, region, and sales volume.

The data is not in pivot format.

If we want to quickly calculate the sales volume for product P1 in the East region, we could create a pivot report. For cases like this, we can also use D-functions. D-functions are normal Excel functions like SUM, MIN, MAX or AVERAGE but with a D prefix.

Let’s check an example.

The DSUM function has three arguments: database, field and criteria. The database is the data range, in our case B4:E20, and the range includes the headers. The field is the column with the facts, or quantitative data. It can be given as the column number counted from the left (in our case, sales is the fourth column) or as the header name, so =DSUM(B4:E20;"Sales";G8:H9) would work fine, too. We are calculating sales values. We have defined two criteria, product and region: the product should be P1 and the region East. Even though there is just one argument for criteria, we can define several criteria with the one range G8:H9.
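
The logic of DSUM with two criteria can be sketched in Python. This is only an illustration of the idea (sum a field over the rows that match all criteria). The function `dsum` and the rows are hypothetical and much simpler than the real Excel function, which also supports comparison operators and several criteria rows.

```python
def dsum(database, field, criteria):
    """Sum `field` over the rows that match every column/value pair in `criteria`."""
    return sum(
        row[field]
        for row in database
        if all(row[column] == wanted for column, wanted in criteria.items())
    )


# Hypothetical rows with the same headers as in the example
database = [
    {"Year": 2024, "Product": "P1", "Region": "East", "Sales": 10.0},
    {"Year": 2024, "Product": "P1", "Region": "West", "Sales": 7.5},
    {"Year": 2025, "Product": "P1", "Region": "East", "Sales": 12.5},
    {"Year": 2025, "Product": "P2", "Region": "East", "Sales": 19.2},
]

total = dsum(database, "Sales", {"Product": "P1", "Region": "East"})
print(total)  # 22.5
```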

In the same way, we can use DAVERAGE and DMIN.

We have no data for product P3 for the year 2024. We have only two values for P2 in 2025, and 19.2 is the smaller of the two.

It is possible, of course, to create a pivot instead.

The result is the same as with DSUM. We can also check with Access.

I loaded the data into Access.

SQL-query returns the same value.

DSUM is calculating correctly.

D-functions are useful if you want to calculate a few values from a table without setting up a pivot report. If you need to calculate several values or do sensitivity analysis, it is better to create a pivot report.

Excel has more D-functions than the ones mentioned in this blog post, for example DMAX, DCOUNT and DGET.

Uncategorized

SUMIF based on first digits

November 4, 2025 Ismo Heiskanen

Normally, the data model and DAX are used for large data, especially when the amount of data exceeds the number of rows in Excel, roughly 1.05 million records.

I have faced some simple calculations where DAX provides features that I could not find in the Excel SUMIF formula.

I have a list of accounts, and I would like to sum the balances of all the accounts starting with 40.

By the way, the formula =SUMIF(C4:C15;"40*";D4:D15) does not work here, presumably because the account codes are stored as numbers and SUMIF wildcards only match text. If it worked, this would be an easy task.

Of course, you can do it this way: add a new column with an IF formula. If the first two digits in the account code are 40, mark the row with “sum”. After that, use SUMIF to sum all the rows with the sum mark. This solution requires extending the data area with a new column, and it is not automated.

Data is loaded into data model. The measure is as follows:

=CALCULATE(SUM(SIF[Balance]);FILTER(SIF;LEFT(SIF[Account]; 2) = "40"))

The table is called SIF, and the fields are Account and Balance.

We sum the Balance field from the SIF table, and the filtering criterion is that the Account field in the SIF table should have 40 as its first two digits from the left.

I am using a semicolon as the separator. Some other users use a comma instead.

Another option is loading the data into Access.

Then create an SQL query.

Just remember that the wildcard in Access is a star, not a percent sign.

The result.

However, the versatile SUMPRODUCT can handle issues like this. Sometimes I find SUMPRODUCT a bit complex, as it involves no SUM or SUMIF function: after the LEFT test, the D column values are simply multiplied by the result. Still, the result is what matters.
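
All of these options apply the same rule: read the account code as text, test the first two characters and sum the matching balances. A minimal Python sketch of that rule, with hypothetical accounts and balances and a helper name of my own, looks like this:

```python
# Hypothetical (account, balance) rows; the account codes are stored as numbers.
accounts = [
    (4000, 100.0),
    (4010, 50.0),
    (4100, 70.0),
    (5040, 30.0),
]


def sum_by_prefix(rows, prefix):
    """Sum the balances whose account code, read as text, starts with `prefix`."""
    return sum(balance for account, balance in rows if str(account).startswith(prefix))


total_40 = sum_by_prefix(accounts, "40")
print(total_40)  # 150.0
```

Account 5040 is left out even though it contains 40, because only the start of the code is tested.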

Among the options I demonstrated, SQL is the easiest one for me. You just need to load the data into Access. DAX and SUMPRODUCT do the calculation, but they are somewhat more complicated than SQL. Adding an extra column is a possible solution but not a very neat one. As I said at the start, it is a pity that SUMIF did not work.

Uncategorized

Simple Office Script in Excel

October 7, 2025 Ismo Heiskanen

If you want to automate something in Excel, you might want to use macros and Visual Basic. That’s what I have been doing.

Let’s take a very simple example. When you have a sum cell, the result should be framed and shown in a bigger font.

When I recorded the macro, the result was a macro called sum. The VBA code is at the end of this blog post.

Check whether you have the Automate tab available in Excel.

We can record new scripts in much the same way as we record a VBA macro.

Press “Record Actions”.

Record Actions writes a log of what you do.

Frame the cell and make the font bigger.

Record Actions has recorded my actions.

Press stop.

I can change the script name.

I replaced “Script 4” with the more descriptive “Font_frame”.

This is the start of the script. The green lines starting with two slashes are comment lines. The recorder has also generated comments, which is a benefit compared to VBA.

In the sixth line, the script has created a static cell reference, C4. It would be better to have a relative reference, such as the active cell, so that the changes are made to the active cell and not to C4 (unless C4 happens to be the active cell).

We need to replace selectedSheet.getRange("C4") with workbook.getActiveCell().

I also changed the comments, even though that does not affect the result. The change should be made to the active cell, not to any predefined cell.

Office Scripts are stored in OneDrive as .osts files.

If you want to execute an Office Script, select “All Scripts” on the Automate ribbon. Then press play.

The result.

The VBA macro and the Office Script do the same task: both frame the selected cell and make the font bigger. The VBA code is quite long, even though I expected VBA to be shorter and clearer than Office Script.

VBA is an older technology than Office Scripts. It is good to familiarize yourself with Office Scripts.

Here are the scripts.

Sub sum()

'

' sum Macro

'

Selection.Borders(xlDiagonalDown).LineStyle = xlNone

Selection.Borders(xlDiagonalUp).LineStyle = xlNone

With Selection.Borders(xlEdgeLeft)

.LineStyle = xlContinuous

.ColorIndex = 0

.TintAndShade = 0

.Weight = xlThin

End With

With Selection.Borders(xlEdgeTop)

.LineStyle = xlContinuous

.ColorIndex = 0

.TintAndShade = 0

.Weight = xlThin

End With

With Selection.Borders(xlEdgeBottom)

.LineStyle = xlContinuous

.ColorIndex = 0

.TintAndShade = 0

.Weight = xlThin

End With

With Selection.Borders(xlEdgeRight)

.LineStyle = xlContinuous

.ColorIndex = 0

.TintAndShade = 0

.Weight = xlThin

End With

Selection.Borders(xlInsideVertical).LineStyle = xlNone

Selection.Borders(xlInsideHorizontal).LineStyle = xlNone

With Selection.Font

.Name = "Aptos Narrow"

.Size = 14

.Strikethrough = False

.Superscript = False

.Subscript = False

.OutlineFont = False

.Shadow = False

.Underline = xlUnderlineStyleNone

.ThemeColor = xlThemeColorLight1

.TintAndShade = 0

.ThemeFont = xlThemeFontMinor

End With

End Sub

And here is the Office Script I used.

function main(workbook: ExcelScript.Workbook) {

let selectedSheet = workbook.getActiveWorksheet();

// Set font name to "Aptos Narrow" for the active cell

workbook.getActiveCell().getFormat().getFont().setName("Aptos Narrow");

// Set font size to 16 for selected range on selectedSheet

workbook.getActiveCell().getFormat().getFont().setSize(16);

// Set font strikethrough to false for selected range on selectedSheet

workbook.getActiveCell().getFormat().getFont().setStrikethrough(false);

// Set font superscript to false for selected range on selectedSheet

workbook.getActiveCell().getFormat().getFont().setSuperscript(false);

// Set font subscript to false for selected range on selectedSheet

workbook.getActiveCell().getFormat().getFont().setSubscript(false);

// Set font underline to “none” for selected range on selectedSheet

workbook.getActiveCell().getFormat().getFont().setUnderline(ExcelScript.RangeUnderlineStyle.none);

// Set font color to "#000000" for the active cell

workbook.getActiveCell().getFormat().getFont().setColor("#000000");