Counting three million records

I have blogged how to count values in the data range, like how many times letter “A” appears in the data range. Let’s take such a large data range that it does not fit Excel workbook.

I have a text file with good 3M rows of values. Each line holds value either A, B, C, or D.

How many times does the letter “A” exists in the file ? We cannot count with normal Excel COUNT. As the data does not fit normal Excel workbook.

3M rows were loaded into Excel data model.

We use COUNTA function in DAX.

The formula is: =COUNTA(C3M[Val]) .

C3M is the table and Val field.

Pivot settings.

The results per letter.

Next step is to count percentual division between four letters. Therefore, we need to count the total number of rows.

The formula is: =calculate(COUNT(C3M[Val]),ALL(C3M))

The CALCULATE formula calculates the total number of rows in the data range.

To have the percentage distribution, we need to just divide count of each letter by total number of rows. To have the percentage value, the format is percentage.

The formula is: =divide([Count],[Count_all]) .

We divide one measure with another.

The letter A counts for a half, other letter divide the remaining half.

Count_all measure is counting with CALCULATE the total number of records. The expression for CALCULATE is table C3M and field Val, the filter is ALL as we are counting all the records in the table.

One option is again to use MS Access, as Access can take 3M rows of data.

The table has two fields running number and the letters in VAL field.

A simple SQL query, just remember the square brackets around the table.

The results in absolute number.

Here is the SQL query to count percentage values. How many percents of all the letters is letter “A” ?

SELECT C3M.Val, ROUND(Count(Val)/(SELECT COUNT(*) FROM [C3M]), 2) AS Expr1

FROM C3M

GROUP BY C3M.Val;

Now the square brackets were not needed with FROM command. To have percent values, we need to take the total number of letter “A” rows with Count(Val). That is divided with all the records in the table (SELECT COUNT(*) FROM [C3M]). In order to decrease the number of decimals, ROUND function is framing the division calculation.

Every second letter in the data range is “A”. Remaining letters B, C, D are dividing evenly remaining half.

SQL is powerful tool to calculate numeric values out of a large data. When using DAX we needed few formulas, but with SQL we needed only few lines. In case you want to present distribution of the four letters, you can easily create a graph like this in Excel.

Average with NULL values

You have a sales report. Each sales rep from A to J have sold a number of products except D and H.

We should count the average sales.

The standard average formula counts only numbers. For example, sales reps D and H are not included. Sum of numbers is 447. If you divide that with eight, that is roughly 56.

When sales reps D and H are on the payroll, they need to be included in the average counting. 56 is too high average as two sales reps have zero sales.

Data includes NULL values.

The more realistic average is around 45.

To make denominator as ten, we have few options available.

We can count cell with value in B1:B10 and then add cells without value B1:B10 to include all the cells. Also ROWS function counts rows in data range.

We count the sum in B1:B10 and then we divide by COUNT + COUNTBLANK or by ROWS -function. Then we have the wanted 45 as a result.

Another option is to populate NULL values with zero numbers, then standard average can be used.

Activate the B1:B10 range and execute the macro below.

Sub z_zero()

Dim cell As Range

For Each cell In Selection

If cell.Value = vbNullString Then

    cell.Value = 0

End If

Next cell

End Sub

NULL values are in Excel VBA called as vbNullString. If the macro finds NULL value in the range, then that is replaced by zero. If the value is zero, then it is NULL.

Activate the values, then execute the macro. Now NULL values have been replaced by zero, and standard AVERAGE behaves as we want.

We can count average in Access too.

The same values are in Access.

A query was created.

The SQL is:

SELECT Avg(IIf(IsNull(sal),0,sal)) AS Expr1

FROM tab;

The result is the same as in Excel.

We have nested formulas AVG, IIF and ISNULL. In other SQL dialects same thing can be done with AVG and IFNULL.

SQL sentence is short compared to do same in Excel. Excel is a better tool number crunching, but if you can do something with SQL, Access is worth trying out.