Maven Analytics 2015 New Year’s Eve resolutions tweets challenge

Maven Analytics published the data about 2015 New Year’s Eve resolutions tweets. The data can be

found here: https://www.mavenanalytics.io/data-playground?page=1&pageSize=5 . Thanks for Maven Analytics for providing the data.

Also sample questions were included.

  1. What is the most popular resolution category? Least popular?
  2. Which resolution category was retweeted the most? Least?
  3. Using the tweet_created field, and rounding to the nearest hour, what was the most popular hour of day to tweet? How many resolutions were tweeted?
  4. Using a map visual, what U.S. State tweeted the highest number of NYE resolutions?

The first, I downloaded the data to MS Access. Decimal symbol was changed to dot.

Reporting was done in Power BI. To add the hours in Power Query, activate the tweed_created column and select add column – time – hour. Hour information is needed in question number three.

Now the hour column appeared.

Let’s create the needed measurements.

Cat = count(New_years_resolutions[tweet_category])

The cat measure counts simply the number of tweet_category items.

FullHour = count(New_years_resolutions[Hour])

This measure counts the full hours.

Retw = count(New_years_resolutions[retweet_count])

The measure counting retweets.

States = count(New_years_resolutions[tweet_state])

This measure counts where tweets are coming from.

To double-check the values in Access.

SELECT COUNT(tweet_state), tweet_state

FROM New_years_resolutions

GROUP BY tweet_state

ORDER BY COUNT(tweet_state) DESC

;

California is leading, New York is the second.

Map used here is GlobeMap. The field tweet_state and measure states were selected.

  1. What is the most popular resolution category? Least popular?

Personal growth, philantrophic.

2. Which resolution category was retweeted the most? Least?

Personal growth, time management.

3. Using the tweet_created field, and rounding to the nearest hour, what was the most popular hour of day to tweet? How many resolutions were tweeted?

9, 505.

4. Using a map visual, what U.S. State tweeted the highest number of NYE resolutions?

California.

Sales postings higher than 1000 in MySQL

You have accounting data. Both accounts 1000 and 2000 are revenue accounts. Each posting is a sales order. Debit postings are correction postings.

A controller is interested to know how many sales orders have been made valuing more than 1000 units. As the sales orders higher than 1000 are high value sales orders.

How many percent of the total sales, the sales order more than 1000, cover ?

Correction postings should be removed when calculating the total sales.

A sales order for 1067 was originally posted for account 2000 and cost center C2. That posting has been reversed and a correct posting was assigned to account 2000 and cost center C1. There is only one sales order for 1067.

Let’s create a table in MySQL.

The fields are:

ID running number primary key to define each record individually.

ACC bookkeeping account.

CC cost center.

DEB debit postings.

CRE credit posting.

CREATE TABLE excel_demo.onek(

id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,

acc TEXT,

cc TEXT,

deb INTEGER,

cre INTEGER

);

Before importing data you need to fill zeros to empty debit and credit cells.

Activate the value fields, take home – find & select – go to special.

Select blanks. Ok.

Enter zero and press control + enter.

All the empty value fields are populated by zero.

Now we can import the CSV-file to the MySQL table.

Map the correct source and destination columns.

Everything went fine. Data amount was small.

Check that the table has been populated with values.

ID field has values automatically. They were not in flat file.

Connect to the MySQL database.

Select get data – from database – from MySQL database

Press ok.

Transform data.

Data types are ok. Table name was shortened to onek.

Close & Load – Close & Load to

Only create connection and add this data to the data model.

=CALCULATE(count(onek[Cre]),FILTER(onek,[Cre]>1000))-CALCULATE(count(onek[Deb]),FILTER(onek,[deb]>1000))

The measure is counting records Cre-fields in onek table where Cre-fields have value more than 1000. Corresponding debit values are decreased from credit values. As 1000 and  2000 are revenue accounts, they get increased in credit side.

Add measure.

=CALCULATE(SUM(onek[Cre]),FILTER(onek,[Cre]>1000))-CALCULATE(SUM(onek[Deb]),FILTER(onek,[deb]>1000))

Total_val counts the sum of the sales order more than 1000 units.

=sum(onek[cre])-sum(onek[deb])

Sum_val counts simply the sum of credit postings minus sum of debit postings.

To count how many percent of value are the sales orders exceeding 1000:

Now we create the pivot itself.

Create pivot table based on data model.

Cost centers are on vertical and accounts on horizontal axis. No sales order above 1000 in cost center C2, all of them are in C1. Altogether four sales orders two sits in account 1000 and two in 2000.

Now to calculate the total value of sales order exceeding 1000 units.

In the account 1000 the sales order exceeding 1000 cover more than a third of sales in value. The proportion of large sales orders are considerably lower in account 2000.

All the sales order more than 1000 units are in cost center CC1 and none in CC2.

We can check same in Power BI.

Get data from MySQL database.

Similar to Excel.

Transform data.

Again similar to Excel.

DAX is similar to Excel.

In cost center C2 there are two sales lines exceeding 1000 in both accounts 1000 and 2000.

The other measures behave similarly in Power BI. I am not creating all the measures in Power BI, but they all work equally.

Let’s still check something from database, that the number are correct.

SELECT

((SELECT COUNT(excel_demo.onek.cre)

FROM excel_demo.onek

WHERE excel_demo.onek.cre > 1000)

(SELECT COUNT(excel_demo.onek.deb)

FROM excel_demo.onek

WHERE excel_demo.onek.deb > 1000)) AS total

;

The total number of sales order exceeding 1000 is four.

SELECT 

((SELECT SUM(excel_demo.onek.cre)

FROM excel_demo.onek)

(SELECT  SUM(excel_demo.onek.deb)

FROM excel_demo.onek)) AS value

;

This SQL count the total sales. The 16282 ties with Excel.