The data which is analyzed here is taken from Maven Analytics Data playground Free Data Sets & Dataset Samples | Maven Analytics . The data set is UK Train rides. Thanks for Maven Analytics for publishing the data sample. I have also answered the four questions in the data playground.
The data is downloaded into Microsoft Access. There I have used SQL queries. Then I made a data model in Excel. The Access database is the source for my data model in Excel.
- What are the most popular routes?
The query:
SELECT COUNT([Departure Station] & [Arrival Destination]), [Departure Station] & [Arrival Destination]
FROM Railway
GROUP BY [Departure Station] & [Arrival Destination]
ORDER BY COUNT([Departure Station] & [Arrival Destination]) DESC;

The same result was found with Excel.
A route is a combination of departure station and arrival destination.
=CALCULATE(count(Railway[Route]),ALLEXCEPT(Railway, Railway[Route]))

The most popular route is Manchester Piccadilly – Liverpool Lime Street, the second is the route from London Euston to Birmingham New Street.
2. What are the peak travel times?
The query:
SELECT COUNT([Departure Time]), [Departure Time]
FROM Railway
GROUP BY [Departure Time]
ORDER BY COUNT([Departure Time]) DESC;

In Excel the DAX is nearly the same as with the first question.
=CALCULATE(count(Railway[Departure Time]),ALLEXCEPT(Railway, Railway[Departure Time]))

The most popular travelling times in the morning are 6:30 and 8:00. Looks like 7:00 is not popular time at all. People are leaving for school or for work early 6:30 or bit later 8:00.
In the evening most popular travelling times are 18:45 and 17:45. There is a difference of one hour. 18:00 is not popular time.
3. How does revenue vary by ticket types and classes?
The query:
SELECT SUM(Price), [Ticket Type], [Ticket Class]
FROM Railway
GROUP BY [Ticket Type], [Ticket Class]
ORDER BY SUM(Price) DESC

With Excel, the Dax is simple. Just calculate the sum just like in Excel.
=sum(Railway[Price])

Standard tickets produce most of the revenue. Advance is selling best and anytime worst among the ticket types.
4. What is the on-time performance? What are the main contributing factors?
Query:
SELECT COUNT([Journey Status]), [Journey Status], ROUND(COUNT(*) *100/ (SELECT count(*) FROM Railway), 1)
FROM Railway
GROUP BY [Journey Status];

The same can be calculated in Dax with following sentences:
=CALCULATE(count(Railway[Journey Status]),ALLEXCEPT(Railway, Railway[Journey Status]))
The Dax above can be replaced by this:
=COUNTAX(Railway, [Journey Status])
It would be useful to have percentages also. To count the total number of values in Journal status field, we want to have following Dax:
=CALCULATE(COUNTA(Railway[Journey Status]), ALL(Railway[Journey Status]))
To divide first with second, we have this Dax.
=divide([4_1_delay], [4_2_delay])

Query:
SELECT COUNT([Reason for Delay]), [Reason for Delay], ROUND(COUNT([Reason for Delay]) *100/ (SELECT count([Reason for Delay]) FROM Railway), 1)
FROM Railway
GROUP BY [Reason for Delay]
ORDER BY COUNT([Reason for Delay]) DESC;

To count how many times each reason for delay is appearing, we use COUNTAX Dax:
=COUNTAX(Railway, [Reason for Delay])
To have the total number of reason of delays, we have:
=CALCULATE(COUNTA(Railway [Reason for Delay]), ALL(Railway [Reason for Delay]))
To have a division we need to have DIVIDE command:
=divide([4_5_reason],[4_6_reason])

The most typical reason for delay is weather. The are two reason codes weather and weather conditions. The two could be summarized, then we would see that about one third of delays are due to weather.
I was using two tools to answer the questions in Maven Analytics playground: Access and Excel. Same answers could be found using both the tools. Data sets are useful as you can practice with the tools you like.

















