XML data

Sometime data you want to investigate is in XML-format. If you download data from a system and data happens to be in XML format. This blog demonstrates how you can analyze data with Excel and Access. The example data is taken from page https://www.w3schools.com/xml/cd_catalog.xml .  Thanks for organizing and providing the data.

Here is a part of the data:

<CATALOG>

<CD>

<TITLE>Empire Burlesque</TITLE>

<ARTIST>Bob Dylan</ARTIST>

<COUNTRY>USA</COUNTRY>

<COMPANY>Columbia</COMPANY>

<PRICE>10.90</PRICE>

<YEAR>1985</YEAR>

</CD>

<CD>

<TITLE>Hide your heart</TITLE>

<ARTIST>Bonnie Tyler</ARTIST>

<COUNTRY>UK</COUNTRY>

<COMPANY>CBS Records</COMPANY>

<PRICE>9.90</PRICE>

<YEAR>1988</YEAR>

</CD>

<CD>

<TITLE>Greatest Hits</TITLE>

<ARTIST>Dolly Parton</ARTIST>

<COUNTRY>USA</COUNTRY>

<COMPANY>RCA</COMPANY>

<PRICE>9.90</PRICE>

<YEAR>1982</YEAR>

</CD>

<CD>

<TITLE>Still got the blues</TITLE>

<ARTIST>Gary Moore</ARTIST>

<COUNTRY>UK</COUNTRY>

<COMPANY>Virgin records</COMPANY>

<PRICE>10.20</PRICE>

<YEAR>1990</YEAR>

</CD>

XML is starting with tag <CATALOG> and end with </CATALOG> which is not visible here. Each CD starts with tag <CD> and ends with </CD>. Data fields are title, artist, country, company, price, and year. For example, title starts with tag <TITLE> and ends with </TITLE>. In each line there can be only one start and end tag or just one of those.

Copy and paste the sample data into Notepad++. Save with extension XML.

Data looks like this. By the way, if you don’t yet use Notepad++, I recommend that, as that holds more features than normal Notepad.

Vertical bar on left side helps to characterize each CD.

One way is just to take file – open and browse XML-file.

Excel asks you how you would like to open the file.

As an XML table.

Press ok.

Data is just file.

You can also calculate from table like value of the collection is 237 units and average year is 1988.

If you open the file with second radio button.

Read-only workbook.

Read only table.

If you select source task pane, the third option.

Press ok.

Again press ok.

At the right hand side, you will have XML catalog.

Drag and drop fields from the XML source to worksheet.

Select Developer – XML – refresh data.

Data is brought to worksheet.

One way to handle XML-data with Excel, is to make a data model and use XML file as data source.

Select data – get data – from file – from XML.

Data is organized correctly.

Press transform data.

Data type for price was changed for decimal number and year as whole number.

Just create a pivot table based on the XML file.

The total price of CD collection can be counted with a simple DAX.

Results.

Also the average year is counted.

Let’s see how XML file can be used with MS Access.

External data can be fed into Access via new data source – from file – XML file.

Before uploading the data, I changed the decimal separator to be dot, as price is with dot.

Data looks neat.

Another simple query.

The result is same as in Excel.

Query about average year in CD collection.

The result is same than in Excel.

If you happen to have data in XML-format, that can be used in Excel and in Access. XML format should not be overlooked with Excel or Access. You don’t need fancy tools to analyze XML-data.  The example I used was simple and neat. Real life XML-files can be more complex and cause issues which was not seen here.