XML is commonly used language. A tag starts with starting tag <TAG> and ends with ending tag </TAG>. The data itself is between the tags.
The example CD catalog was taken from following page XML Examples (w3schools.com) .
The data should be extracted into Excel.
This is the data:
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
The data should be extracted like this:
Empire Burlesque
Bob Dylan
USA
Columbia
10.90
1985
Or even better to have name of the segment and data itself in separate columns like this:
TITLE Empire Burlesque
ARTIST Bob Dylan
COUNTRY USA
COMPANY Columbia
PRICE 10.90
YEAR 1985
To extract the data, we can use MID function which extract several characters from the middle of the string.
MID consists of three arguments:
- Text: from which text MID is extracting characters.
- Start_num: what is the first character to be extracted.
- Num_char: how many characters are to be extracted.
To extract the segment, can be done with following sentence MID(A3;2;(FIND(“>”;A3))-2.
The extraction is in cell A3, we are starting from the second character. The number of characters is defined with FIND formula. FIND is looking for the > sign. > is the 7th character, we need to deduct two from the result. We take five digits starting from second character.
<TITLE>Empire Burlesque</TITLE>
In case the MID returns an error value, we rather have an empty value than an error.
=IF(ISERROR(MID(A3;2;(FIND(“>”;A3))-2));””;(MID(A3;2;(FIND(“>”;A3))-2)))
The sentence to return the data content is more complex.
The first argument, text, is easy. We have the text in A3 cell. The start number is where > is added by one.
<TITLE>Empire Burlesque</TITLE>
In XML above, the data starts with 8th character.
Number of characters is calculated by finding where / sign. That result is deducted by where > situates. That result is deducted by two. In Empire Burlesque example the / is 25th character, > is the seventh. 25 minus seven minus two equals 16. The MID needs to extract 16 characters starting with eighth digit as / is the seventh digit and we need to start extracting from the following character.
Just one notification, there are two > signs in the string. FIND function holds the argument start_num, where the find starts. As that is not defined, the FIND returns the value for the first >, which is something we are looking for.
MID(A3;(FIND(“>”;A3)+1);(FIND(“/”;A3)-(FIND(“>”;A3)))-2
In case MID returns an error, the MID can be framed with IF..ISERROR.
=IF(ISERROR(MID(A11;(FIND(“>”;A11)+1);(FIND(“/”;A11)-(FIND(“>”;A11)))-2));””;(MID(A11;(FIND(“>”;A11)+1);(FIND(“/”;A11)-(FIND(“>”;A11)))-2)))

On the top formula text in the cell E3 and below formula text in the cell F3.