How to Import Data from a Website to Excel?

It is well known that World Wide Web contains huge useful data. However, we have to import the data into Microsoft Excel before doing any kind of analysis. There are two methods that you can use to complete this kind of task.

Get External Data from Web Manually

Suppose that you want to download revenues about top moneymaking movies from this web page, here shows you how to do it easily.

Open Microsoft Excel and click on the Data tab, in the Get External Data group, click From Web. After New Web Query dialog box appears, copy the web address (http://www.the-numbers.com/movie/records/All-Time-Worldwide-Box-Office) into Address field and click on Go button. Excel will start to download the web page as shown in Figure 1.1. Just click on No if you will get a Script Error warning box which is similar to that in below figure. The box will disappear and it has no impact on your import process.

Import data from web to Excel Fig 1.1

Figure 1.1

There is an arrow in the yellow box on the upper right corner of New Web Query dialog box. By clicking on it, you can determine whether to show the similar icons before tables or not. For example, there is no arrow icon next to the table in the left panel of Figure 1.2. And there will be an icon (in the right panel) after you click on the arrow button to show icons.

Import data from web to Excel Fig 1.2

Figure 1.2 [click on the image to get a full view]

Click the arrow icon next to the tables that you want to select. The icons and tables will change and similar to that shown in the left panel of Figure 1.3. After you click on Import; an Import Data dialog box will be prompted. Fill the range (from column A through column H in our case) where you want to put the data and click on OK.

Import data from web to Excel Fig 1.3

Figure 1.3 [click on the image to get a full view]

The data will be imported into Excel after you click on Ok. If you right click on any cell within the table and select Refresh, Excel will go out and pull out most recent data from a web page.

Import data from web to Excel Fig 1.4

Figure 1.4

And you can even set your query data to be refreshed in any fashion you want. Just right click on any cell within the table, select Data Range Properties. In the prompted External Data Range Properties dialog box, you can change Refresh Control settings to change the style. For example, you can specify the query to be refreshed every 60 minutes or when the file is opened.

Import data from web to Excel Fig 1.5

Scrape data using VBA programming

VBA programming can be used to scrape data from a web page. It is much more useful while difficult when comparing against the first approach. Moreover, you need to understand what HTML is before learning how to scrape data using VBA programming. I recommend you to learn basic knowledge of HTML from this website if you know nothing or little about HTML. And from here, you can learn almost everything related to web scraping using VBA.  This article will only show you two examples.

Scrape data from one web page

Suppose that we want to extract company name, email address and contact name from this web page. If you open this web page, you will find that there is a contact block at the bottom. Figure 2.1 shows the contact block and the corresponding source code. Information in red boxes is what we want to need. And those with a green underline are exactly what we need to extract.

Import data from web to Excel Fig 2.1

Figure 2.1

The following code can help you extract above requested information and put them in the first worksheet.

Sub Retrieve_Click()
 
'Create InternetExplorer

Set IE = CreateObject("InternetExplorer.Application")
 
'Let's not see the browser window

IE.Visible = False
 
'Open the web page

IE.Navigate "http://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG8160044431&folderid=1736"
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
DoEvents
 
Loop
 
'Retrieve company name, email address & contact information

Set contactobj = IE.document.getElementsByClassName("contact-details block dark")
 
htext = contactobj(0).innerHTML
 
MsgBox htext
 
If InStr(htext, "<p>Company Name: ") Then
 
ThisWorkbook.Worksheets(1).Cells(1, 1) = Split(Split(htext, "<p>Company Name: ")(1), "<br")(0)
 
End If
 
If InStr(htext, "mailto:") Then
 
ThisWorkbook.Worksheets(1).Cells(2, 1) = Split(Split(htext, "mailto:")(1), Chr(34) & ">")(0)
 
End If
 
If InStr(htext, "<p>Name: ") Then
 
ThisWorkbook.Worksheets(1).Cells(3, 1) = Split(Split(htext, "<p>Name: ")(1), "<br")(0)
 
End If
 
ThisWorkbook.Worksheets(1).Cells(4, 1) = IE.LocationURL
 
ThisWorkbook.Save
 
Set IE = Nothing
 
Set contactobj = Nothing
 
End Sub

“IE.document.getElementsByClassName(“contact-details block dark”)” can enable you to get all elements with the class name – contact-details block dark. This web page lists almost all the properties and methods that can be used on HTML elements. You can choose the appropriate one for your own problem.

The innerHTML property can allow you to set or return the content of an HTML element. In our case, it returned the content of the element with class name “contact-details block dark “ and set it to a variable htext. The content (Figure 2.2) can be prompted by ‘Msgbox htext’.

Import data from web to Excel Fig 2.2

Figure 2.2

You can see that the text is well structured. That’s why we can use SPLIT function to extract what need. For example, if we take “<p>Company Name:” as a delimiter and formula “Split(htext, “<p>Company Name: “)(1)” can return the whole text after “<p>Company Name:”. For this newly returned text, we can take “<br” as a delimiter and get the text before the first “<br” and that is the company name. In summary, SPLIT function is a flexible tool that you can use to extract almost everything.  Other useful functions include LEN, INSTR, LEFT, RIGHT, MID and REPLACE. I will not discuss in details here.

After you click on OK in Figure 2.2, requested data can be imported from the web into Excel worksheet. For example, cell A1 contains Company Name while cell A4 contains company web page address.

Import data from web to Excel Fig 2.3

Figure 2.3

By adding below code before you save the workbook, you can add a hyperlink to cell A4.

'Add hyperlink

ThisWorkbook.Worksheets(1).Hyperlinks.Add ThisWorkbook.Worksheets(1).Cells(4, 1), ThisWorkbook.Worksheets(1).Cells(4, 1)

If you click on cell A4, you can revisit the web page. This is useful especially when you have to retrieve data for a lot of companies. Since you can click on any hyperlink to revisit the corresponding web page and add or update any information manually during review later.

Import data from web to Excel Fig 2.4

Figure 2.4

Interact with web pages

Above example just illustrates on how to retrieve data from one static web page. But more than often, we are required to extract have interacted with web pages in order to retrieve a large amount of data. Look at Figure 3.1. It shows you how to get to the web page of above example. You can see that there are a lot of industries and for each industry, there are a lot of companies. For example, there are 651 companies in Agribusiness industry. What should we do if we want to extract contact information of all the companies from all the industries?

Import data from web to Excel Fig 3.1

Figure 3.1 [click on the image to get a full view]

Well, the key point is that how to make VBA interact with web pages like what we do manually. Let’s take S.W.I.S Advantage as an example. Generally, we hope that Excel can click on Agribusiness (top panel of Figure 3.1) and thus trigger IE to direct us to the second web page. In the second web page (bottom panel of Figure 3.1), Excel can click on S.W.I.S Advantage and then IE directs us to page as shown in Figure 2.1 so that we can retrieve contact information of S.W.I.S Advantage.

After entering following code into Visual Basic Editor and submitting the code, you will see that your IE is opened. The first web page appears followed by the second web page. Here you will learn how to get drop-down list element, to select an option and to trigger event after you select the option.  “m = IE.document.getElementsByTagName(“option”).Length – 1” will give you a total number of options. This can be used in for next loop.

Sub retrieve()
 
'Create InternetExplorer

Set IE = CreateObject("InternetExplorer.Application")
 
'Let's see the browser window

IE.Visible = True
 
'Open the web page

IE.Navigate "http://www.austrade.gov.au/international/buy#"
 
'Wait while IE is loading

Do While IE.Busy
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
Application.Wait (Now + TimeValue("00:00:10"))
 
'Part 1 - Select dropdown list and trigger event after you select one option

Set selectobj = IE.document.getElementsByTagName("select")
 
m = IE.document.getElementsByTagName("option").Length - 1
 
selectobj(0).selectedIndex = 1
 
selectobj(0).FireEvent ("onchange")
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
Application.Wait (Now + TimeValue("00:00:10"))
 
End Sub

This part of the code can direct you to the web page shown in Figure 2.1 after Excel click on the first name. All companies’ names are included in the element with the class name of “Name”.  Searchobj is a collection and searchobj(i) can return the (i+1)th object. For example, searchobj(1).Click can enable you to visit the web page for RIDLEY CORPORATION (Melbourne).

'Part 2 - Select company Name

Set searchobj = IE.document.getElementsByClassName("Name")
 
searchobj(0).Click
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
DoEvents
 
Loop

Finally, here is the whole code that can show you the process of opening IE, browsing web pages and extracting data. The data extracted will be the same as that in Figure 2.4.

Sub Retrieve()
 
'Create InternetExplorer

Set IE = CreateObject("InternetExplorer.Application")
 
'Let's see the browser window

IE.Visible = True
 
'Open the web page

IE.Navigate "http://www.austrade.gov.au/international/buy#"
 
'Wait while IE is loading

Do While IE.Busy
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
Application.Wait (Now + TimeValue("00:00:10"))
 
'Part 1 - Select dropdown list and trigger event after you select one option

Set selectobj = IE.document.getElementsByTagName("select")
 
m = IE.document.getElementsByTagName("option").Length - 1
 
selectobj(0).selectedIndex = 1
 
selectobj(0).FireEvent ("onchange")
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
Application.Wait (Now + TimeValue("00:00:10"))
 
'Part 2 - Select company Name

Set searchobj = IE.document.getElementsByClassName("Name")
 
searchobj(0).Click
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
DoEvents
 
Loop
 
'Part 3 - Retrieve company name, email address & contact information

Set contactobj = IE.document.getElementsByClassName("contact-details block dark")
 
htext = contactobj(0).innerHTML
 
If InStr(htext, "<p>Company Name: ") Then
 
ThisWorkbook.Worksheets(1).Cells(1, 1) = Split(Split(htext, "<p>Company Name: ")(1), "<br")(0)
 
End If
 
If InStr(htext, "mailto:") Then
 
ThisWorkbook.Worksheets(1).Cells(2, 1) = Split(Split(htext, "mailto:")(1), Chr(34) & ">")(0)
 
End If
 
If InStr(htext, "<p>Name: ") Then
 
ThisWorkbook.Worksheets(1).Cells(3, 1) = Split(Split(htext, "<p>Name: ")(1), "<br")(0)
 
End If
 
ThisWorkbook.Worksheets(1).Cells(4, 1) = IE.LocationURL
 
'Add hyperlink

ThisWorkbook.Worksheets(1).Hyperlinks.Add ThisWorkbook.Worksheets(1).Cells(4, 1), ThisWorkbook.Worksheets(1).Cells(4, 1)
 
End Sub

In fact, what we need to do is to extract contact information for all companies from all industries, therefore, we need to do use for loop statement to complete this task. Following is the complete code. And you can also find the code in Retrieve contact information for all companies.xlsm which you can download at the end of this article.

Sub Retrieve()
 
For idex = 2 To 18
 
'Create InternetExplorer

Set IE = CreateObject("InternetExplorer.Application")
 
'Let's see the browser window

IE.Visible = False
 
'Open the web page

IE.Navigate "http://www.austrade.gov.au/international/buy#"
 
'Wait while IE is loading

Do While IE.Busy
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
Application.Wait (Now + TimeValue("00:00:10"))
 
idexn = idex - 1
 
'Part 1 - Select dropdown

Set selectobj = IE.document.getElementsByTagName("select")
 
m = IE.document.getElementsByTagName("option").Length - 1
 
selectobj(0).selectedIndex = idexn
 
selectobj(0).FireEvent ("onchange")
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
Application.Wait (Now + TimeValue("00:00:10"))
 
wurl = IE.LocationURL
 
tot = IE.document.getElementsByClassName("SearchTotal")(0).innerHTML
 
pg = Int(tot / 25) + 1
 
Max = (tot Mod 25) - 1
 
'Part 2 - Select Class = "Name"

a = 2
 
For j = 1 To pg
 
If j = 1 Then
 
IE.Navigate (wurl)
 
Else
 
IE.Navigate (wurl & "&pg=" & j)
 
End If
 
Do While IE.Busy
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
If j <> pg Then
 
For i = 1 To 24
 
Set searchobj = IE.document.getElementsByClassName("Name")
 
searchobj(i).Click
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
DoEvents
 
Loop
 
'Part 3 - Retrieve company name, email address & contact information

Set contactobj = IE.document.getElementsByClassName("contact-details block dark")
 
htext = contactobj(0).innerHTML
 
ThisWorkbook.Worksheets(idex).Cells(a, 1) = j
 
ThisWorkbook.Worksheets(idex).Cells(a, 2) = a - 1
 
If InStr(htext, "<p>Company Name: ") Then
 
ThisWorkbook.Worksheets(idex).Cells(a, 3) = Split(Split(htext, "<p>Company Name: ")(1), "<br")(0)
 
End If
 
If InStr(htext, "mailto:") Then
 
ThisWorkbook.Worksheets(idex).Cells(a, 4) = Split(Split(htext, "mailto:")(1), Chr(34) & ">")(0)
 
End If
 
If InStr(htext, "<p>Name: ") Then
 
ThisWorkbook.Worksheets(idex).Cells(a, 5) = Split(Split(htext, "<p>Name: ")(1), "<br")(0)
 
End If
 
ThisWorkbook.Worksheets(idex).Cells(a, 6) = IE.LocationURL
 
IE.GoBack
 
Do While IE.Busy
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
a = a + 1
 
Next i
 
Else
 
For i = 0 To Max
 
Set searchobj = IE.document.getElementsByClassName("Name")
 
searchobj(i).Click
 
'Wait while IE is loading

Do While IE.readyState <> 4 Or IE.Busy = True
 
DoEvents
 
Loop
 
'Part 3 - Retrieve company name, email address & contact information

Set contactobj = IE.document.getElementsByClassName("contact-details block dark")
 
htext = contactobj(0).innerHTML
 
ThisWorkbook.Worksheets(idex).Cells(a, 1) = j
 
ThisWorkbook.Worksheets(idex).Cells(a, 2) = a - 1
 
If InStr(htext, "<p>Company Name: ") Then
 
ThisWorkbook.Worksheets(idex).Cells(a, 3) = Split(Split(htext, "<p>Company Name: ")(1), "<br")(0)
 
End If
 
If InStr(htext, "mailto:") Then
 
ThisWorkbook.Worksheets(idex).Cells(a, 4) = Split(Split(htext, "mailto:")(1), Chr(34) & ">")(0)
 
End If
 
If InStr(htext, "<p>Name: ") Then
 
ThisWorkbook.Worksheets(idex).Cells(a, 5) = Split(Split(htext, "<p>Name: ")(1), "<br")(0)
 
End If
 
ThisWorkbook.Worksheets(idex).Cells(a, 6) = IE.LocationURL
 
ThisWorkbook.Worksheets(idex).Hyperlinks.Add ThisWorkbook.Worksheets(idex).Cells(a, 6), ThisWorkbook.Worksheets(idex).Cells(a, 6)
 
IE.GoBack
 
Do While IE.Busy
 
Application.Wait DateAdd("s", 1, Now)
 
Loop
 
a = a + 1
 
Next i
 
End If
 
ThisWorkbook.Save
 
Next j
 
Set IE = Nothing
 
Set contactobj = Nothing
 
Next idex
 
End Sub

The only point that I need to explain is illustrated in Figure 3.2. One web can list only as many as 25 companies. When the total number of companies is greater than 25, there will be more than one page. Figure 3.2 shows that there is a rule to get the address of pages after the first page. It is the result of the concatenating address of the first page, “&pg=” and the actual page number. And for all pages before the last page, the total object number is 25.  “IE.document.getElementsByClassName(“SearchTotal”)(0).innerHTML” can return the total number of companies within an industry. In our case, it will be 651. “Int(tot / 25) + 1” can get you the total number of pages. And “Max = (tot Mod 25) – 1” can return the max number of companies on the last page. I will stop here and leave you to figure out how to apply this idea to the code. It is a much better approach for you to grasp code. You can leave a comment if you have questions.

Import data from web to Excel Fig 3.2

Figure 3.2

Here shows you part of the final excel. Contact information for all companies within one industry is put together in one worksheet.

Import data from web to Excel Fig 3.3

Figure 3.3 [click on the image to get a full view]

Read More: 

How to extract data from website to Excel automatically?

 Import Data (Sentence, Paragraphs, Tables, Comments) from Word to Excel

Download working file

Download the working file from the link below.

Pull-Data-from-Web-to-Excel.rar


I am from China and this photo was taken in a classical garden. There are many similar gardens in China, attracting a lot of visitors every year, especially in spring and summer. I was major in Biotechnology. But I took a job as an SAS programmer because I prefer programming. Besides SAS, I also learned Excel VBA in my spare time. It is fantastic to be able to manipulate data, files and even to interact with the internet via programming. This will save me a lot of time. I am keen to learn new things.

4 Comments
  1. Reply
    Ken March 6, 2017 at 10:38 PM

    Since Microsoft discontinued Internet Explorer, is it possible to modify the code for Google Chrome or Microsoft Edge?

  2. Reply
    Ron June 4, 2017 at 11:32 PM

    Can this be replicated to hotel/travel search websites like trivago, expedia, etc?

    • Reply
      Kawser June 5, 2017 at 10:13 AM

      Check out trying. I never tried. If you face any problem, let us know 🙂

  3. Reply
    Tung Le July 30, 2018 at 4:46 PM

    hi Kawser,
    May I have the code for the similar case?
    List to be extract : http://bocongan.gov.vn/truy-na/doi-tuong-truy-na.html
    thanks in advance

    Leave a reply