We need to extract data from various places to Excel for calculations and data manipulation reasons. PDF is one of the most used documents used worldwide and an important source of data. How we extract data from multiple PDF files to Excel is discussed here with detailed instructions with adequate examples.
Download Practice Workbook
Download this practice workbook below.
3 Ways to Extract Data from Multiple PDF Files to Excel
We are going to use the below PDF file dataset for demonstrating the extraction of data from PDF to Excel sheets.
1. Extract Data from Multiple PDF Files Using Power Query
Power Query is a data preparation or processing engine. Here we will extract data from the PDF table and then process it in another window in Excel. Then we will get the output and load the result in the Excel worksheet altogether.
We need to open 3 PDF files and load all the tables inside of them in the Excel worksheet given below.
- To do this, first, go to the Data tab. Next, click the Get Data command
- After clicking the Get Data icon, go to From File to From Folder shown in the image.
- After clicking the From Folder, a new Browse window will open, from that window select the folder in your computer where your PDF files are being saved. Click on Open after this.
- After clicking Open, a new Power Query window will open from where you will notice that all of the PDF files inside the folder are now loaded with their name listed here in the Name columns.
- After this, click on the Combine Option menu below.
- Now, choose Combine & Transform Data if you need to change data. Otherwise, click on Combine & Load icon.
- Then another query window will open.
- In that window, you can cycle through files by the Sample File window.
- You can choose a table of a particular file for the preview.
- Click OK after this.
- Then you will notice that all of the tables are now loaded into a new query window. The left-most column of the table denotes the source of the data, whether it is from Dataset_1 or Dataset_2, or Dataset_3.
- After this, from the Home tab, click on Close & Load, then click on Close and Load To.
- The power query window will close and a new window named Import Data will spawn, in that window select the Existing Worksheet option and also select the location of the loaded data in the range box, here it is, $B$5:$F$29. Click OK after this.
- After this, you will notice the data table is now loaded into the specified location as a table in the worksheet.
- Now select the table and go to Table Design, from there select Convert to Range from Tools group to convert the table back to the range.
- Now the loaded data is converted into range.
- Select and delete the range of cells B4:B35. Then our dataset will be free from Source Column.
- Now you can see that the Excel worksheet now has extracted data from multiple PDF files.
All of the tables in the PDF are now loaded in one Excel worksheet. This is the easiest way to extract data from multiple PDF files in Excel.
2. Using Microsoft Word
Microsoft word is another handy tool to manipulate PDF docs. Word can open a variety of document formats besides its own docx format. PDF is one of them.
- We are going to import this below PDF file in Excel.
- To do this, open Microsoft Word, and from the file menu, click Open and then click on Browse.
- A new file Browse menu will open, from that menu go to your file location where your PDF files are located and select your PDF file. Click Open after this.
- The PDF file is then open in the word window in editable form. You can edit this file now.
- Now select the table and right-click on the mouse. A context menu will appear.
- From the context menu, click on Copy.
- Then head over to your Excel worksheet, in the Excel worksheet select cell B4, and right-click on the mouse.
- From the context menu select the Keep Source Formatting (K) icon.
- After clicking the Paste icon, you will notice the table from the word file now pasted here.
- You may need to alter the format of the cell as it may not contain the same format as in the PDF, because of the same size. But the data will extract here accurately.
- The below image is the final image after tweaking some formatting like text color, column width, cell color, etc.
- You can repeat the same process for other files and add as many as files information in your Excel worksheet you need.
- Below now is given the second PDF file data after they are extracted into the Excel worksheet.
- Below now is given the third PDF file data after they are extracted into the Excel worksheet.
3. Utilizing Copy and Paste Feature
Copy and pasting the data from the PDF file to the Excel worksheet is the most straightforward way to transfer data from pdf to Excel sheet. But data after extraction needs modification afterward.
- In this method, we are going to extract table data from the below PDF file below.
- First, we are going to press Ctrl+A, to select all the data in the PDF file that table file.
- Next, right-click on the mouse, and from the context menu select Copy.
- After copying the data, head over to your Excel worksheet and select cell B4.
- Then right-click on the mouse again and click on the Keep Source Formatting (K) icon from Paste Options.
- After clicking on the paste icon, you are going to see a PDF table now pasted into the Excel sheet.
- But this table isn’t the same as in the PDF. The reason behind this is that the cell size in Excel is fixed, and for many cell values it isn’t adequate.
- That’s why you will need some formatting to the Excel worksheet, like changing text color, cell color, cell width change, etc.
After applying the formatting you will notice that the imported table now looks exactly like the PDF one.
To sum it up, the question “how to extract data from multiple PDF files in Excel” is answered here in 3 different ways. The most useful method is to use a power query o get data from multiple PDFs at once. There are other methods which are basically copying and pasting data from PDF to Excel and formatting them appropriately afterward.
For this problem, a workbook is available to download where you can practice these methods.
Feel free to ask any questions or feedback through the comment section. Any suggestion for the betterment of the Exceldemy community will be highly appreciable.