Read a PDF using iTextSharp



I am going to show how you can use iTextSharp to read a table or paragraph from a PDF.
In the last article I showed how to download iTextSharp and include it in a project: How to download ItextSharp

Follow these steps to read a PDF using iTextSharp:

Step 1: Create a project and add a reference to the iTextSharp DLL.

Step 2: Create a folder named PDF to store the PDF files that we are going to read.

Step 3: In the code-behind, create a method getParagraphByCoOrdinate which reads a region of your PDF.

using iTextSharp.text.pdf;          // PdfReader
using iTextSharp.text.pdf.parser;   // render filters and text extraction strategies

public string[] getParagraphByCoOrdinate(string filepath, int pageno, int coordinate1, int coordinate2, int coordinate3, int coordinate4)
{
    PdfReader reader = new PdfReader(filepath);
    try
    {
        // Only text inside this rectangle (llx, lly, urx, ury) is extracted
        iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(coordinate1, coordinate2, coordinate3, coordinate4);
        RenderFilter[] renderFilter = { new RegionTextRenderFilter(rect) };
        ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderFilter);
        string text = PdfTextExtractor.GetTextFromPage(reader, pageno, textExtractionStrategy);
        return text.Split('\n');    // one array entry per extracted line
    }
    finally { reader.Close(); }     // release the file even if extraction fails
}
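The array returned by Split('\n') can contain blank entries or entries with stray whitespace, depending on how the PDF was produced. Here is a minimal sketch of a clean-up step you could run on the extracted text. The class ExtractedText and the method ToLines are hypothetical names used only for this illustration; they are not part of iTextSharp.

```csharp
using System;
using System.Linq;

public static class ExtractedText
{
    // Hypothetical helper (not an iTextSharp API): splits extracted text
    // into lines, trims surrounding whitespace (including '\r') and
    // drops lines that are empty after trimming.
    public static string[] ToLines(string text)
    {
        return (text ?? string.Empty)
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }
}
```

You could call ExtractedText.ToLines(text) in place of text.Split('\n') if you do not want empty rows in the output.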


This method takes six parameters:

 File path: path to the PDF file

 Page no: the number of the PDF page we want to read

The remaining four parameters define the area to read. We read a particular area of the PDF by creating a rectangle, using the Rectangle constructor, which takes four values:
        public Rectangle(float llx, float lly, float urx, float ury);

llx - lower left x
lly - lower left y
urx - upper right x
ury - upper right y
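PDF coordinates are normally measured in points (1/72 inch) from the bottom-left corner of the page, with y increasing upward, so a box measured from the top of the page has to be flipped before it is passed in. Here is a minimal sketch of that conversion. The class PdfRegion and the method FromTopLeft are hypothetical names for this illustration, and the sketch assumes the page origin is at the bottom-left corner. In iTextSharp the page height can be read with reader.GetPageSize(pageno).Height.

```csharp
using System;

public static class PdfRegion
{
    // Hypothetical helper (not an iTextSharp API): converts a box measured
    // from the TOP-left corner of the page (x, top, width, height, all in
    // points) into bottom-left-origin values in the order llx, lly, urx, ury.
    public static float[] FromTopLeft(float pageHeight, float x, float top, float width, float height)
    {
        float llx = x;                  // left edge is unchanged
        float ury = pageHeight - top;   // upper edge, measured from the bottom
        float lly = ury - height;       // lower edge, measured from the bottom
        float urx = x + width;          // right edge
        return new[] { llx, lly, urx, ury };
    }
}
```

For example, on an A4 page (842 points high), a box that starts 50 points from the left and 100 points from the top and is 400 wide and 200 high gives llx = 50, lly = 542, urx = 450, ury = 742.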

Now, on page load, we pass these values:
 protected void Page_Load(object sender, EventArgs e)
 {
        // code to get the file path
        string filepath = Server.MapPath("PDF");

        filepath = filepath + "\\ak2.pdf";
        // passing coordinates to the method
        string[] strarray = getParagraphByCoOrdinate(filepath, 1, 50, 850, 450, 600);

        // code to display data on the web page (HTML-encoded so characters like < show as text)
        Response.Write("<b>Table 1</b> <br/>");
        foreach (string str in strarray)
            Response.Write(Server.HtmlEncode(str) + "<br/>");

        string[] strarray1 = getParagraphByCoOrdinate(filepath, 1, 50, 590, 450, 505);
        Response.Write("<b>Table 2</b> <br/>");
        foreach (string str in strarray1)
            Response.Write(Server.HtmlEncode(str) + "<br/>");
 }

  1. Hello, this is a very nice example, but I cannot understand how the llx, lly, urx and ury coordinates work. Please help, this is the perfect solution.

    1. Hello, these are coordinates; anything between these coordinates is read by getParagraphByCoOrdinate.

  2. I don't understand the coordinates either. The program works well, but when I try to change the area for different parts of the PDF, it doesn't find them. They do not look like normal x y coordinates.

    1. Yes, they are not like normal x y coordinates. I don't know where it takes its origin; just try by trial and error, it is easy to identify.

  3. Hello, does a method exist that can retrieve data from a table while respecting each column, in other words to obtain the same structure as an Excel file? Thanks.

