How to use Tesseract OCR in C# - Full Tutorial

Last updated: Aug 26, 2024

Optical Character Recognition (OCR) converts scanned images and PDFs into searchable, editable text, which makes automation, data extraction, and text analysis possible. This tutorial walks through using Tesseract OCR in C# with IronOCR, a .NET library that wraps the Tesseract engine and simplifies common OCR tasks. Whether you're working with Windows Forms, ASP.NET, or another .NET project type, this guide shows how to extract text from images quickly and efficiently.

Why Choose IronOCR for Tesseract OCR?

IronOCR is more than just a library; it's a robust solution that encapsulates the Tesseract OCR engine within a user-friendly .NET wrapper. By using IronOCR, you get access to the advanced capabilities of Tesseract, coupled with enhanced features like error correction, language support, and cross-platform compatibility. The library is designed for developers who want to integrate OCR functionality into their .NET applications with minimal effort and maximum flexibility.

Key Benefits of IronOCR:


Setting Up Your Project

Begin by creating a new C# project in Visual Studio. You can choose any project type, such as a Console App, Windows Forms, or ASP.NET application. Once your project is set up, you'll need to install the IronOCR package via NuGet.

Step # 1: Open Visual Studio and Create Project

Open Visual Studio. I am using Visual Studio 2019, but you can use any version.
Select “Create a new project”, then choose the Windows Forms App template.
Click “Next”. Name the project, select a location, and click “Next”.
Select the target framework. I have chosen .NET 5.0, but you can choose your preferred option. Click “Finish”, and the Windows Forms application will be created.
Before proceeding further, we need to install the NuGet package for IronOCR.

Step # 2: Install the IronOcr NuGet Package

Open the Package Manager Console from Tools > NuGet Package Manager > Package Manager Console.
Type “Install-Package IronOcr” in the console and press Enter.
IronOCR will be installed into your project. After the installation completes, open your Windows Form and design your application.

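Step # 3: Designing Your Application Interface (Windows Forms Example)

For this tutorial, we'll create a simple Windows Forms application that lets users select an image, run OCR, and display the extracted text. Start by designing your form with the controls that the code in the next step relies on: a PictureBox (pictureBox1) to preview the image, a text field (ImagePath) to show the selected file path, a “Select Image” button, a “Convert to Text” button, and a text area for the extracted text.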

Now that the interface is ready, let's write the code to handle image selection and OCR processing.

Step # 4: Writing the Code behind the Buttons

Double-click the “Select Image” button. The following empty handler will appear:

private void SelectImage_Click(object sender, EventArgs e)

Write the following code inside this function:

private void SelectImage_Click(object sender, EventArgs e)
{
    OpenFileDialog open = new OpenFileDialog();
    // image filters
    open.Filter = "Image Files(*.jpg; *.jpeg; *.gif; *.bmp)|*.jpg; *.jpeg; *.gif; *.bmp";
    if (open.ShowDialog() == DialogResult.OK)
    {
        // display image in picture box
        pictureBox1.Image = new Bitmap(open.FileName);
        // image file path
        ImagePath.Text = open.FileName;
    }
}

Next, double-click the “Convert to Text” button. The following empty handler will appear:

private void ConvertToText_Click(object sender, EventArgs e)

Add the following namespace at the top of the file:

using IronOcr;

Next, add the following code inside the ConvertToText_Click() function:

 private void ConvertToText_Click(object sender, EventArgs e)
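
The handler body can be sketched as follows. This is a minimal sketch, not necessarily the exact code from the original post: it assumes the IronTesseract class and its Read method from the IronOcr package, the default form class name Form1, and a hypothetical output control named richTextBox1 (the tutorial does not name the output control).

```csharp
using System;
using System.Windows.Forms;
using IronOcr;

// Form1 is the default name Visual Studio gives the form (assumed here).
public partial class Form1 : Form
{
    private void ConvertToText_Click(object sender, EventArgs e)
    {
        // Create the OCR engine, IronOCR's wrapper around Tesseract.
        IronTesseract IronOcr = new IronTesseract();
        // Read the image whose path was stored by SelectImage_Click.
        var Result = IronOcr.Read(ImagePath.Text);
        // richTextBox1 is a hypothetical name for the output text area.
        richTextBox1.Text = Result.Text;
    }
}
```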

As you can see, we only needed to write three lines of code to perform this major task, all thanks to IronOcr.

Step # 5: Run the Project

Press Ctrl + F5 to run the project.
Click the “Select Image” button and choose an image. I am selecting a snapshot of an article, but you can select any image you like.
Next, click the “Convert to Text” button to extract all the text from the image.
The text was extracted from the article image accurately and with very little code. IronOcr has made this job easy.

Using IronOcr to Extract Text in Different Languages

One of the standout features of IronOCR is its support for over 150 languages. Whether you need to extract text in English, Chinese, Arabic, or any other language, IronOCR makes it straightforward.

Step # 1: Install the NuGet Package for the Specific Language

To extract text in a language other than English, you need to install the corresponding language package via NuGet. For example, to work with Chinese, use the following command:

Install-Package IronOcr.Languages.Chinese

Once the language package is installed, update your code to specify the language:

IronOcr.Language = OcrLanguage.ChineseSimplified;

For example:

 private void ConvertToText_Click(object sender, EventArgs e)
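
With the language line added, the handler might look like the sketch below. It assumes the engine variable is named IronOcr (which is what the line IronOcr.Language = ... implies), the IronTesseract class from the IronOcr package, the default form class name Form1, and a hypothetical output control named richTextBox1.

```csharp
using System;
using System.Windows.Forms;
using IronOcr;

// Form1 is the default name Visual Studio gives the form (assumed here).
public partial class Form1 : Form
{
    private void ConvertToText_Click(object sender, EventArgs e)
    {
        // The local variable IronOcr is the OCR engine instance.
        IronTesseract IronOcr = new IronTesseract();
        // Switch recognition from the default language to Simplified Chinese.
        // This requires the Chinese language package installed in Step # 1.
        IronOcr.Language = OcrLanguage.ChineseSimplified;
        var Result = IronOcr.Read(ImagePath.Text);
        // richTextBox1 is a hypothetical name for the output text area.
        richTextBox1.Text = Result.Text;
    }
}
```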

Let’s do the test again.

Step # 2: Run the Project

We have converted the Chinese-language image into text by adding just one line of code. The IronOcr .NET library provides an accurate, efficient, and easy way to add OCR to a .NET application.

How to Extract Text from the Image using Traditional Tesseract: A Step-by-Step Guide

Let’s look at how to achieve the same goal using the Tesseract NuGet package directly. We can keep the same Windows Form as in the previous example and change only the code behind the ConvertToText_Click handler. Everything else remains the same as before.

Step # 1: Install the NuGet Package for Tesseract

Run the following command in the Package Manager Console:

Install-Package Tesseract

After installing the NuGet package, you must add the language files to the project manually, which is a drawback of this library. Download the language files from the following link. Unzip the archive and copy the tessdata folder into the Debug folder of your project.
Next, write the following code inside the ConvertToText_Click function:

private void ConvertToText_Click(object sender, EventArgs e)
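
A handler built on the Tesseract package can be sketched as follows. This is a minimal sketch, assuming the TesseractEngine, Pix, and Page types from the Tesseract NuGet package, English language data (eng.traineddata) in a tessdata folder next to the executable, the default form class name Form1, and a hypothetical output control named richTextBox1.

```csharp
using System;
using System.Windows.Forms;
using Tesseract;

// Form1 is the default name Visual Studio gives the form (assumed here).
public partial class Form1 : Form
{
    private void ConvertToText_Click(object sender, EventArgs e)
    {
        // "./tessdata" must contain eng.traineddata, copied in the previous step.
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        // Load the image whose path was stored by SelectImage_Click.
        using (var img = Pix.LoadFromFile(ImagePath.Text))
        // Run recognition on the whole image.
        using (var page = engine.Process(img))
        {
            // richTextBox1 is a hypothetical name for the output text area.
            richTextBox1.Text = page.GetText();
        }
    }
}
```

The engine, image, and page objects each hold native resources, which is why all three are wrapped in using statements.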

Step # 2: Run the Project

Press Ctrl + F5 to run the project. Select the image file you want to convert. I have selected the same English-language file as in the previous example. Click the “Convert to Text” button to extract the text from the image.

Tesseract also supports images in other languages. However, each language requires a separate language file in the project's tessdata folder.
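
Since each language needs its own file, it can help to check the tessdata folder before creating the engine. The following is a minimal sketch using only System.IO. The TessdataCheck class and MissingLanguages method are hypothetical names, and the sketch assumes Tesseract's usual file naming, where the data for language code eng lives in eng.traineddata.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TessdataCheck
{
    // Returns the requested language codes that have no matching
    // "<code>.traineddata" file in the given tessdata folder.
    public static List<string> MissingLanguages(string tessdataDir, params string[] languageCodes)
    {
        var missing = new List<string>();
        foreach (string code in languageCodes)
        {
            string file = Path.Combine(tessdataDir, code + ".traineddata");
            if (!File.Exists(file))
            {
                missing.Add(code);
            }
        }
        return missing;
    }
}
```

For example, TessdataCheck.MissingLanguages("./tessdata", "eng", "chi_sim") returns the codes whose files still need to be downloaded, and an empty list when both are present.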
By comparison, the IronOcr .NET library is easier to set up and easier to use.

Practical Use Cases for OCR in C#

IronOCR's versatility makes it a valuable tool in various industries and applications. Here are some common use cases:

Common Errors and Troubleshooting

While IronOCR is user-friendly, you might encounter some common errors during implementation. Here’s how to troubleshoot them:
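
One failure that is easy to hit in the form built in this tutorial is clicking “Convert to Text” before a valid image has been selected. A guard like the one below could run first. This is a minimal sketch using only System.IO: the ImagePathGuard class and IsUsable method are hypothetical names, and the extension list simply mirrors the file dialog filter used in SelectImage_Click.

```csharp
using System;
using System.IO;

public static class ImagePathGuard
{
    // Extensions accepted by the file dialog filter in SelectImage_Click.
    private static readonly string[] Allowed = { ".jpg", ".jpeg", ".gif", ".bmp" };

    // Returns true when the path points to an existing file with an accepted
    // extension. Otherwise returns false and sets a message for the user.
    public static bool IsUsable(string path, out string error)
    {
        error = "";
        if (string.IsNullOrWhiteSpace(path))
        {
            error = "No image selected.";
            return false;
        }
        if (!File.Exists(path))
        {
            error = "File not found: " + path;
            return false;
        }
        string ext = Path.GetExtension(path).ToLowerInvariant();
        if (Array.IndexOf(Allowed, ext) < 0)
        {
            error = "Unsupported image type: " + ext;
            return false;
        }
        return true;
    }
}
```

In the click handler, a call such as ImagePathGuard.IsUsable(ImagePath.Text, out string error) can decide whether to run OCR or show the message with MessageBox.Show(error).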

Advanced Features of IronOCR

Beyond basic OCR, IronOCR offers several advanced features:
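
Image pre-processing is one such feature. The sketch below straightens and denoises an image before reading it. It is a minimal sketch, assuming a recent IronOcr release in which OcrInput exposes the LoadImage, Deskew, and DeNoise methods (the loading method may differ in older releases). The PreprocessedOcr class and ReadText method are hypothetical names.

```csharp
using IronOcr;

public static class PreprocessedOcr
{
    // Reads text from an image after applying two pre-processing filters.
    public static string ReadText(string imagePath)
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            input.LoadImage(imagePath);
            // Rotate the image so that lines of text are horizontal.
            input.Deskew();
            // Reduce digital noise, which often appears in scans and photos.
            input.DeNoise();
            OcrResult result = ocr.Read(input);
            return result.Text;
        }
    }
}
```

Filters like these tend to matter most for skewed scans and phone photos. For clean screenshots they may add processing time without improving the result.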

Conclusion:

IronOCR stands out as a top choice for developers integrating OCR in C# applications, offering a seamless experience with its easy integration, support for over 150 languages, and powerful features like image pre-processing and multithreading. Whether you're building simple or complex OCR solutions, IronOCR simplifies text extraction from images, catering to developers of all experience levels. Start your journey with IronOCR by downloading the library and exploring its extensive documentation. With regular updates and a free trial by Iron Software, you have everything you need to build robust, OCR-powered applications. Happy coding!