Recognize Text From PDF Step
  • 17 Mar 2023

Overview

Step Details

Introduced in Version: 4.0.9
Last Modified in Version: 4.0.9
Location: Data > PDF

The Recognize Text From PDF step scans a PDF file for text and outputs all of the text it recognizes. Unlike the Get Text From PDF step, this step has no options for page selection or whitespace preservation; it only reads the text. The output may deviate slightly from the original content. Other Flow logic steps, such as the Replace Text step, can correct these errors, provided the text is all in a similar style. This step may take fifteen to thirty seconds to execute.
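The correction idea described above can be sketched outside of a Flow. The following Python snippet is only an illustration of the kind of find-and-replace cleanup that a Replace Text step could perform on the recognized output; it is not part of the Decisions platform. The function name and the misread/corrected pairs are hypothetical and would need to be replaced with the errors actually observed in your documents.

```python
# Illustrative sketch only: this is not Decisions code.
# The correction pairs below are hypothetical examples of misread characters.
CORRECTIONS = {
    "lnvoice": "Invoice",  # lowercase "l" recognized in place of capital "I"
    "T0tal": "Total",      # digit zero recognized in place of the letter "o"
}


def correct_recognized_text(text, corrections=CORRECTIONS):
    """Apply each literal find-and-replace pair to the recognized text."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text
```

Because the replacements are literal, this approach only works when the same misreadings recur consistently, which is why text in a similar style is easier to correct.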


Properties

Inputs

Property | Description | Data Type
PDF Document | File to scan for recognizable text. | FileData

Outputs

Property | Description | Data Type
Output | Text that was recognized in the PDF file. If nothing was recognized, the step will output an empty string. | String

Debugger showing Inputs and Outputs on the Recognize Text From PDF step.


Common Errors

Incorrect file header at

A file type other than .pdf has been used as the PDF input. To correct this, change the PDF input to a PDF file.
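One way to catch this problem before the step runs is to check the start of the file. PDF files begin with the marker `%PDF-`, so a file that lacks it is not a PDF regardless of its extension. The Python sketch below shows such a check on raw bytes; the function name is hypothetical, and the check is deliberately strict, so it is not necessarily the same validation that the step performs internally.

```python
# Illustrative sketch only: a strict pre-check, not the step's own validation.
PDF_MAGIC = b"%PDF-"  # marker found at the start of a PDF file


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(PDF_MAGIC)
```

For example, `looks_like_pdf(b"%PDF-1.7\n")` returns `True`, while the first bytes of a ZIP-based file such as a .docx (`b"PK\x03\x04"`) return `False`.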

Exception Message:

Exception Stack Trace: DecisionsFramework.Design.Flow.ErrorRunningFlowStep: Error running step Recognize Text From PDF 1[RecognizeTextFromPdf] in flow [Display Steps]: Exception invoking method RecognizeTextFromPdf on class PdfManagementSteps
 ---> DecisionsFramework.LoggedException: Exception invoking method RecognizeTextFromPdf on class PdfManagementSteps
 ---> Aspose.Pdf.InvalidPdfFileFormatException: Incorrect file header at #=zKZIslgV_VTgI_laDaHeWplnCCTSzL8YTTg==.#=zHht8sihEEDo7(
 at #=zKZIslgV_VTgI_laDaHeWplnCCTSzL8YTTg==..ctor(Stream #=ziA0t9_I=, String #=zuiJIvEo=, Boolean #=zBvJEbRdfIP3N
 at #=zKZIslgV_VTgI_laDaHeWplnCCTSzL8YTTg==..ctor(Stream #=ziA0t9_I=
 at #=zjroLc8wES_h1ilPWuP2WJb4VKmlo2M1kB_9Ae9Q=.#=zCVbFrz0=(Stream #=ziA0t9_I=
 at #=zzVjo1F9wNoksFXH0KKuyBSP8d$xY$dIsPQ==..ctor(Stream #=ziA0t9_I=
 at #=zjroLc8wES_h1ilPWuP2WJb4VKmlo2M1kB_9Ae9Q=.#=zG1sr3o8SaFaT(Stream #=ziA0t9_I=
 at #=zGaHf0$lTElKXJzbip2dw4yLBu4qT.#=zXO2JtCM=(Stream #=ziA0t9_I=
 at #=zGaHf0$lTElKXJzbip2dw4yLBu4qT..ctor(Stream #=ziA0t9_I=
 at Aspose.Pdf.Document.#=zQxeSgpE=(Stream #=z3Wdp9mg=, String #=zuiJIvEo=
 at Aspose.Pdf.Document..ctor(Stream input
 at DecisionsFramework.Design.Flow.CoreSteps.StandardSteps.DocumentManagementMethods.GetPdfDocFromFileData(FileData fileData
 at DecisionsFramework.Design.Flow.CoreSteps.StandardSteps.PdfManagementSteps.GetPdfDocFromFileData(FileData fileData
 at DecisionsFramework.Design.Flow.CoreSteps.StandardSteps.PdfManagementSteps.RecognizeTextFromPdf(FileData PdfDocument
 at InvokeStub_PdfManagementSteps.RecognizeTextFromPdf(Object, Object, IntPtr*
 at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
   --- End of inner exception stack trace ---
 at DecisionsFramework.Design.Flow.StepImplementations.InvokeMethodStep.Run(StepStartData data
 at DecisionsFramework.Design.Flow.FlowStep.RunStepInternal(String flowTrackingID, String stepTrackingID, KeyValuePairDataStructure[] stepRunDataValues, AbstractFlowTrackingData trackingData
 at DecisionsFramework.Design.Flow.FlowStep.Start(String flowTrackingID, String stepTrackingID, FlowStateData data, AbstractFlowTrackingData trackingData, RunningStepData currentStepData)
   --- End of inner exception stack trace ---
