Handling of multi-line text in extracts and CSV file needs cleanup #6

@kevinburleigh75

In a previous commit, processFile.rb was peppered with #gsub calls to convert newlines in extracted text to spaces. The possibility of newlines in the extracts has two implications:

  • regexps that process the text need to allow for matches that span more than one line
  • the output comma-separated value (CSV) file is ill-formed for certain uses (for example, any consumer that assumes one record per line)

An example of a multi-line extract is the xref field of the second grant extracted from ipg140107 (see lines 2509-2530 of ipg140107.extract), though others are likely to exist.
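
A minimal sketch of one possible cleanup, not the code currently in processFile.rb. It assumes a hypothetical helper named normalize_whitespace that would replace the scattered #gsub calls, and it uses a made-up sample string in place of the real xref field. It shows the two implications above: a pattern needs the /m flag to match across newlines, and Ruby's csv library can either receive flattened text or quote a field that still contains newlines.

```ruby
require "csv"

# Hypothetical helper (not in the repository): collapse every run of
# whitespace, including newlines, into one space so that callers do not
# each need their own #gsub.
def normalize_whitespace(text)
  text.gsub(/\s+/, " ").strip
end

# Made-up sample standing in for a multi-line extract field.
xref = "See application 12/345,678\nfiled Jan. 1, 2010,\nnow abandoned."

# Without /m, "." does not match a newline, so this lookup returns nil.
single_line = xref[/application.*abandoned/]

# With /m, "." also matches newlines, so the match spans all three lines.
multi_line = xref[/application.*abandoned/m]

# Option 1: flatten the field before writing it out.
flat = normalize_whitespace(xref)

# Option 2: keep the newlines and let the CSV library quote the field.
# The first column value is only an illustrative label.
csv_text = CSV.generate { |csv| csv << ["ipg140107", xref] }

# A CSV-aware reader recovers the original field, newlines included.
round_trip = CSV.parse(csv_text)
```

Option 1 keeps one record per physical line, which suits line-oriented tools. Option 2 preserves the original text but requires every consumer to use a real CSV parser, since the single record above occupies three physical lines in csv_text.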
