Tuesday, November 16, 2010

Automating Microsoft Word with IronRuby

Below are some snippets, with details, from a project I worked on for a documentation creation workflow. A core focus of the project was to reduce the number of manual steps a person needed by automating the document creation and distribution process.

The organization's knowledge is built on Microsoft Word and Excel, so we stuck with Microsoft Word and Excel. If I introduced more base technology, I would have to support more base technology. Therefore, I tried to minimize change and maximize return on technology.

Each document was a mail merge file in Microsoft Word. Each mail-merged document had multiple recipients, and the routing and packaging of documents could be either physical or E-mail. The physical distribution had a variable number of copies based on the recipients, and the E-mail distribution had a single copy with a variable number of recipients.

Now for the code:

The following loads the Word interop assembly into IronRuby:
require 'Microsoft.Office.Interop.Word, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c'
require 'fileutils' # FileUtils.mkdir_p is used by Document#create_directory
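
The listings in this post reference a WordErrors module and a File.windows_join helper that are not shown. A minimal sketch of what they might look like follows; both definitions are assumptions for illustration (the error class names are taken from the listings, everything else is hypothetical), not the project's actual code.

```ruby
# Hypothetical support code: the post does not show these definitions.
module WordErrors
  # Common base class so callers can rescue every Word-related failure at once.
  class Error < StandardError; end

  class DocumentNotFound < Error; end
  class RecordNotFound   < Error; end
  class NotMailMerge     < Error; end
  class CouldNotSave     < Error; end
  class InvalidAction    < Error; end
end

class File
  # Joins path segments with backslashes, the separator Word expects on Windows.
  # Assumed behaviour; the original helper may differ.
  def self.windows_join(*parts)
    parts.join("\\")
  end
end
```

With these in place, File.windows_join("C:\\Reports", "letter.pdf") yields a backslash-separated path, and rescuing WordErrors::Error catches any of the errors raised by the classes in this post.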

Below I abstracted the Microsoft Word interface:
class Word
  attr_accessor :word, :documents

  def self.connect
    word = Word.new
    word.word = System::Runtime::InteropServices::Marshal.get_active_object("Word.Application")
    word.documents = (1..word.word.Documents.Count).inject([]) do | documents, num |
      # << returns the array, so inject keeps accumulating (&& would discard it)
      documents << Document.new(:document => word.word.Documents[num])
    end
    word
  end

  def self.open
    word = Word.new
    word.word = Microsoft::Office::Interop::Word::ApplicationClass.new
    word.visible
    word.documents = []
    word
  end

  def visible
    word.visible = true
  end

  def close(args = {})
    args[:force] ||= false
    word.Quit(args[:force])
  end

  def printers
    System::Drawing::Printing::PrinterSettings.InstalledPrinters.map { | p | p }
  end

  def active_printer
    word.ActivePrinter
  end

  def active_printer=(arg)
    word.ActivePrinter = arg
  end

  def disable_alerts
    word.DisplayAlerts = Microsoft::Office::Interop::Word::WdAlertLevel::wdAlertsNone
  end

  def enable_alerts
    word.DisplayAlerts = Microsoft::Office::Interop::Word::WdAlertLevel::wdAlertsAll
  end

end
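
Word.connect collects the open documents with inject, which only accumulates if the block returns the accumulator on every pass. A small sketch of that behaviour in plain Ruby, with strings standing in for the COM document objects (an assumption, so it runs without Word):

```ruby
# Strings stand in for the COM documents (hypothetical data).
open_documents = ["Letter.doc", "Invoice.doc", "Summary.doc"]

# Word collections are 1-based, so walk 1..Count and shift to Ruby's 0-based index.
wrapped = (1..open_documents.length).inject([]) do |documents, num|
  documents << "wrapped:#{open_documents[num - 1]}" # << returns the array itself
end

# With && the block returns its right-hand value, so after the first pass
# the accumulator is no longer an array and only the last item survives.
broken = (1..open_documents.length).inject([]) do |documents, num|
  documents && "wrapped:#{open_documents[num - 1]}"
end
```

Here wrapped ends up as an array of three entries, while broken is just the last string.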

Below is the Document object. I originally had Document control both the Word and Document objects, but eventually realized I needed to split the two:

class Document
  attr_accessor :document

  def initialize(args = {})
    self.document = args[:document]
  end

  def name
    document.Name
  end

  def ToString
    name
  end

  def self.find(*args)
    @@word ||= Word.connect

    case
    when args[0].is_a?(Fixnum) then
      @@word.documents[args[0]] || raise(WordErrors::DocumentNotFound, args.inspect)
    when args[0].is_a?(String) then
      @@word.documents.find { | document | document.name == args[0] } || raise(WordErrors::DocumentNotFound, args.inspect)
    when args[0] == :all then
      @@word.documents
    else
      raise(WordErrors::DocumentNotFound, args.inspect)
    end
  end

  def self.open(file_path = nil)
    @@word ||= Word.open

    document = Document.new(
      :document => !file_path.nil? ?
      @@word.word.Documents.Open(System::String.new(file_path)) : @@word.word.Documents.Add
    )

    @@word.documents << document
    document
  end

  def mailmerge?
    document.MailMerge.State != 0
  end

  def close(args = {})
    # :force => true closes without saving changes
    save_changes = !args[:force]
    @@word.documents.delete(self)
    document.Close(save_changes)

    if @@word.documents.length == 0
      @@word.close(:force => true)
      @@word = nil
    end
  end

  #
  # record => Fixnum, :current, :first, :last, :previous, :next
  #
  def goto(record)
    return true if record == :current || current_record == record

    case
    when record.is_a?(Fixnum)
      document.MailMerge.DataSource.ActiveRecord = record
    when record == :last
      document.MailMerge.DataSource.ActiveRecord = record_count
    when record == :first
      document.MailMerge.DataSource.ActiveRecord = 1
    when record == :next
      document.MailMerge.DataSource.ActiveRecord = current_record + 1
    when record == :previous
      document.MailMerge.DataSource.ActiveRecord = current_record - 1
    else
      raise(WordErrors::RecordNotFound, "Cannot use '#{record}' to find a record")
    end
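  rescue
    # Report a missing mail merge; otherwise re-raise the original error
    # instead of silently swallowing it.
    raise WordErrors::NotMailMerge unless mailmerge?
    raise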
  end

  #
  # Expects the following variables to be passed:
  #
  # action => [:print, :save]
  #
  # args =>
  #
  #   :records           => [:all, :current, :next, :previous, :first, :last]
  #   :path              => path to the directory which will save the files
  #   :directory_formula => The formula for the directory based on the
  #                         Investors datasource values '#{Inv} - #{Investor}'
  #   :name              => The file name you wish to give the document
  #
  #
  def export(action, args)
    set_action(action)

    if args[:records] == :all
      1.upto(record_count) do | record |
        export(action, args.merge(:records => record))
        @@word.word.ActiveDocument.Close(0)
      end
    else # For all other possibilities
      goto(args[:records])
      export_current
      save(args) if action == :save
      Document.new(:document => @@word.word.ActiveDocument)
    end
  end

  def save(args)
    location = create_directory(args[:path], args[:directory_formula])
    
    url = System::String.new(File.windows_join(location, args[:name]))

    puts "Saving: #{url}"

    case
    when args[:name].downcase =~ /\.pdf$/ then save_as_pdf(url)
    when args[:name].downcase =~ /\.doc$/ then save_as_doc(url)
    else
      raise "Don't know how to save: #{args[:name]}"
    end
  rescue Exception => e
    raise WordErrors::CouldNotSave, "Error saving to #{location}\\#{args[:name]}", [e.to_s] + e.backtrace
  end

  #
  # Holder for an array of all the data fields in the DataSource
  #
  def data_fields
    @data_fields ||= return_data_fields
  end

  #
  # Returns the number of records in a DataSource
  #
  def record_count
    @document.MailMerge.DataSource.RecordCount
  end

  #
  # Returns the Row Value for a Particular Column of the
  # DataSource
  def field_value(index)
    index = data_fields.index(index) + 1 if index.is_a?(String)
    @document.MailMerge.DataSource.DataFields(index).Value
  rescue
    # Re-raise anything that is not explained by a missing mail merge
    raise WordErrors::NotMailMerge unless mailmerge?
    raise
  end


  #
  # Pass in a formula for fields and it finds and replaces the fields
  #
  def replace_fields_with_values(formula)
    # Single pass, in place: a value that itself contains "#{...}" is not
    # expanded again, so the substitution always terminates.
    formula.gsub!(/\#\{([^\}]+)\}/) { field_value($1).to_s }
    formula
  end

  def save_as_pdf(url)
    @@word.word.ActiveDocument.ExportAsFixedFormat(url, Microsoft::Office::Interop::Word::WdExportFormat::wdExportFormatPDF)
  end

  def print(args = {})
    copies = args[:copies] ? args[:copies].to_i : 1
    # One print job per requested copy (false = do not print in the background)
    copies.times { document.PrintOut(false) }
  end

  private
  #
  # Creates the Directory to the current record based on the
  # path created by the directory formula
  #
  def create_directory(path, dir_formula)
    dir_name = directory_name_from_formula(dir_formula)
    dir_path = File.windows_join(path, dir_name)

    FileUtils.mkdir_p(dir_path) unless File.exist?(dir_path)

    dir_path
  end

  #
  # actions = [:save, :print]
  #
  def set_action(action)
    document.MailMerge.Destination = case action
    when :print then 1 # wdSendToPrinter
    when :save  then 0 # wdSendToNewDocument
    else
      raise WordErrors::InvalidAction, action.inspect
    end
  end

  #
  # Executes the mail merge for the current record only, using the
  # destination chosen by set_action. This is intended for the
  # :save action, since print automatically prints records
  #
  def export_current
    document.MailMerge.DataSource.FirstRecord = current_record
    document.MailMerge.DataSource.LastRecord = current_record
    document.MailMerge.Execute(false) # false is whether to "Pause"
  end

  #
  # Generates a name for the directory based on the values from
  # the record and a formula given by the user
  #
  def directory_name_from_formula(dir_formula)
    dir_name = dir_formula.clone
    replace_fields_with_values(dir_name).gsub(/[^A-Za-z0-9\ \-\+\.\,]+/, "-")[0..64].strip
  end

  #
  # Returns an array of all the data fields in the DataSource
  def return_data_fields
    @data_fields = []
    1.upto(@document.MailMerge.DataSource.DataFields.Count) do | i |
      @data_fields << @document.MailMerge.DataSource.DataFields.Item(i).Name.strip
    end
    @data_fields
  end

  #
  # Returns the numeric value of the ActiveRecord
  #
  def current_record
    document.MailMerge.DataSource.ActiveRecord
  end

  def save_as_doc(url)
    @@word.word.ActiveDocument.SaveAs(url)
  end

end
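
The :directory_formula option works like a tiny template: each #{Field} placeholder is swapped for that field's value in the current record, and the result is cleaned up into a directory name. A sketch of the idea outside Word, assuming a plain Hash in place of the mail merge DataSource; expand_formula, directory_name, and the sample record are hypothetical names for illustration, not part of the code above.

```ruby
# Hypothetical record; in the real code the values come from
# document.MailMerge.DataSource via field_value.
record = { "Inv" => "0042", "Investor" => "Acme / Sons: Ltd." }

# Replace each #{Field} placeholder with the record's value in a single pass,
# so a value that itself contains "#{...}" is never expanded again.
def expand_formula(formula, record)
  formula.gsub(/\#\{([^\}]+)\}/) do
    name = $1
    record.fetch(name) { |key| raise ArgumentError, "Unknown field: #{key}" }.to_s
  end
end

# Same clean-up as directory_name_from_formula: collapse runs of characters
# outside the allowed set into "-", cap the length, and trim whitespace.
def directory_name(formula, record)
  expand_formula(formula, record).gsub(/[^A-Za-z0-9\ \-\+\.\,]+/, "-")[0..64].strip
end

# Single quotes keep Ruby from interpolating the placeholders itself.
puts directory_name('#{Inv} - #{Investor}', record) # prints "0042 - Acme - Sons- Ltd."
```

Note that the formula must be passed in single quotes (or read from a file), otherwise Ruby would try to interpolate #{Inv} before the method ever sees it.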

I'm working to scrub the code of any domain-specific content, and then I'll post the full code. If you have any questions, please E-mail me, and I'll answer them.

Chris
