- commons-logging-1.1.jar
- dom4j-1.6.1.jar
- geronimo-stax-api_1.0_spec-1.0.jar
- junit-3.8.1.jar
- log4j-1.2.13.jar
- poi-3.6-20091214.jar
- poi-contrib-3.6-20091214.jar
- poi-examples-3.6-20091214.jar
- poi-ooxml-3.6-20091214.jar
- poi-ooxml-schemas-3.6-20091214.jar
- poi-scratchpad-3.6-20091214.jar
- xmlbeans-2.3.0.jar
I don't remember which ones I used, but those are the ones which were in the library when it worked. According to the code below, you will want to put those in "lib/java_classes." However, you can put them wherever.
The code below outputs the Excel spreadsheet as an array:
Dir.new("lib/java_classes").sort.each do | file |
require("lib/java_classes/" + file) if file =~ /\.jar$/
end
include_class Java::org.apache.poi.poifs.filesystem.POIFSFileSystem
include_class Java::org.apache.poi.poifs.filesystem.POIFSFileSystem
include_class Java::java.io.ByteArrayOutputStream
include_class Java::java.util.Date
require 'yaml'
args = YAML::load(STDIN)
input = Java::JavaIo::FileInputStream.new(args["file"])
include_class Java::org.apache.poi.ss.usermodel.WorkbookFactory
include_class Java::org.apache.poi.ss.usermodel.Workbook
include_class Java::org.apache.poi.ss.usermodel.Sheet
include_class Java::org.apache.poi.ss.usermodel.Row
include_class Java::org.apache.poi.ss.usermodel.Cell
include_class Java::org.apache.poi.ss.usermodel.FormulaEvaluator
@spreadsheet = WorkbookFactory.create(input)
@formula_evaluator = @spreadsheet.get_creation_helper.create_formula_evaluator
def parse_value(cell)
return nil if cell.nil?
case cell.get_cell_type
when Cell.CELL_TYPE_NUMERIC then cell.get_numeric_cell_value
when Cell.CELL_TYPE_STRING then cell.get_string_cell_value
when Cell.CELL_TYPE_FORMULA then parse_formula(@formula_evaluator.evaluate(cell))
when Cell.CELL_TYPE_BOOLEAN then cell.get_boolean_cell_value
when Cell.CELL_TYPE_BLANK then nil
else
raise "Don't know what to do with cell type: #{cell.get_cell_type}"
end
end
def parse_formula(cell_value)
case cell_value.get_cell_type
when Cell.CELL_TYPE_NUMERIC then cell_value.get_number_value
when Cell.CELL_TYPE_STRING then cell_value.get_string_value
when Cell.CELL_TYPE_BOOLEAN then cell_value.get_boolean_value
when Cell.CELL_TYPE_BLANK then nil
else
raise "Don't know what to do with cell type: #{cell_value.get_cell_type}"
end
end
spreadsheet_as_array = ([email protected]_number_of_sheets - 1).map do | sheet_num |
sheet = @spreadsheet.sheet_at(sheet_num)
sheet_as_array = [sheet.sheet_name, []]
sheet.row_iterator.each do | row |
sheet_as_array[1] << []
row.cell_iterator.each do | cell |
sheet_as_array[1].last << parse_value(cell)
end
end
sheet_as_array
end
puts spreadsheet_as_array.inspect
There is a flaw to this code above: an Excel sheet needs to be "cleaned" prior to importing. Excel files typically get tons of extra rows and tons of extra columns associated with the Active Cells. People apply formatting to unused areas by selecting all, etc. Therefore, go in and delete unused cells, and do "Save As" to get Excel to forget formatting on those cells.
To run the code above, do the following:
echo 'file: my_spreadsheet.xls' | jruby xls_to_array.rb

No comments:
Post a Comment