- commons-logging-1.1.jar
- dom4j-1.6.1.jar
- geronimo-stax-api_1.0_spec-1.0.jar
- junit-3.8.1.jar
- log4j-1.2.13.jar
- poi-3.6-20091214.jar
- poi-contrib-3.6-20091214.jar
- poi-examples-3.6-20091214.jar
- poi-ooxml-3.6-20091214.jar
- poi-ooxml-schemas-3.6-20091214.jar
- poi-scratchpad-3.6-20091214.jar
- xmlbeans-2.3.0.jar
I don't remember which ones I used, but those are the ones which were in the library when it worked. According to the code below, you will want to put those in "lib/java_classes." However, you can put them wherever.
The code below outputs the Excel spreadsheet as an array:
Dir.new("lib/java_classes").sort.each do | file | require("lib/java_classes/" + file) if file =~ /\.jar$/ end include_class Java::org.apache.poi.poifs.filesystem.POIFSFileSystem include_class Java::org.apache.poi.poifs.filesystem.POIFSFileSystem include_class Java::java.io.ByteArrayOutputStream include_class Java::java.util.Date require 'yaml' args = YAML::load(STDIN) input = Java::JavaIo::FileInputStream.new(args["file"]) include_class Java::org.apache.poi.ss.usermodel.WorkbookFactory include_class Java::org.apache.poi.ss.usermodel.Workbook include_class Java::org.apache.poi.ss.usermodel.Sheet include_class Java::org.apache.poi.ss.usermodel.Row include_class Java::org.apache.poi.ss.usermodel.Cell include_class Java::org.apache.poi.ss.usermodel.FormulaEvaluator @spreadsheet = WorkbookFactory.create(input) @formula_evaluator = @spreadsheet.get_creation_helper.create_formula_evaluator def parse_value(cell) return nil if cell.nil? case cell.get_cell_type when Cell.CELL_TYPE_NUMERIC then cell.get_numeric_cell_value when Cell.CELL_TYPE_STRING then cell.get_string_cell_value when Cell.CELL_TYPE_FORMULA then parse_formula(@formula_evaluator.evaluate(cell)) when Cell.CELL_TYPE_BOOLEAN then cell.get_boolean_cell_value when Cell.CELL_TYPE_BLANK then nil else raise "Don't know what to do with cell type: #{cell.get_cell_type}" end end def parse_formula(cell_value) case cell_value.get_cell_type when Cell.CELL_TYPE_NUMERIC then cell_value.get_number_value when Cell.CELL_TYPE_STRING then cell_value.get_string_value when Cell.CELL_TYPE_BOOLEAN then cell_value.get_boolean_value when Cell.CELL_TYPE_BLANK then nil else raise "Don't know what to do with cell type: #{cell_value.get_cell_type}" end end spreadsheet_as_array = ([email protected]_number_of_sheets - 1).map do | sheet_num | sheet = @spreadsheet.sheet_at(sheet_num) sheet_as_array = [sheet.sheet_name, []] sheet.row_iterator.each do | row | sheet_as_array[1] << [] row.cell_iterator.each do | cell | sheet_as_array[1].last << parse_value(cell) end end sheet_as_array end puts spreadsheet_as_array.inspect
There is a flaw to this code above: an Excel sheet needs to be "cleaned" prior to importing. Excel files typically get tons of extra rows and tons of extra columns associated with the Active Cells. People apply formatting to unused areas by selecting all, etc. Therefore, go in and delete unused cells, and do "Save As" to get Excel to forget formatting on those cells.
To run the code above, do the following:
echo 'file: my_spreadsheet.xls' | jruby xls_to_array.rb
No comments:
Post a Comment