redcarton.com

JDiff: Differentiation of Comma Separated Values...


JDiff is a small Java program that takes two sets of comma-separated values and returns several results, including:

  • All data items that are in both data sets.
  • All data items that are mutually exclusive (mutex) - where the values are either in one data set or the other, but not both.
  • All data items that are mutually inclusive (mutin) - where the values are in both data sets.
  • All data items that are only in the first data set.
  • All data items that are only in the second data set.
  • Duplicate detection (but not removal).

The returned data is sorted for easy comparison.
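The comparisons listed above amount to simple set operations. The sketch below shows one way to compute them in Java. It is not taken from JDiff's source. The class `JDiffSketch`, its `Item` record and its method names are hypothetical. It assumes that a data item is the whole (string, number) pair, so `def,2.0` and `def,3.0` count as different items; matching on the first value alone would mean comparing labels only. Like JDiff, it does not remove duplicates from its results. Records require Java 16 or later.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of JDiff-style comparisons; not the original JDiff source.
public class JDiffSketch {
    // Assumption: an item is a (label, number) pair, and two items match
    // only when both the label and the number are equal.
    record Item(String label, double value) {}

    // Results are sorted by label, then by number.
    static final Comparator<Item> ORDER =
            Comparator.comparing(Item::label).thenComparingDouble(Item::value);

    // "mutin": items of `a` that also appear in `b`; duplicates within `a` are kept.
    static List<Item> inBoth(List<Item> a, List<Item> b) {
        Set<Item> inB = new HashSet<>(b);
        return a.stream().filter(inB::contains).sorted(ORDER).collect(Collectors.toList());
    }

    // Items of `a` that never appear in `b`; duplicates within `a` are kept.
    static List<Item> onlyIn(List<Item> a, List<Item> b) {
        Set<Item> inB = new HashSet<>(b);
        return a.stream().filter(x -> !inB.contains(x)).sorted(ORDER).collect(Collectors.toList());
    }

    // "mutex": items in one data set or the other, but not both.
    static List<Item> mutex(List<Item> a, List<Item> b) {
        return Stream.concat(onlyIn(a, b).stream(), onlyIn(b, a).stream())
                .sorted(ORDER)
                .collect(Collectors.toList());
    }

    // Each item occurring more than once in `items`, listed once and sorted.
    // The input list itself is left unchanged (detection, not removal).
    static List<Item> duplicates(List<Item> items) {
        Set<Item> seen = new HashSet<>();
        Set<Item> dups = new TreeSet<>(ORDER);
        for (Item x : items) {
            if (!seen.add(x)) {
                dups.add(x);
            }
        }
        return new ArrayList<>(dups);
    }

    public static void main(String[] args) {
        // The two data sets from the example file in the How-To section.
        List<Item> first = List.of(new Item("abc", 1.0), new Item("def", 2.0), new Item("ghi", 3.0));
        List<Item> second = List.of(new Item("abc", 1.0), new Item("ghi", 3.0),
                new Item("jkl", 4.0), new Item("def", 3.0));
        System.out.println("mutin:       " + inBoth(first, second));
        System.out.println("mutex:       " + mutex(first, second));
        System.out.println("first only:  " + onlyIn(first, second));
        System.out.println("second only: " + onlyIn(second, first));
        System.out.println("duplicates in first: " + duplicates(first));
    }
}
```

Using a `HashSet` for the membership test keeps each comparison close to linear in the size of the data sets, while filtering the original lists (rather than converting them to sets) is what preserves duplicates in the output.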

Note: JDiff is not like other diff programs that you may be familiar with, such as fc under DOS. JDiff was developed as a personal tool for a task I had, so it is not very customizable (unless you change the code).

How-To

Preparing the data

JDiff requires two data sets of comma-separated values. You must create a text file (the extension does not matter) with the following format:

  • Each line must contain two values separated by a comma (,). The first value may be of any form (number, letters, etc.), as long as it is a string. The second value, however, must be a number; it need not have any specific number of decimal places.
  • There can only be one data pair per line.
  • There may not be any trailing spaces at the end of a line.
  • There may not be any space between the values and the comma.
  • Enter -1 on a line by itself at the end of each data set (both the first and the second).

Here is an example of an acceptable file:

abc,1.0
def,2.0
ghi,3.0
-1
abc,1.0
ghi,3.0
jkl,4.0
def,3.0
-1
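The format rules above can be checked mechanically. The following is a minimal Java sketch of a reader for such a file, assuming exactly two data sets, each ending with a `-1` line. It is not JDiff's own parser: the class `InputFormatSketch`, its `Item` record and the methods `read` and `parseLine` are hypothetical, and `Double.parseDouble` stands in for whatever number parsing JDiff actually uses. It needs Java 16 or later (records, `String.stripTrailing`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the input format described above; not JDiff's own parser.
public class InputFormatSketch {
    record Item(String label, double value) {}

    // Reads two data sets, each terminated by a line containing only "-1".
    // Anything after the second "-1" is ignored.
    static List<List<Item>> read(BufferedReader in) throws IOException {
        List<List<Item>> sets = new ArrayList<>();
        List<Item> current = new ArrayList<>();
        String line;
        int lineNo = 0;
        while (sets.size() < 2 && (line = in.readLine()) != null) {
            lineNo++;
            if (line.equals("-1")) {          // end-of-set marker
                sets.add(current);
                current = new ArrayList<>();
            } else {
                current.add(parseLine(line, lineNo));
            }
        }
        if (sets.size() < 2) {
            throw new IOException("expected two data sets, each ending with -1");
        }
        return sets;
    }

    // Applies the rules: one pair per line, no trailing spaces,
    // no spaces around the comma, and a numeric second value.
    static Item parseLine(String line, int lineNo) throws IOException {
        String[] parts = line.split(",", -1);
        if (parts.length != 2) {
            throw new IOException("line " + lineNo + ": expected exactly one comma");
        }
        if (!line.equals(line.stripTrailing())) {
            throw new IOException("line " + lineNo + ": trailing spaces are not allowed");
        }
        if (parts[0].endsWith(" ") || parts[1].startsWith(" ")) {
            throw new IOException("line " + lineNo + ": no spaces allowed around the comma");
        }
        try {
            return new Item(parts[0], Double.parseDouble(parts[1]));
        } catch (NumberFormatException e) {
            throw new IOException("line " + lineNo + ": second value must be a number", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The example file above, supplied as a string.
        String sample = String.join("\n",
                "abc,1.0", "def,2.0", "ghi,3.0", "-1",
                "abc,1.0", "ghi,3.0", "jkl,4.0", "def,3.0", "-1");
        List<List<Item>> sets = read(new BufferedReader(new StringReader(sample)));
        System.out.println("first set:  " + sets.get(0));
        System.out.println("second set: " + sets.get(1));
    }
}
```

Rejecting a malformed line with its line number, rather than silently skipping it, makes it easier to find which rule the file breaks.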

Running the program

JDiff is run from the command line with:

java JDiff

JDiff will then ask you for the input file, the desired output file and the desired options. The output is stored in the specified output file for easy access. (If the output file already exists, a message is displayed on the screen and the file is then overwritten.) If you prefer to keep track of only one file, you can also use the JAR file:

java -jar JDiff.jar

You can also run JDiff with the provided JDiff.bat batch file. The batch file uses the JAR file, so both files must be in the same directory. Run it with:

JDiff
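The output-file behavior described above (print a message if the file exists, then overwrite it) can be sketched in Java with `java.nio.file`. This is not JDiff's code: the class `SaveResultsSketch`, the `save` method, the prompt and the message wording are hypothetical, and the sketch assumes Java 9 or later for `List.of`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch of the save step described above; not JDiff's own code.
public class SaveResultsSketch {
    // Writes `lines` to `target`. If the file already exists, a message is
    // printed first and the old contents are then replaced.
    static void save(Path target, List<String> lines) throws IOException {
        if (Files.exists(target)) {
            System.out.println(target + " already exists and will be overwritten.");
        }
        // CREATE makes the file if needed; TRUNCATE_EXISTING discards old contents.
        Files.write(target, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        // Interactive prompt in the spirit of JDiff 1.1; the wording is illustrative.
        Scanner console = new Scanner(System.in);
        System.out.print("Output file: ");
        Path target = Paths.get(console.nextLine().trim());
        // Placeholder content; a real run would write the computed results.
        save(target, List.of("placeholder result line"));
    }
}
```

Checking `Files.exists` before writing is only used to show the message; the overwrite itself comes from opening the file with `TRUNCATE_EXISTING`.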

Download

JDiff can be downloaded here (28.7 KB)

The archives contain the following directories:

  • bytecode: Contains the class files needed to run JDiff
    • JDiff.class
    • DataItem.class
    • JDiff.jar
    • JDiff.bat (Assumes the java command is in your path; may require editing. Requires the accompanying JAR file.)
  • documentation: Contains this page, as well as a javadoc subdirectory
  • src: Contains the Java source files
    • JDiff.java
    • DataItem.java

History

Version 1.1 (May 19th 2002):

  • Added interactive support. You can now specify the options you want.
  • Removed the use of redirection; you can now specify the source file interactively.
  • Removed support for manually entering your data (it was tedious and useless).
  • Added support for saving results to a specified file. (If the specified file exists, it will be overwritten.)
  • Added a batch file to the executables. It makes it easier to run JDiff with the JAR file.
  • Added duplicate detection. If a data set contains duplicates, JDiff will not remove them from its results, so you can use this option to determine whether any exist.

Version 1.0 (May 18th 2002): First version.