Discrepancy Finder Tool

Suppose you have several characteristics (e.g. age, height, weight) and you wish to split a set of people into two groups, Group A and Group B. This website aims to find a split in which, for each characteristic, the sum of that characteristic's values in Group A is as close as possible to the corresponding sum in Group B. Given a spreadsheet where each row corresponds to a person and each column corresponds to a characteristic, this website will find a good split. To get started, please follow the steps below:

  1. Click "Choose File" to upload your CSV or Excel file. Ensure that the first row in your CSV or Excel file contains the column names, and that your desired characteristics are represented by the columns.
  2. After uploading the file, select the desired runtime from the dropdown menu. This is approximately how long, in minutes, the discrepancy algorithm will run.
  3. Decide whether you want to normalize each of the columns. When normalization is applied, the values in each column are rescaled so that the minimum value is -1 and the maximum value is 1, which gives the chosen characteristics more equal weight.
  4. In the first column selection dropdown, choose the columns that you wish to be considered as characteristics in the discrepancy calculation.
  5. In the second column selection dropdown, identify the columns that correspond to categorical variables. Columns that do not consist entirely of numerical values will be checked by default and cannot be unchecked.
  6. Click "Submit" to process your file. Once processing is complete, you will be redirected to a download page where you can download the processed CSV file. The resulting split is recorded in a new column named "Group". A plot showing how splits with lower discrepancy are found over time is also displayed, to help you judge whether to run for more or less time.
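
To make the normalization in step 3 and the quantity being balanced more concrete, here is a minimal Python sketch. It is an illustration only, not the website's actual code. The function names, the "A"/"B" labels, the handling of constant columns, and the choice to summarize a split by its worst characteristic are all assumptions made for this example.

```python
from typing import List, Sequence


def normalize_column(values: Sequence[float]) -> List[float]:
    """Rescale a column so its minimum maps to -1 and its maximum to 1.

    Assumption: a constant column (max == min) is mapped to all zeros,
    since the rescaling is otherwise undefined for it.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def column_discrepancies(rows: Sequence[Sequence[float]],
                         groups: Sequence[str]) -> List[float]:
    """For each characteristic, return |sum over Group A - sum over Group B|."""
    diffs = [0.0] * len(rows[0])
    for row, group in zip(rows, groups):
        # Group A contributes positively, Group B negatively.
        sign = 1.0 if group == "A" else -1.0
        for j, value in enumerate(row):
            diffs[j] += sign * value
    return [abs(d) for d in diffs]


def worst_discrepancy(rows: Sequence[Sequence[float]],
                      groups: Sequence[str]) -> float:
    """Summarize a split by its largest per-characteristic gap (an assumption)."""
    return max(column_discrepancies(rows, groups))


# Four people with two characteristics each: (age, height).
people = [[30, 170], [40, 180], [50, 160], [20, 190]]

balanced = ["A", "A", "B", "B"]    # ages 70 vs 70, heights 350 vs 350
unbalanced = ["A", "B", "A", "B"]  # ages 80 vs 60, heights 330 vs 370

print(column_discrepancies(people, balanced))    # [0.0, 0.0]
print(column_discrepancies(people, unbalanced))  # [20.0, 40.0]
```

In this toy example the first split balances both characteristics exactly, while the second leaves a gap of 20 in age and 40 in height. Without normalization, a characteristic with a larger numeric range would tend to dominate such a comparison, which is the motivation for the rescaling option in step 3.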