ExtendTable

java.lang.Object
- Controller
- - controllers.ExtendTable

```
public class ExtendTable
extends Controller
```
The ExtendTable class is the main controller of the DS4DM Backend. Every API function specified in ds4dm_webservice/conf/routes leads directly to the execution of one of the functions here. The functions are described in more detail below; the following is a high-level overview.
Data Search Functions:
- search() - The old keyword-based search. Its results are not as good as those of the extendedSearch, but it executes faster.
- extendedSearch(String repositoryName) - Also called 'correspondence-based search'. Like search(), it returns the table names and correspondences necessary for extending a table with one additional column.
- extendedSearch_T2DGoldstandard() - If no repositoryName is specified for the extendedSearch, one of these three functions can be called instead. They use a pre-defined repository.
- extendedSearch_Produktdata()
- extendedSearch_WebWikiTables()
- unconstrainedSearch(String repositoryName) - Instead of providing the data necessary for extending a table with exactly one additional column, as search() and extendedSearch() do, the unconstrainedSearch extends the query table with as many columns as possible.
- correlationBasedSearch(String repositoryName) - Works the same way as the unconstrainedSearch, except that it only extends the query table with columns that correlate with the 'correlationAttribute'.
Fetch Table Functions: - The functions search() and extendedSearch() don't return new data, only the correspondences necessary for fusing the new data. To get the new data, the fetchTable function has to be called.
- fetchTable(String name, String repositoryName) - returns the data and metadata of the specified table
- fetchTable_T2DGoldstandard(String name) - If no repository name is specified, one of these three functions can be used instead. They work with pre-defined repositories.
- fetchTable_Produktdata(String name)
- fetchTable_WebWikiTables(String name)
Repository Maintenance Functions:
- createRepository(String repositoryName) - create a new empty repository with the specified name
- getRepositoryNames() - returns the names of all existing repositories
- getRepositoryStatistics(String repositoryName) - returns information about the specified repository, such as numberOfTablesInRepository, created_timestamp and creator_ip
- deleteRepository(String repositoryName)
- uploadTable(String repositoryName) - this uploads a single Table to a repository
Bulk Upload Functions: - For uploading many tables to a repository, the bulk upload performs more than an order of magnitude better than repeated single uploads (see DS4DM Backend website).
- moderateBulkUploadTables(final String repositoryName) - this function makes sure that the bulkUpload runs in parallel to the webservice and returns a {"status": "ACCEPTED"} message
- bulkUploadTables(String repositoryName, String uploadId, long startTime, String requestBody) - the actual bulkUpload
- getUploadStatus(String repositoryName, String uploadID) - returns "PROCESSING", "UPLOAD SUCCESSFUL", or "UPLOAD UNSUCCESSFUL", depending on the current status of the specified bulkUpload
Author:

Benedikt Kleppmann (benedikt@informatik.uni-mannheim.de)

Constructor Summary

Constructors
Constructor and Description

ExtendTable()


Method Summary

Methods
Modifier and Type	Method and Description
`void`	`bulkUploadTables(java.lang.String repositoryName, java.lang.String uploadId, long startTime, java.lang.String requestBody)` For uploading many tables to a repository, the bulk upload has more than an order of magnitude better performance (see evaluation).
`Result`	`correlationBasedSearch(java.lang.String repositoryName)` Extends the query table only with columns that correlate with the "correlation attribute", whereas `unconstrainedSearch(String repositoryName)` extends it with as many columns as possible.
`Result`	`createRepository(java.lang.String repositoryName)` Create a new empty repository with the specified name.
`Result`	`deleteRepository(java.lang.String repositoryName)` Deletes the specified repository.
`Result`	`extendedSearch_Produktdata()` If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead.
`Result`	`extendedSearch_T2DGoldstandard()` If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead.
`Result`	`extendedSearch_WebWikiTables()` If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead.
`Result`	`extendedSearch(java.lang.String repositoryName)` This is an old version of the extendedSearch.
`Result`	`fetchTable_Produktdata(java.lang.String name)` If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead of `fetchTable(String, String)`. It calls `fetchTable(String, String)` with the repositoryName set to "Produktdata".
`Result`	`fetchTable_T2DGoldstandard(java.lang.String name)` If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead of `fetchTable(String, String)`. It calls `fetchTable(String, String)` with the repositoryName set to "T2D_Goldstandard".
`Result`	`fetchTable_WebWikiTables(java.lang.String name)` If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead of `fetchTable(String, String)`. It calls `fetchTable(String, String)` with the repositoryName set to "WebWikiTables".
`Result`	`fetchTable(java.lang.String name, java.lang.String repositoryName)` Returns the actual data of a table that `ExtenededSearch.extendedSearch(String)` found to contain useful data for extending a table with an additional column.
`Result`	`fetchTablePOST(java.lang.String repositoryName)` The fetchTablePOST function is rarely used in practice.
`Result`	`generateCorrespondences_withKnownBlocking(java.lang.String repositoryName, java.lang.String blockingsFileName)` This function is not part of the standard Backend API.
`Result`	`generateCorrespondences(java.lang.String repositoryName)` This function is not part of the standard Backend API.
`Result`	`getRepositoryNames()` Returns the names of all existing repositories.
`Result`	`getRepositoryStatistics(java.lang.String repositoryName)` Returns information about the specified repository, such as numberOfTablesInRepository, created_timestamp and creator_ip.
`Result`	`getUploadStatus(java.lang.String repositoryName, java.lang.String uploadID)` This function returns the status of the `moderateBulkUploadTables(String)`-job with the specified uploadID.
`Result`	`ind()`
`Result`	`moderateBulkUploadTables(java.lang.String repositoryName)` This method manages `bulkUploadTables(String, String, long, String)` on a high level.
`Result`	`PreCalculatedSearch()`
`Result`	`search()` The matching is performed by the SearchJoin service, via a POST request to http://ds4dm.informatik.uni-mannheim.de/search where the input is the table specified above.
`Result`	`suggestAttributes()`
`Result`	`unconstrainedSearch(java.lang.String repositoryName)` Extends the query table with as many columns as possible and returns the fully fused table, whereas `search()` and `ExtenededSearch.extendedSearch(String repositoryName)` only provide the information necessary for extending a table with one additional column.
`Result`	`uploadTable(java.lang.String repositoryName)` This uploads a single Table to a specified repository.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ExtendTable
```
public ExtendTable()
```
- Method Detail
  - search
```
public Result search()
```
    The matching is performed by the SearchJoin service, via a POST request to http://ds4dm.informatik.uni-mannheim.de/search where the input is the table specified above. The SearchJoin service returns a composite response, containing:
    - A mapping object, which specifies:
      - the "targetSchema", constructed using the header of the query input table plus the extension attribute(s)
      - the "dataTypes" of the target schema
      - the mapping between the initial query table schema and the target schema ("queryTable2targetSchema")
      - the mapping between the extension attributes in the query table and the target schema ("extensionAttributes2targetSchema")
    - An array of "relatedTables". Each related table has:
      - a name ("tableName"), which is a unique identifier for the table within the SearchJoin Index
      - the correspondence between instances in the table and those from the query table ("instancesCorrespondences2queryTable"); these are given as the -
      - the correspondence between the schema of the current table and the target schema ("tableSchema2TargetSchema"); these are given as the -
    
    Returns:
  - PreCalculatedSearch
```
public Result PreCalculatedSearch()
```
  - extendedSearch
```
public Result extendedSearch(java.lang.String repositoryName)
```
    This is an old version of the extendedSearch.
    In practice the refactored version is used: ExtenededSearch.extendedSearch(String repositoryName)
  - extendedSearch_T2DGoldstandard
```
public Result extendedSearch_T2DGoldstandard()
```
    If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead. It calls ExtenededSearch.extendedSearch(String repositoryName) with the repositoryName set to "T2D_Goldstandard".
  - extendedSearch_Produktdata
```
public Result extendedSearch_Produktdata()
```
    If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead. It calls ExtenededSearch.extendedSearch(String repositoryName) with the repositoryName set to "ProductDataRepository_withSubjectcolumns".
  - extendedSearch_WebWikiTables
```
public Result extendedSearch_WebWikiTables()
```
    If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead. It calls ExtenededSearch.extendedSearch(String repositoryName) with the repositoryName set to "WebWikiTables".
  - unconstrainedSearch
```
public Result unconstrainedSearch(java.lang.String repositoryName)
```
    search() and ExtenededSearch.extendedSearch(String repositoryName) provide the information necessary for extending a table with one additional column.
    The unconstrainedSearch, on the other hand, differs in two ways:
    - it returns a fully fused table (instead of providing the data for the fusion to happen in the frontend)
    - it extends the query table with many additional columns. Specifically, it adds all columns that it manages to populate to a certain threshold density (specified by the parameter "mimimumDensity" in the http-post-body)
    Steps of execution:
    1. extract the query table from the http-request-body
    2. extend the query table with UnconstrainedSearch.getFusedTable(model.QueryTable queryTableObject, String repositoryName)
    3. save the returned data as a relation (=list of columns)
    4. save the relation table in a Table-Object, convert this to a json string and return it
    Parameters:
    repositoryName - the name of the repository used for the unconstrainedSearch
    request().body().asJson() - the Json in the body of the http-post-request is also used as parameter. There is more info on it here.
    
    Returns:
    a Json String containing the extended table
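    The density threshold described above can be illustrated with a minimal sketch. It assumes that the density of a column is the share of its cells that are neither null nor blank, which the description does not state explicitly. The class DensityFilterSketch and its method names are hypothetical and not part of the DS4DM code base.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the DS4DM Backend.
final class DensityFilterSketch {

    /** Assumed definition: fraction of cells that are neither null nor blank. */
    static double density(List<String> column) {
        if (column.isEmpty()) {
            return 0.0;
        }
        long filled = column.stream()
                .filter(cell -> cell != null && !cell.trim().isEmpty())
                .count();
        return (double) filled / column.size();
    }

    /** Keeps only the columns whose density reaches the threshold. */
    static Map<String, List<String>> keepDenseColumns(
            Map<String, List<String>> columns, double minimumDensity) {
        // LinkedHashMap preserves the original column order.
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : columns.entrySet()) {
            if (density(entry.getValue()) >= minimumDensity) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

    With a threshold of 0.5, a candidate column in which three of four cells are filled would be kept, while a column with only one filled cell would be dropped.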
  - correlationBasedSearch
```
public Result correlationBasedSearch(java.lang.String repositoryName)
                              throws java.io.IOException,
                                     java.lang.InterruptedException
```
    The unconstrainedSearch(String repositoryName) extends the query table with as many columns as possible. With a big repository, this can lead to significantly more than 100 columns being added to the query table, which can be overwhelming for the user. The correlationBasedSearch addresses this: it extends the query table only with columns that correlate with the "correlation attribute" (specified by the parameter "correlationAttribute" in the http-request-body).
    Steps of execution:
    1. extract the query table from the http-request-body
    2. pivot the query table - to be a list of rows instead of a list of columns. Save it in temp_table1.csv
    3. extend the query table with as many columns as possible with UnconstrainedSearch.getFusedTable(model.QueryTable, String)
    4. save the extended table to temp_table2.csv
    5. execute correlation_based_filtering in a new process
    6. read the result (temp_table3.csv), pivot it to be a list of columns and return it as json string
    Parameters:
    repositoryName - the name of the repository used for the unconstrainedSearch
    request().body().asJson() - the Json in the body of the http-post-request is also used as parameter. There is more info on it here.
    
    Returns:
    a Json String containing the extended table
    
    Throws:
    
    java.io.IOException
    
    java.lang.InterruptedException
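    The pivoting in steps 2 and 6 amounts to transposing the table between a list of columns and a list of rows. A minimal sketch, assuming a rectangular table of String cells; the class PivotSketch is hypothetical and the csv handling and the external correlation_based_filtering process are left out:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the DS4DM Backend.
final class PivotSketch {

    /**
     * Transposes a rectangular table. A list of columns becomes a list of
     * rows, and applying the method a second time restores the columns.
     */
    static List<List<String>> pivot(List<List<String>> table) {
        List<List<String>> pivoted = new ArrayList<>();
        if (table.isEmpty()) {
            return pivoted;
        }
        int innerLength = table.get(0).size();
        for (int i = 0; i < innerLength; i++) {
            List<String> line = new ArrayList<>();
            for (List<String> original : table) {
                line.add(original.get(i)); // i-th cell of every inner list
            }
            pivoted.add(line);
        }
        return pivoted;
    }
}
```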
  - getRepositoryNames
```
public Result getRepositoryNames()
```
    Returns the names of all existing repositories.
    Steps of execution:
    1. get list of sub-folders in the folder "public/repositories/"
    2. concatenate the names of the sub-folders to a list
    3. return the list as json-string
    Returns:
    a Json String containing the list of repository names
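    The three steps can be sketched with plain java.io. This is an illustration under assumptions, not the controller's code: the class RepositoryNamesSketch is hypothetical, the names are sorted only to make the output deterministic, and the JSON array is assembled by hand instead of with the JSON library the Backend uses.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the DS4DM Backend.
final class RepositoryNamesSketch {

    /** Steps 1 and 2: collect the names of all sub-folders. */
    static List<String> listRepositoryNames(File repositoriesFolder) {
        List<String> names = new ArrayList<>();
        File[] subFolders = repositoriesFolder.listFiles(File::isDirectory);
        if (subFolders == null) {
            return names; // folder is missing or not readable
        }
        for (File folder : subFolders) {
            names.add(folder.getName());
        }
        Collections.sort(names);
        return names;
    }

    /** Step 3: render the names as a JSON array of strings. */
    static String toJsonArray(List<String> names) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                json.append(",");
            }
            String escaped = names.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            json.append('"').append(escaped).append('"');
        }
        return json.append("]").toString();
    }
}
```

    A call such as toJsonArray(listRepositoryNames(new File("public/repositories/"))) would then produce the response body.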
  - getRepositoryStatistics
```
public Result getRepositoryStatistics(java.lang.String repositoryName)
```
    Returns information about the specified repository, such as numberOfTablesInRepository, created_timestamp and creator_ip.
    Steps of execution:
    1. see how many tables are in the folder "public/repositories/"+ repositoryName + "/tables/". This is the numberOfTablesInRepository.
    2. read repositoryStatistics.json as JSONObject (this contains created_timestamp and creator_ip) and add the numberOfTablesInRepository.
    3. return the JSONObject as json-string
    Parameters:
    repositoryName - the name of the repository for which the information should be returned
    
    Returns:
    a Json String with the numberOfTablesInRepository, created_timestamp and creator_ip for the specified repository.
  - deleteRepository
```
public Result deleteRepository(java.lang.String repositoryName)
```
    Deletes the specified repository.
    Steps of execution:
    1. retrieve the ip of the repository creator from repositoryStatistics.json
    2. check if the ip of the computer that sent the http-request matches that of the repository creator
    3. IF the ips match, or the correct password was submitted in the http-request-body, THEN delete the folder of the specified repository (with its subfolders).
    Parameters:
    repositoryName - the name of the repository being deleted
    request().body().asJson() - (optional) admin-password in the body of the http-post-request.
    
    Returns:
    the message: "The repository " + repositoryName + " was deleted."
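    The permission check in steps 2 and 3 and the recursive deletion can be sketched as follows. The class DeleteRepositorySketch, its method names and the way the admin password is passed in are assumptions made for illustration; the sketch uses java.nio.file and does not show how the controller reads the request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of the DS4DM Backend.
final class DeleteRepositorySketch {

    /** Deletion is allowed if the ips match or the admin password is correct. */
    static boolean mayDelete(String creatorIp, String requestIp,
                             String submittedPassword, String adminPassword) {
        boolean sameIp = creatorIp != null && creatorIp.equals(requestIp);
        boolean correctPassword =
                adminPassword != null && adminPassword.equals(submittedPassword);
        return sameIp || correctPassword;
    }

    /** Deletes a folder with all its sub-folders; does nothing if it is missing. */
    static void deleteRecursively(Path folder) {
        if (!Files.exists(folder)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(folder)) {
            // Reverse order visits children before their parent folders.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```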
  - createRepository
```
public Result createRepository(java.lang.String repositoryName)
```
    Create a new empty repository with the specified name.
    Steps of execution:
    1. if no repositoryName was chosen, then "DefaultRepository" is chosen as name.
    2. if there is no repository-folder yet with the specified repositoryName, then create the repository-folder along with all its subfolders.
    3. create the repositoryStatistics.json-file. This is made from a JSONObject containing the repository name, the current timestamp, and the ip-address of the request-sender.
    4. return "Successfully created repository " + repositoryName
    Parameters:
    repositoryName - the name of the repository being created
    
    Returns:
    the message: "Successfully created repository " + repositoryName
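    A minimal sketch of these steps, under stated assumptions: the class CreateRepositorySketch is hypothetical, only the "tables" sub-folder is created because it is the only one named in this documentation, the JSON key "repositoryName" is a guess (created_timestamp and creator_ip are taken from getRepositoryStatistics), and the JSON is assembled by hand for values that need no escaping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the DS4DM Backend.
final class CreateRepositorySketch {

    /** Step 1: fall back to "DefaultRepository" if no name was chosen. */
    static String effectiveName(String requestedName) {
        boolean missing = requestedName == null || requestedName.trim().isEmpty();
        return missing ? "DefaultRepository" : requestedName;
    }

    /** Step 3: content of repositoryStatistics.json (no escaping of values). */
    static String statisticsJson(String name, long createdTimestamp, String creatorIp) {
        return "{\"repositoryName\":\"" + name + "\","
                + "\"created_timestamp\":" + createdTimestamp + ","
                + "\"creator_ip\":\"" + creatorIp + "\"}";
    }

    /** Steps 1 to 4 combined; repositoriesRoot stands for "public/repositories/". */
    static String createRepository(Path repositoriesRoot, String requestedName,
                                   String creatorIp) {
        String name = effectiveName(requestedName);
        Path repository = repositoriesRoot.resolve(name);
        try {
            // createDirectories is a no-op for folders that already exist.
            Files.createDirectories(repository.resolve("tables"));
            String json = statisticsJson(name, System.currentTimeMillis(), creatorIp);
            Files.write(repository.resolve("repositoryStatistics.json"),
                    json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "Successfully created repository " + name;
    }
}
```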
  - uploadTable
```
public Result uploadTable(java.lang.String repositoryName)
```
    This uploads a single Table to a specified repository. The table-data is passed as json in the body of the post-request. More information on the format of the json is here.
    Steps of execution:
    1. check if the specified repository exists. If not: return "The repository " + repositoryName + " doesn't exist"
    2. convert the table in body of the post-request to a Table-Object. Save this table (as csv-file) in the "public/repositories/" + repositoryName + "/tables/" folder.
    3. open/create the three indexes: KeyColumnIndex, ColumnNameIndex and TableIndex.
    4. save the table in the indexes by executing TableIndexer.writeTableToIndexes(File)
    5. retrieve the keyColumn of the submitted table from the KeyColumnIndex
    6. get Table-object for the submitted table by reading it from where we saved it (as csv-file) in step 2.
    7. format the keyColumn (from step 5) and use it to search in the KeyColumnIndex for tables with similar key columns.
    8. for each table found this way:
       - get the instance-correspondences to the submitted table and save them in the correspondence-files by executing FindCorrespondences.getInstanceMatches(DataSet, DataSet, File, File, String)
       - get the schema-correspondences to the submitted table and save them in the correspondence-files by executing FindCorrespondences.getDuplicateBasedSchemaMatches(DataSet, DataSet, File, File, Processable, String)
    Parameters:
    repositoryName - the name of the repository to which the table should be uploaded
    request().body().asJson() - The request body should contain the table data of the table that is to be uploaded.
    
    Returns:
    the message: "Table " + tableName + " successfully uploaded to " + repositoryName
  - suggestAttributes
```
public Result suggestAttributes()
```
  - moderateBulkUploadTables
```
public Result moderateBulkUploadTables(java.lang.String repositoryName)
                                throws java.io.IOException
```
    This method manages bulkUploadTables(String, String, long, String) on a high level. Among other things, it ensures that the bulkUpload is executed as a separate process that doesn't block the webservice.
    Steps of execution:
    1. check if the specified repository exists. If not: return "The repository " + repositoryName + " doesn't exist".
    2. create an uploadId by hashing the json in the body of the post-request
    3. run bulkUploadTables(String, String, long, String) in a separate process
    4. return the json {"status": "ACCEPTED", "message": }
    Parameters:
    repositoryName - the name of the repository to which the tables should be uploaded
    request().body().asJson() - The request body should contain a list of tables to be uploaded. The format of this json-string is specified here
    
    Returns:
    the json: {"status": "ACCEPTED", "message": }
    
    Throws:
    
    java.io.IOException
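    Steps 2 and 3 can be sketched as follows. The hash function is not named in this documentation, so SHA-256 is an assumption, as is running the upload as a background task via CompletableFuture; the class BulkUploadModeratorSketch and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not part of the DS4DM Backend.
final class BulkUploadModeratorSketch {

    /** Step 2: derive the uploadId from the request body (SHA-256 is assumed). */
    static String uploadIdFor(String requestBody) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(requestBody.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Step 3: start the upload without waiting for it and hand the uploadId
     * back to the caller, who can pass it to getUploadStatus later.
     */
    static String acceptAndRunInBackground(String requestBody, Runnable bulkUpload) {
        String uploadId = uploadIdFor(requestBody);
        // Runs on the common pool; a long-lived server keeps these threads alive.
        CompletableFuture.runAsync(bulkUpload);
        return uploadId;
    }
}
```

    Because the uploadId is a hash of the request body, submitting the same list of tables twice yields the same uploadId.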
  - bulkUploadTables
```
public void bulkUploadTables(java.lang.String repositoryName,
                    java.lang.String uploadId,
                    long startTime,
                    java.lang.String requestBody)
                      throws java.io.IOException
```
    For uploading many tables to a repository, the bulk upload has more than an order of magnitude better performance (see evaluation). This method is not called directly by the Webservice API; it gets called by moderateBulkUploadTables(String).
    Steps of execution:
    1. convert request body to ListOfTables-object
    2. save the tables in ListOfTables as csv files
    3. save the tables in the indexes by running de.uni_mannheim.informatik.dws.ds4dm.CreateLuceneIndex.Main#main(String[])
    4. find the correspondences between the tables and the already indexed tables by running Main.main(String[])
    5. return message: "UPLOAD SUCCESSFUL\n"
    Parameters:
    repositoryName - the name of the repository to which the tables are being uploaded
    uploadId - The uploadId is a code that can be used by getUploadStatus(String, String) to check the status of this bulkUpload while (and after) it's being executed.
    startTime - the time when the execution of moderateBulkUploadTables(String) began. This is only for evaluation purposes.
    requestBody - The requestBody should contain a list of tables to be uploaded. The format of this json-string is specified here
    
    Throws:
    
    java.io.IOException
  - getUploadStatus
```
public Result getUploadStatus(java.lang.String repositoryName,
                     java.lang.String uploadID)
```
    This function returns the status of the moderateBulkUploadTables(String)-job with the specified uploadID. When a moderateBulkUploadTables(String)-job is started, it returns its uploadID in the return-message. During the actual bulkUpload (which is running in a separate process), the status of the bulkUpload is written to a file. The status may be "PROCESSING", "UPLOAD SUCCESSFUL\n" or "UPLOAD UNSUCCESSFUL\n". The getUploadStatus function reads the status from the file and returns it to the user.
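    The hand-off of the status through a file can be sketched as follows. The file name pattern, the fallback text for an unknown uploadID and the class UploadStatusSketch are assumptions made for illustration; this documentation does not say where the Backend stores the status file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the DS4DM Backend.
final class UploadStatusSketch {

    /** Assumed answer when no status file exists for the uploadID. */
    static final String UNKNOWN = "UNKNOWN UPLOAD ID";

    /** Assumed naming scheme: one status file per uploadID. */
    static Path statusFile(Path repositoryFolder, String uploadID) {
        return repositoryFolder.resolve("uploadStatus_" + uploadID + ".txt");
    }

    /** Called by the upload job; overwrites any earlier status. */
    static void writeStatus(Path file, String status) {
        try {
            Files.write(file, status.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called by the status endpoint; returns the file content verbatim. */
    static String readStatus(Path file) {
        if (!Files.exists(file)) {
            return UNKNOWN;
        }
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    The upload job would write "PROCESSING" when it starts and overwrite it with one of the two final statuses when it ends, so a reader always sees the most recent state.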
    
    Parameters:
    repositoryName - the name of the repository to which the tables are being uploaded
    uploadID - The uploadID is the unique identifier of an upload process. It is returned by moderateBulkUploadTables(String)
    
    Returns:
    "UPLOAD SUCCESSFUL\n"
  - generateCorrespondences
```
public Result generateCorrespondences(java.lang.String repositoryName)
                               throws java.io.IOException
```
    This function is not part of the standard Backend API. It is only used on the rare occasion that, for a bulk upload, the index creation has already been done but no correspondences have been created yet.
    
    Throws:
    
    java.io.IOException
  - generateCorrespondences_withKnownBlocking
```
public Result generateCorrespondences_withKnownBlocking(java.lang.String repositoryName,
                                               java.lang.String blockingsFileName)
                                                 throws java.io.IOException
```
    This function is not part of the standard Backend API. It is only used on the rare occasion that, for a bulk upload, the index creation and the blocking have already been done but no correspondences have been created yet.
    
    Throws:
    
    java.io.IOException
  - fetchTablePOST
```
public Result fetchTablePOST(java.lang.String repositoryName)
```
    The fetchTablePOST function is rarely used in practice. The DS4DM-Frontend uses the fetchTable(String, String)-GET-method instead. The GET- and the POST- method work in exactly the same way. More info here: fetchTable(String, String)
  - fetchTable_T2DGoldstandard
```
public Result fetchTable_T2DGoldstandard(java.lang.String name)
```
    If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead of fetchTable(String, String). It calls fetchTable(String, String) with the repositoryName set to "T2D_Goldstandard".
  - fetchTable_Produktdata
```
public Result fetchTable_Produktdata(java.lang.String name)
```
    If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead of fetchTable(String, String). It calls fetchTable(String, String) with the repositoryName set to "Produktdata".
  - fetchTable_WebWikiTables
```
public Result fetchTable_WebWikiTables(java.lang.String name)
```
    If the DS4DM Frontend does not specify the repositoryName, then this function can be used instead of fetchTable(String, String). It calls fetchTable(String, String) with the repositoryName set to "WebWikiTables".
  - fetchTable
```
public Result fetchTable(java.lang.String name,
                java.lang.String repositoryName)
```
    The ExtenededSearch.extendedSearch(String)-method searches for tables in the repository that contain useful data for extending a table with an additional column. It returns the names of the found tables as well as their correspondences to the query table (which are necessary for constructing the additional column). However, it does not return the actual data from the found tables, as the http-response-messages would be too big.
    The actual data from the found tables has to be requested separately with the fetchTable-method. This works as follows: at the end of the ExtenededSearch.extendedSearch(String)-method, the found tables are saved to json-files (using ExtenededSearch.saveTableDataForFetching(model.ExtendedTableInformation, extendedSearch2.GlobalVariables)). The fetchTable-method just opens the requested json-file and returns its content.
  - ind
```
public Result ind()
```
