New bug in elasticfacade

taher · December 22, 2022, 9:46am

I tracked it down to exactly this commit. After it, everything continues to work, but query results now contain duplicates, with zero changes to our code.

before commit results

[documentListCount:2,
 documentListPageIndex:0,
 documentListPageSize:20,
 documentListPageMaxIndex:0,
 documentListPageRangeLow:1,
 documentListPageRangeHigh:2]

after commit results

[documentListCount:12,
 documentListPageIndex:0,
 documentListPageSize:20,
 documentListPageMaxIndex:0,
 documentListPageRangeLow:1,
 documentListPageRangeHigh:12]

The only difference is that duplicates are being returned. I’m not sure why, but I’m sure that the above commit is the reason for the change in behavior.

taher · December 22, 2022, 10:13am

I should add that we’re using some custom logic to create the query map. Perhaps the structure changed from ElasticFacade … I’ll investigate this some more

taher · December 22, 2022, 10:28am

We tested again, and I can confirm it is also broken with the stock search#DataDocuments service, with similar issues.

jonesde · December 22, 2022, 11:34pm

Thanks for reporting this. It turned out to be an issue with indexing rather than search. When creating DataDocument documents, the code was using the EntityValue.getPrimaryKeysString() method instead of the internal method (my attempt at code reuse). That is incorrect for these EntityValue objects, because they are instances of the dynamic view entity built for the DataDocument’s DB query rather than EntityValue objects for the primary entity only.

In other words, if you look at the IDs of the docs in ElasticSearch you’ll see big long strings for IDs instead of just the primary entity’s PK value.
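
To make the effect concrete, here is a minimal Python sketch (not Moqui code) of why an ID built from every PK field of a joined view row yields several documents per primary record. The rows, the field names, and both helper functions are hypothetical and only illustrate the idea; the "::" separator is the one mentioned in this thread.

```python
# Illustrative only: rows as they might come back from a view-entity query
# that joins a primary entity (Party) to a related entity. The field names
# and values are made up for this example.
rows = [
    {"partyId": "100", "contactMechId": "A"},
    {"partyId": "100", "contactMechId": "B"},
    {"partyId": "101", "contactMechId": "C"},
]

SEPARATOR = "::"  # value separator in the PK strings


def all_pk_id(row, pk_fields):
    """Document ID built from every PK field of the view row (the buggy case)."""
    return SEPARATOR.join(str(row[f]) for f in pk_fields)


def primary_pk_id(row, primary_pk_fields):
    """Document ID built from the primary entity's PK fields only."""
    return SEPARATOR.join(str(row[f]) for f in primary_pk_fields)


# One distinct ID per joined row: three documents for two parties.
buggy_ids = sorted({all_pk_id(r, ["partyId", "contactMechId"]) for r in rows})

# One distinct ID per primary record: two documents, as intended.
fixed_ids = sorted({primary_pk_id(r, ["partyId"]) for r in rows})

print(buggy_ids)
print(fixed_ids)
```

With the longer IDs, each joined row becomes its own document, which is consistent with the higher documentListCount reported above.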

This is fixed in this commit:

For an existing instance that was deployed with this bug, the best solution is probably to delete the impacted ElasticSearch indexes and reindex them (using the Feed Index screen in the System app).

On a very large database where feeding the data might take a long time, you could instead remove the duplicate and bad documents (each will be a fragment of the real data document) using a delete query that searches for incorrect document IDs. For example, because Party has only one PK field, in each index whose documents use Party as the primary entity you can delete all documents that have a “::” in the ID (:: is the value separator in these PK strings). With any such delete or other bulk data change query, it’s always a good idea to first run a pre-query with the same condition and manually spot check the records to make sure you’re deleting only the ES documents you want to.
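
As a rough sketch of the cleanup described above, the Python below filters a list of document IDs for the "::" separator and builds an Elasticsearch `ids` query body from the matches. It assumes you have already collected the IDs from the affected index yourself (for example by paging through it); the sample IDs and both helper names are hypothetical, and the code does not talk to a cluster. It applies only where the primary entity has a single PK field, as in the Party example.

```python
import json

SEPARATOR = "::"  # value separator in the PK strings


def find_suspect_ids(doc_ids, separator=SEPARATOR):
    """Return IDs containing the separator, which a single-PK entity should never have."""
    return [doc_id for doc_id in doc_ids if separator in doc_id]


def ids_query(doc_ids):
    """Build a query body that matches exactly the given document IDs."""
    return {"query": {"ids": {"values": list(doc_ids)}}}


# Hypothetical IDs as they might be read from an affected index.
sample_ids = ["100", "100::A", "100::B", "101", "101::C"]

suspects = find_suspect_ids(sample_ids)
body = ids_query(suspects)

# Review this body, and the documents it matches, before deleting anything.
print(json.dumps(body, indent=2))
```

The same body can be sent to the index's `_search` endpoint first for the manual spot check, and then to `_delete_by_query` once the matches look right.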

taher · December 23, 2022, 10:34am

Great, thank you for attending to this, David. I will test again and report if any further anomalies are found.