Skip to content

Move catalog search indexer outside stores loop #33227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 24, 2021
Merged

Move catalog search indexer outside stores loop #33227

merged 3 commits into from
Oct 24, 2021

Conversation

jasperzeinstra
Copy link
Contributor

@jasperzeinstra jasperzeinstra commented Jun 13, 2021

Description (*)

There's a plugin (catalogsearchFulltextProductAssignment / Magento\Elasticsearch\Model\Indexer\Fulltext\Plugin\Category\Product\Action\Rows) in the Magento Elastisearch module that hooks in on the Category Products indexer (Magento\Catalog\Model\Indexer\Category\Product\Action\Rows)

This plugin receives the category id's that are reindexed by the Category Products indexer. This plugin collects the assigned products from these categorie id's per store. And inside this loop it triggers a Catalog Search Fulltext reindex for these product id's.

The problem is that the indexer is executed within the stores loop. But the indexer itself already does a reindex for all stores.

We ran in to this issue with our setup that contains 12 stores. 1 category was passed in to the Category Products indexer. The plugin was executed after fetches all connected products for this category (1000+). These 1000 were reindexed in the 12 stores loop and indexed 12 times in each loop.

Manual testing scenarios (*)

  1. Have a Magento setup with a lot of stores
  2. Have a category with a lot of products
  3. Disable the category, save it, enable the category, save it.
  4. Check how long the indexer Category Products takes

Contribution checklist (*)

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • README.md files for modified modules are updated and included in the pull request if any README.md predefined sections require an update
  • All automated tests passed successfully (all builds are green)

Resolved issues:

  1. resolves [Issue] Move catalog search indexer outside stores loop #33984: Move catalog search indexer outside stores loop

@m2-assistant
Copy link

m2-assistant bot commented Jun 13, 2021

Hi @jasperzeinstra. Thank you for your contribution
Here are some useful tips how you can test your changes using Magento test environment.
Add the comment under your pull request to deploy test or vanilla Magento instance:

  • @magento give me test instance - deploy test instance based on PR changes
  • @magento give me 2.4-develop instance - deploy vanilla Magento instance

❗ Automated tests can be triggered manually with an appropriate comment:

  • @magento run all tests - run or re-run all required tests against the PR changes
  • @magento run <test-build(s)> - run or re-run specific test build(s)
    For example: @magento run Unit Tests

<test-build(s)> is a comma-separated list of build names. Allowed build names are:

  1. Database Compare
  2. Functional Tests CE
  3. Functional Tests EE,
  4. Functional Tests B2B
  5. Integration Tests
  6. Magento Health Index
  7. Sample Data Tests CE
  8. Sample Data Tests EE
  9. Sample Data Tests B2B
  10. Static Tests
  11. Unit Tests
  12. WebAPI Tests
  13. Semantic Version Checker

You can find more information about the builds here

ℹ️ Please run only needed test builds instead of all when developing. Please run all test builds before sending your PR for review.

For more details, please, review the Magento Contributor Guide documentation.

⚠️ According to the Magento Contribution requirements, all Pull Requests must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.

🕙 You can find the schedule on the Magento Community Calendar page.

📞 The triage of Pull Requests happens in the queue order. If you want to speed up the delivery of your contribution, please join the Community Contributions Triage session to discuss the appropriate ticket.

🎥 You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel

✏️ Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel

@mrtuvn
Copy link
Contributor

mrtuvn commented Jun 14, 2021

Thanks you for contribution!
Can you cover changes with automation tests ?
Just curiously how this impact with other indexer ? Previously i saw some people claims when they reindexer catalog_category_products magento also trigger catalog_search_index too. Imagine we have a lot products and each product may have some variants and lot attributes. Is this be different use case or may related with your case in scope of this pull request

@mrtuvn
Copy link
Contributor

mrtuvn commented Jun 14, 2021

cc: @engcom-Alfa @engcom-Bravo

@mrtuvn mrtuvn added Auto-Tests: Not Covered Changes in Pull Request requires coverage by auto-tests and removed Auto-Tests: Not Covered Changes in Pull Request requires coverage by auto-tests labels Jun 14, 2021
@mrtuvn
Copy link
Contributor

mrtuvn commented Jun 14, 2021

@magento run all tests

@magento-automated-testing
Copy link

The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time.

@sidolov sidolov added the Priority: P1 Once P0 defects have been fixed, a defect having this priority is the next candidate for fixing. label Jun 17, 2021
Copy link
Contributor

@ihor-sviziev ihor-sviziev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do array_merge in a loop, usually it's causing performance issues.

@ihor-sviziev
Copy link
Contributor

@jasperzeinstra, will you be able to update this PR with suggested changes?

@hostep
Copy link
Contributor

hostep commented Aug 5, 2021

@engcom-Alfa: I think the comparison should be done between this PR and the parent commit of this branch (being 32fda1c in this case). Not on the latest 2.4-develop, because there are already more changes done to 2.4-develop and maybe some of them also influence the performance of this indexer.

Not saying that the results will be different, just saying that the comparison is not accurate in my opinion.

@ihor-sviziev
Copy link
Contributor

@engcom-Alfa, could you check if you merged the 2.4-develop into the branch with changes from PR before testing?

Copy link
Contributor

@ihor-sviziev ihor-sviziev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed with @engcom-Alfa in Slack - he tested changes with merged in the recent 2.4-develop branch.

@jasperzeinstra, could you double-check why your changes reducing indexation speed, which is not expected?

@hostep
Copy link
Contributor

hostep commented Aug 5, 2021

From what I can tell the testing results still make no sense since the code that was changed is only triggered by the partial reindexing system and not by the full reindexing system. Also running indexer:reindex twice on the same database will probably always be faster the second time you execute it, so this is not an accurate way of testing this.

So to properly test this, the performance of the cronjob indexer_update_all_views should be tested I think. Easiest way is probably by using magerun2 by executing php n98-magerun2.phar sys:cron:run indexer_update_all_views
So first indexer modes should be set to 'schedule'. Then you'll need to trigger some changes that will make the catalog_category_product indexer dirty somehow. Then make a backup of the database, test the performance of the indexer_update_all_views cronjob without changes from the PR, restore backup of database, now test with changes from the PR.

Does this make sense?

@engcom-Charlie
Copy link
Contributor

Thanks @hostep for the guidance. Moving this issue in Ready for testing again.

@engcom-Alfa
Copy link
Contributor

engcom-Alfa commented Aug 31, 2021

From what I can tell the testing results still make no sense since the code that was changed is only triggered by the partial reindexing system and not by the full reindexing system. Also running indexer:reindex twice on the same database will probably always be faster the second time you execute it, so this is not an accurate way of testing this.

So to properly test this, the performance of the cronjob indexer_update_all_views should be tested I think. Easiest way is probably by using magerun2 by executing php n98-magerun2.phar sys:cron:run indexer_update_all_views
So first indexer modes should be set to 'schedule'. Then you'll need to trigger some changes that will make the catalog_category_product indexer dirty somehow. Then make a backup of the database, test the performance of the indexer_update_all_views cronjob without changes from the PR, restore backup of database, now test with changes from the PR.

Does this make sense?

Thanks for the inputs @hostep , Yes, I executed full indexing by referring one of the above comment. I will re-do it following your further instructions.
I executed re indexing command twice in same database and agree that the indexing will be faster second time compared to first time. However, I executed the indexing command second time using PR fixed changes. So, second executed result should be very much faster than the first execution time, but taken time is more!
Anyhow, I will comeback as per your further instructed execution steps soon today!
Thanks

@engcom-Alfa
Copy link
Contributor

Hi @jasperzeinstra , @ihor-sviziev, @hostep

❌ QA not Passed Again as per below execution steps as well.

Manual Scenario Steps

  1. Created 2 separate Magento instances fresh with sample data (namely- prbefore & prafter).
  2. Created a new category in each (used exactly same name and configuration in both instances)
  3. prbefore instance is a 2.4-develop instance with latest update; and prafter instance is having this PR fixes code remotely switched and upgraded.
  4. Set both instances indexer to index by schedule
  5. By importing 10k products and by updating existing data, disturbed the indexing.
  6. Executed the command php n98-magerun2.phar sys:cron:run indexer_update_all_views

✔️ Expected result After Fix

  • php n98-magerun2.phar sys:cron:run indexer_update_all_views command execution time should be lesser compared to prior fix; Whatever the execution time currently being taken by the cron job, has to be reduced after this PR fix

❌ Actual result After Fix

  • php n98-magerun2.phar sys:cron:run indexer_update_all_views command execution time is hardly resulting lesser compared to prior fix. The command execution time taken before the fix was- 19mins57secs; Execution time taken after the fix is- 19mins47secs
  • So, reduced time was lesser than 0.01% and it is very much negligible
  • Below are the screenshots of the commands executed in both instances.
  • image
  • image

@ihor-sviziev
Copy link
Contributor

@engcom-Alfa, when you should see the bigger difference only in cases when you will edit the same product(s) multiple times.

@engcom-Alfa
Copy link
Contributor

@magento create issue

@engcom-Alfa
Copy link
Contributor

Manual testing scenario:

  1. Create a new category in each instances (used exactly same name and configuration in both instances)

  2. Set both instances indexer to index by schedule

  3. Import ~1000 products as such. Edit the many products multiple times by their price/quantity/weight as such. (I edited totally 200 products; each product 20 times). Do exactly same in both instances so that data indexer disturbs.

  4. Also create a Marketing>cart price rule and Content>Block in both the instances exactly same

  5. Execute the command php n98-magerun2.phar sys:cron:run indexer_update_all_views in the prbefore instance and note down the total time taken

  6. Execute the command php n98-magerun2.phar sys:cron:run indexer_update_all_views in the prafter instance and note down the total time taken

Before: ✖️ Command execution time taken in the prbefore instance is 6 seconds
image

After: ✔️ Command execution time taken in the prafter instance is nearly 4 seconds (performance has increased by 30-35%)
image

This has a specific fix belong to cron job performance issue. There is no impact on any other modules, also indexing has no any other issue as I tested as per previous comments. Hence there is no regression required on this!

@m2-assistant
Copy link

m2-assistant bot commented Oct 24, 2021

Hi @jasperzeinstra, thank you for your contribution!
Please, complete Contribution Survey, it will take less than a minute.
Your feedback will help us to improve contribution process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Perf/Frontend All tickets related with improving frontend performance. Auto-Tests: Not Required Changes in Pull Request does not require coverage by auto-tests Component: Elasticsearch Partner: MediaCT Pull Request is created by partner MediaCT partners-contribution Pull Request is created by Magento Partner Priority: P1 Once P0 defects have been fixed, a defect having this priority is the next candidate for fixing. Progress: accept Release Line: 2.4 Severity: S2 Major restrictions or short-term circumventions are required until a fix is available.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Issue] Move catalog search indexer outside stores loop
9 participants