Hi Experts,
We have a performance issue where there appears to be 'locking' or a 'hold' on the metadata table /IWBEP/I_MGW_CTC whenever there is heavy load or many parallel calls to the same service/URI. The BW team has already performed several optimizations on their end, enough to bring the application run time down to 22 seconds. However, the gateway layer still has a run time of 90~160 seconds, which is too long to be of any practical use.
The service model is generated by SAP_BW 7.4 SP10 (with Gateway add-on SAP_GWFND 7.4 SP12) on HANA, on-premise. There is no further setup in SEGW for this. Once the BW team enables a query for OData, it generates a service model which we can see in /IWFND/MAINT_SERVICE. We then just set a system alias for the service and activate it.
The issue is very intermittent, so I ran a test with my small team of 5 different users accessing the same service. Notice below how, after every instance, the duration of GET_META_DATA becomes longer.
If we look closely at the performance trace, we can see it spends a lot of time in GET_META_DATA. Line 14 in the trace is the application/generated class from the BW query.
Our BASIS team found the 'hold' on the application layer (though it is hard to spot during run time)...
but you can see it logged on the HANA DB layer in the view M_EXPENSIVE_STATEMENTS; note the small difference between duration and lock_wait, i.e. most of the statement's duration is spent waiting on a lock.
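For anyone who wants to reproduce this check from the ABAP side, here is a minimal sketch that reads the view via ADBC. It assumes the HANA expensive statements trace is enabled; the column names are the usual ones for M_EXPENSIVE_STATEMENTS but should be verified on your release:

" Sketch: read lock waits on the gateway cache table from
" M_EXPENSIVE_STATEMENTS via ADBC (native SQL).
DATA: lv_duration  TYPE p LENGTH 15 DECIMALS 0,
      lv_lock_wait TYPE p LENGTH 15 DECIMALS 0,
      lv_statement TYPE string.

DATA(lo_result) = NEW cl_sql_statement( )->execute_query(
    `SELECT DURATION_MICROSEC, LOCK_WAIT_DURATION, STATEMENT_STRING ` &&
    `FROM M_EXPENSIVE_STATEMENTS ` &&
    `WHERE STATEMENT_STRING LIKE '%/IWBEP/I_MGW_CTC%' ` &&
    `ORDER BY START_TIME DESC` ).

" Bind output columns in SELECT order, then fetch row by row.
lo_result->set_param( REF #( lv_duration ) ).
lo_result->set_param( REF #( lv_lock_wait ) ).
lo_result->set_param( REF #( lv_statement ) ).

WHILE lo_result->next( ) > 0.
  WRITE: / lv_duration, lv_lock_wait, lv_statement(80).
ENDWHILE.
lo_result->close( ).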
We took the following steps based on those findings:
1) Implemented SAP Note 2224957.
2) Deactivated metadata cache via SPRO.
3) Cleaned up cache via /IWBEP/CACHE_CLEANUP.
Results:
1) The long backend overhead duration is still there.
2) I've read here on SCN that you cannot really deactivate the 'backend' IW_BEP metadata cache; we could still see the work process trying to update the metadata cache.
3) Since the backend metadata cache cannot be deactivated, the table /IWBEP/I_MGW_CTC gets filled up again right after cleanup (see the quick check after this list).
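As a trivial way to watch the refill (assuming the table is readable via Open SQL on your system), you can count the entries before and after running /IWBEP/CACHE_CLEANUP:

" Run before and after the cleanup; the count drops and then climbs
" again as soon as the next service calls come in.
SELECT COUNT(*) FROM /iwbep/i_mgw_ctc INTO @DATA(lv_entries).
WRITE: / |Entries in /IWBEP/I_MGW_CTC: { lv_entries }|.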
Afterwards I did some spot checks for 1 user and noticed the GET_META_DATA part is no longer showing. It's unclear whether it wasn't picked up because the logic no longer passes through there, or whether the time spent in that method was simply too short to be picked up by the trace. In a perfect scenario, this is how the gateway should perform:
I asked our Java team to run a test case of 50 parallel calls to the same URI using a single user. We normally have a background (BG) user set up for the service(s) in SICF, which the front-end UI then uses, so this scenario will be realistic once it moves to production.
Checking the traces, the application run time hovers around 22,000~23,000 milliseconds, while the gateway framework backend overhead started at 29,000 milliseconds and then grew with every instance, almost exponentially, until it peaked at 171,000 milliseconds.
For the later instances you can see the large discrepancy between the application time and the gateway backend overhead. Still, there is no longer a GET_META_DATA entry in the trace.
However, we still have the locking, as shown on the HANA DB layer in the view M_EXPENSIVE_STATEMENTS, and it coincides with our test case.
It appears there is metadata logic updating /IWBEP/I_MGW_CTC inside the processing of GET_ENTITY_SET. So I looked for it and found the methods GET_CACHED_MODEL and SET_CACHED_MODEL. I think it has to do with the set-cache method, which writes via EXPORT ... TO DATABASE, a statement I'm not familiar with (a sketch of how it behaves is below). I also noticed the cache key is the same for each instance of the 50 parallel calls.
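For anyone else unfamiliar with the statement, here is a minimal sketch of how EXPORT ... TO DATABASE behaves, using the standard demo cluster table INDX rather than the actual gateway code (the area 'iw', the key value, and the data are made up for illustration). As far as I understand, the statement deletes and re-inserts the rows for the given key and holds an exclusive database lock on them until the next commit, so 50 parallel sessions writing the same cache key would queue up on that row lock, which would match the lock_wait we see:

" Write a data object under a key into a cluster table (demo table INDX).
DATA: lt_model TYPE STANDARD TABLE OF string,
      lv_key   TYPE indx-srtfd VALUE 'CACHE_KEY_1'.

APPEND `serialized metadata` TO lt_model.

" Existing rows for this key are deleted and re-inserted; the DB row
" lock is held until COMMIT WORK, so same-key writers serialize here.
EXPORT model = lt_model TO DATABASE indx(iw) ID lv_key.
COMMIT WORK.

" Reading it back works the same way in reverse:
IMPORT model = lt_model FROM DATABASE indx(iw) ID lv_key.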
I also looked into the main long-running method, /IWBEP/CL_MGW_LOCAL_HANDLER->GET_ENTITY_SET. It calls the BAdI /IWBEP/BD_MGW_SRV_RUNTIME, so I looked at how it is instantiated. The GET BADI statement uses filters, which are again the same for each of the 50 parallel calls.
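For reference, a filtered BAdI lookup looks roughly like the sketch below; the filter name SERVICE_NAME and its value are hypothetical, as I have not checked the actual filter definition of /IWBEP/BD_MGW_SRV_RUNTIME. The point is only that identical filter values resolve to the same implementation on every one of the 50 calls:

" Hypothetical filtered lookup; 'service_name' stands in for the real
" filter(s) defined on the BAdI.
DATA: lo_badi         TYPE REF TO /iwbep/bd_mgw_srv_runtime,
      lv_service_name TYPE string VALUE `ZMY_SERVICE`.

TRY.
    GET BADI lo_badi
      FILTERS service_name = lv_service_name.
  CATCH cx_badi_not_implemented.
    " no implementation matches these filter values
ENDTRY.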
My question is: how can we get rid of the metadata lock/wait, or at least reduce the backend overhead?
Does the metadata locking have to do with how the SAP standard methods are designed, or are they working as expected?
Thanks and Best Regards,
Francis