"service mapping recomputation" jobs causing ServiceNow performance issues

 

Please review the Service Mapping Recomputation jobs and, as described in KB0824377, lower the job count to 1 for immediate relief.

(https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0824377

https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0869419)

 

See also the notes below on the sa.service.max_ci_service_population system property.

 

Memory degradation caused by high number of "Service Mapping Recomputation" jobs

 https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0824377 


Note: in my experience, moving this property between instances via an update set DOES NOT WORK. It has to be created manually on each instance (at least on the San Diego release).


1) Go to the "System Properties" list (https://<instance>.service-now.com/sys_properties_list.do)

2) Click New

a) Fill out:
Name: glide.service_mapping.recomputation.job_count
Type: Integer
Value: 1

b) Save 
 
You can actually stop these jobs altogether by setting the value to 0.
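
Since the property has to be created on each instance separately, the manual steps above could also be scripted from outside the instance. Below is a minimal sketch that builds a Table API request against sys_properties. The instance names and credentials are placeholders, the "integer" value for the type field is my assumption of what the "Integer" choice stores, and the script does not check whether the property already exists, so treat it as a starting point rather than a tested procedure.

```python
import base64
import json
import urllib.request


def build_property_request(instance, user, password, value=1):
    """Build (but do not send) a Table API request that creates the property."""
    url = f"https://{instance}.service-now.com/api/now/table/sys_properties"
    body = {
        "name": "glide.service_mapping.recomputation.job_count",
        "type": "integer",  # assumption: stored value behind the "Integer" choice
        "value": str(value),
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical instance names and credentials; replace before use.
    for instance in ["mycompany-dev", "mycompany-test"]:
        req = build_property_request(instance, "admin.user", "secret")
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req)  # uncomment to actually send the request
```

The request is only built, not sent, so you can inspect the URL and body first. The account used needs rights to write to sys_properties.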
==============================
 

Issue:
Dev running slow

Investigation Summary:

I have reviewed the instance and could see memory contention on several of its nodes.

servlet       started              heapspace        metaspace          rss      virt   cpu    resp    tpm  sess  errors
------------  -------------------  ---------------  -----------------  -------  -----  -----  ------  ---  ----  ------
xxxpdev015    2022-04-20 10:34:22    1.8 G / 1.9 G  147.0 M / 304.0 M    2.6 G  2.8 G  214.6   12.81  1.4     4     201
xxxdev021     2022-04-13 10:54:21    1.8 G / 1.9 G  201.0 M / 304.0 M    2.9 G  3.0 G  215.3  158.01  1.4    13  224975
xxxev022      2022-04-20 12:11:58  311.0 M / 1.9 G  106.0 M / 304.0 M  681.0 M  2.7 G  875.0    0.00  0.0     0       0
xxxdev023     2022-04-13 11:01:16    1.6 G / 1.9 G  186.0 M / 304.0 M    2.8 G  3.0 G  184.6    5.72  0.2    11  223210
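
To make the heapspace column easier to read, here is a small sketch that turns the "used / max" pair of each row into a ratio and flags nodes close to the limit. The 90% threshold is an arbitrary choice of mine, and I am assuming the K/M/G suffixes are binary units.

```python
import re

# Assumption: suffixes are binary units (1 G = 1024 M).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(text):
    """Convert a size such as '311.0 M' or '1.9 G' to bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


def heap_usage(row):
    """Return (node, used/max heap ratio) for one row of the node table."""
    node = row.split()[0]
    # The first 'X U / Y U' pair on the row is the heapspace column.
    used, maximum = re.search(r"([\d.]+ [KMG]) / ([\d.]+ [KMG])", row).groups()
    return node, to_bytes(used) / to_bytes(maximum)


rows = [
    "xxxdev021  2022-04-13 10:54:21    1.8 G / 1.9 G  201.0 M / 304.0 M    2.9 G  3.0 G  215.3  158.01  1.4    13  224975",
    "xxxdev023  2022-04-13 11:01:16    1.6 G / 1.9 G  186.0 M / 304.0 M    2.8 G  3.0 G  184.6    5.72  0.2    11  223210",
]
for row in rows:
    node, ratio = heap_usage(row)
    flag = "  <-- near limit" if ratio > 0.9 else ""
    print(f"{node}: {ratio:.0%} of heap in use{flag}")
```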

job           thread                    item                                      started              age   
------------  ------------------------  ----------------------------------------  -------------------  -------
xxxdev021  glide.scheduler.worker.6  Service Mapping Recomputation 2           2022-04-20 11:54:33  0:18:39
xxxdev021  glide.scheduler.worker.5  Service Mapping Recomputation 1           2022-04-20 11:56:36  0:16:35
xxxdev015  glide.scheduler.worker.2  Autoclose Incidents                       2022-04-20 12:00:12  0:13:00
xxxdev015  glide.scheduler.worker.3  Service Mapping Recomputation 1           2022-04-20 12:04:02  0:09:10
xxxdev015  glide.scheduler.worker.0  Service Mapping Recomputation 2           2022-04-20 12:04:02  0:09:10
xxxdev023  glide.scheduler.worker.2  Service Mapping Recomputation 1           2022-04-20 12:09:38  0:03:36
xxxdev021  glide.scheduler.worker.1  ASYNC: Affected ci notifications          2022-04-20 12:09:45  0:03:26
xxxdev023  glide.scheduler.worker.4  Service Mapping Recomputation 2           2022-04-20 12:11:27  0:01:47
xxxdev021  glide.scheduler.worker.3  Run Instance-side Probes                  2022-04-20 12:11:24  0:01:47
xxxdev023  glide.scheduler.worker.6  ASYNC: Affected ci notifications          2022-04-20 12:11:39  0:01:35
xxxdev023  glide.scheduler.worker.0  ASYNC: Affected ci notifications          2022-04-20 12:11:39  0:01:35
xxxdev015  glide.scheduler.worker.5  Event Management  - Impact Calculator fo  2022-04-20 12:11:39  0:01:33
xxxdev023  glide.scheduler.worker.1  Event Management  - Impact Calculator fo  2022-04-20 12:12:00  0:01:14
xxxdev021  glide.scheduler.worker.7  Update Business Service Status            2022-04-20 12:12:06  0:01:05
xxxdev015  glide.scheduler.worker.4  UsageAnalytics App Persistor              2022-04-20 12:12:42  0:00:29
xxxdev021  glide.scheduler.worker.4  ASYNC: Discovery - Sensors get (https://  2022-04-20 12:12:43  0:00:28
xxxdev023  glide.scheduler.worker.7  GCF Download Definition Collections       2022-04-20 12:12:55  0:00:19
xxxdev023  glide.scheduler.worker.5  GCF Download Blacklist and Whitelist      2022-04-20 12:12:55  0:00:19
xxxdev021  glide.scheduler.worker.2  ASYNC: Discovery - Sensors get (https://  2022-04-20 12:12:55  0:00:17
xxxdev022  glide.scheduler.worker.5  Init UI Metadata                          2022-04-20 12:13:07  0:00:04
xxxdev022  glide.scheduler.worker.3  Init Service Designer Form                2022-04-20 12:13:07  0:00:04
xxxdev022  glide.scheduler.worker.0  Register Instance                         2022-04-20 12:13:07  0:00:04
xxxdev022  glide.scheduler.worker.2  Init Service Portal SCSS                  2022-04-20 12:13:07  0:00:04
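
In the listing above, three of the four nodes each have two "Service Mapping Recomputation" jobs running at once. A quick way to get that count from a pasted job list is sketched below. It assumes columns are separated by two or more spaces, as in the output above, and the sample rows are a subset of that listing.

```python
import re
from collections import Counter


def recomputation_jobs_per_node(lines):
    """Count running 'Service Mapping Recomputation' jobs on each node."""
    counts = Counter()
    for line in lines:
        # Columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3 and fields[2].startswith("Service Mapping Recomputation"):
            counts[fields[0]] += 1
    return counts


sample = [
    "xxxdev021  glide.scheduler.worker.6  Service Mapping Recomputation 2  2022-04-20 11:54:33  0:18:39",
    "xxxdev021  glide.scheduler.worker.5  Service Mapping Recomputation 1  2022-04-20 11:56:36  0:16:35",
    "xxxdev015  glide.scheduler.worker.2  Autoclose Incidents  2022-04-20 12:00:12  0:13:00",
    "xxxdev015  glide.scheduler.worker.3  Service Mapping Recomputation 1  2022-04-20 12:04:02  0:09:10",
    "xxxdev015  glide.scheduler.worker.0  Service Mapping Recomputation 2  2022-04-20 12:04:02  0:09:10",
    "xxxdev023  glide.scheduler.worker.2  Service Mapping Recomputation 1  2022-04-20 12:09:38  0:03:36",
    "xxxdev023  glide.scheduler.worker.4  Service Mapping Recomputation 2  2022-04-20 12:11:27  0:01:47",
]
for node, count in sorted(recomputation_jobs_per_node(sample).items()):
    print(node, count)
```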



I have analysed the heap dumps for nodes xxxdev021 and xxxdev022. Multiple 'Service Mapping Recomputation' job threads running on these nodes occupy most of the heap space and are the main cause of the slowness.
On node xxxdev021, around 67% of the heap space was used by the 'Service Mapping Recomputation 1' job.

The thread com.glide.schedule_v2.SchedulerWorkerThread @ 0x9b531838 glide.scheduler.worker.0 keeps local variables with total size 1.196.819.952 (67,21%) bytes.

The memory is accumulated in one instance of com.glide.schedule_v2.SchedulerWorkerThread, loaded by com.snc.orbit.container.tomcat8.Tomcat8$OrbitTomcat8ClassLoader @ 0x91c4fc78, which occupies 1.196.819.952 (67,21%) bytes.
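
The figures in the heap dump report use "." as the thousands separator and "," as the decimal mark, so 1.196.819.952 bytes is roughly 1.1 GiB held by a single worker thread. The sketch below converts the two figures and derives the heap size they imply. The helper name is my own, and the implied total is only back-of-the-envelope arithmetic on the numbers quoted above.

```python
def parse_report_number(text):
    """Parse a number written with '.' as thousands separator and ',' as decimal mark."""
    return float(text.rstrip("%").replace(".", "").replace(",", "."))


retained_bytes = parse_report_number("1.196.819.952")
share = parse_report_number("67,21%") / 100

# If the thread retains `share` of the heap, the heap in the dump is roughly:
implied_heap_bytes = retained_bytes / share

print(f"retained by thread: {retained_bytes / 1024**3:.2f} GiB")
print(f"implied heap size:  {implied_heap_bytes / 1024**3:.2f} GiB")
```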

 

2022-04-20 02:56:36 (969) worker.5 worker.5 txid=e5d1a3901bcf Name: Service Mapping Recomputation 1
Job Context:
#Wed Apr 20 02:56:33 PDT 2022
fcScriptName=in the schedule record

Script:
SNC.ServiceMappingFactory.recompute();
2022-04-20 02:56:36 (979) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.coordinator                     : Recomputing environment '7216c62d1b3249106248a822b24bcb92'
2022-04-20 02:56:37 (054) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.coordinator                     : Pre processing environment '7216c62d1b3249106248a822b24bcb92'
2022-04-20 02:56:37 (054) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.service_populator               : Populating service 3e16c62d1b3249106248a822b24bcb4d via Script Populator
2022-04-20 02:56:37 (055) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.service_populator               : About to acquire lock SMServicePopulatorLock (service=3e16c62d1b3249106248a822b24bcb4d, mode=RECOMPUTATION)
2022-04-20 02:56:37 (661) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.165] id: xxxdev_1[glide.12 (connpid=756727)] for: DBQuery#loadResultSet[cmdb_rel_ci: parentIN717907df1bde85906248a822b24bcb5a,a280ede31bde499049c38732f54bcb71,2e209ed61b464990d4802f0a2d4bcb49,1d7bfb0f1bdac1906248a822b24bcb9f,94c944521bb2411049c38732f54bcbae,8a8d9f3f1b2ac9946248a822b24bcbb9,b87903df1bde85906248a822b24bcb31,651b06261b3d09946248a822b24bcbfd,050e2be61beec55449c38732f54bcb6f,1b20ded61b464990d4802f0a2d4bcb5f,e3101ad61b464990d4802f0a2d4bcb86,098b7f4f1bdac1906248a822b24bcb29,7c7903df1bde85906248a822b24bcb33,1cf2341a1bb2811049c38732f54bcbe9,e97b3f0f1bdac1906248a822b24bcb00,3fc99f2e1b0e45506248a822b24bcbbb,31fa06e21b3d09946248a822b24bcbc1,864be7371b220d94987d1fc3b24bcb95,e47767a61b4e01d0987d1fc3b24bcbd3,8ace7f271b754d14987d1fc3b24bcb61,070825831b0a49d0d4802f0a2d4bcb2e,1050d25a1b464990d4802f0a2d4bcb0c,e57bfb0f1bdac1906248a822b24bcba7,f47903df1bde85906248a822b24bcb2f^child.sys_class_pathNOT LIKE/!!/#L%^child.sys_class_pathNOT LIKE/!!/!D%^child.sys_class_pathNOT LIKE/!!/!(%^child.sys...
2022-04-20 02:56:37 (661) worker.5 worker.5 txid=e5d1a3901bcf Time: 0:00:00.198 id: xxxdev_1[glide.12] (connpid=756727) for: SELECT cmdb_rel_ci0.`parent`, cmdb_rel_ci0.`child`, cmdb_rel_ci0.`type`, cmdb_rel_ci0.`sys_id`, cmdb_rel_ci0.`sys_updated_on`, cmdb1.`sys_class_name` AS parent_sys_class_name, cmdb1.`sys_domain` AS parent_sys_domain_sys_id, cmdb1.`name` AS parent_name, cmdb2.`sys_class_name` AS child_sys_class_name, cmdb2.`sys_domain` AS child_sys_domain_sys_id, cmdb2.`name` AS child_name, cmdb1.`sys_id` AS parent_sys_id, cmdb2.`sys_id` AS child_sys_id FROM ((cmdb_rel_ci cmdb_rel_ci0  LEFT JOIN cmdb cmdb1 ON cmdb_rel_ci0.`parent` = cmdb1.`sys_id` )  LEFT JOIN cmdb cmdb2 ON cmdb_rel_ci0.`child` = cmdb2.`sys_id` )  WHERE cmdb_rel_ci0.`parent` IN ('717907df1bde85906248a822b24bcb5a' , 'a280ede31bde499049c38732f54bcb71' , '2e209ed61b464990d4802f0a2d4bcb49' , '1d7bfb0f1bdac1906248a822b24bcb9f' , '94c944521bb2411049c38732f54bcbae' , '8a8d9f3f1b2ac9946248a822b24bcbb9' , 'b87903df1bde85906248a822b24bcb31' , '651b06261b3d09946248a822b24bcbfd' , '050e2be61beec55449c38732f54bcb6f' , '1b20ded61b464990d4802f0a2d4bcb5f' , 'e3101ad61b464990d4802f0a2d4bcb86' , '098b7f4f1bdac1906248a822b24bcb29' , '7c7903df1bde85906248a822b24bcb33' , '1cf2341a1bb2811049c38732f54bcbe9' , 'e97b3f0f1bdac1906248a822b24bcb00' , '3fc99f2e1b0e45506248a822b24bcbbb' , '31fa06e21b3d09946248a822b24bcbc1' , '864be7371b220d94987d1fc3b24bcb95' , 'e47767a61b4e01d0987d1fc3b24bcbd3' , '8ace7f271b754d14987d1fc3b24bcb61' , '070825831b0a49d0d4802f0a2d4bcb2e' , '1050d25a1b464990d4802f0a2d4bcb0c' , 'e57bfb0f1bdac1906248a822b24bcba7' , 'f47903df1bde85906248a822b24bcb2f') AND (cmdb2.`sys_class_path` NOT LIKE '/!!/#L%' AND cmdb2.`sys_class_path` NOT LIKE '/!!/!D%' AND cmdb2.`sys_class_path` NOT LIKE '/!!/!(%' AND cmdb2.`sys_class_path` NOT LIKE '/!!/!M%' AND cmdb2.`sys_class_path` NOT LIKE '/!!/#3%' AND cmdb_rel_ci0.`type` != '11ee47317f723100ed1c3b19befa91f9') /* xxxdev021, gs:glide.scheduler.worker.5, tx:e5d1a3901bcfc5906248a822b24bcb13 */
2022-04-20 02:56:37 (678) worker.5 worker.5 txid=e5d1a3901bcf WARNING *** WARNING *** service_mapping.batch_manual_service_populator  : CI count in populate action exceeded the maximum allowed (1000). You can change the default by adding sys_property sa.service.max_ci_service_population
2022-04-20 02:56:37 (678) worker.5 worker.5 txid=e5d1a3901bcf WARNING *** WARNING *** service_mapping.cmdb_walker                     : Relation count 1,000 exceeded its defined limit (1,000). There will be no more relations added to the result
2022-04-20 02:56:39 (618) worker.5 worker.5 txid=e5d1a3901bcf identification_engine                           : logId:[4ad127901bcf] Encountered an insert during delay locking, restarting processing under lock
2022-04-20 02:56:40 (656) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (673) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (694) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.012] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (711) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (727) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.009] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (751) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.013] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (769) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (790) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.012] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (811) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.013] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (830) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.012] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
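
The log above shows a "Populating service ..." line followed, in the same transaction, by the warning that the CI count exceeded the maximum. To find out which services keep hitting the cap, a node log could be scanned along the lines of the sketch below. Pairing each warning with the most recent "Populating service" line of the same txid is my own reading of the log order, not something ServiceNow documents.

```python
import re
from collections import Counter

POPULATING = re.compile(r"txid=(\w+) .*Populating service ([0-9a-f]{32})")
CAP_WARNING = re.compile(r"txid=(\w+) .*CI count in populate action exceeded")


def services_hitting_cap(lines):
    """Count cap warnings per service sys_id.

    Each warning is attributed to the service most recently logged as
    being populated within the same transaction (txid).
    """
    current = {}  # txid -> sys_id of the service being populated
    hits = Counter()
    for line in lines:
        match = POPULATING.search(line)
        if match:
            current[match.group(1)] = match.group(2)
            continue
        match = CAP_WARNING.search(line)
        if match and match.group(1) in current:
            hits[current[match.group(1)]] += 1
    return hits


log = [
    "2022-04-20 02:56:37 (054) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.service_populator : Populating service 3e16c62d1b3249106248a822b24bcb4d via Script Populator",
    "2022-04-20 02:56:37 (678) worker.5 worker.5 txid=e5d1a3901bcf WARNING *** WARNING *** service_mapping.batch_manual_service_populator : CI count in populate action exceeded the maximum allowed (1000).",
]
for sys_id, count in services_hitting_cap(log).items():
    print(sys_id, count)
```

The sys_ids it reports can then be looked up on the instance to see which service maps are the large ones.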



Next Steps:

Please review the Service Mapping Recomputation jobs and, as described in KB0824377, lower the job count to 1 for immediate relief.

(https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0824377)


Please update the case if further assistance is required.


 ===

Thank you for your continued patience with this case.
We have identified that the memory issues are caused by excessively large service maps that pull in large amounts of data during recomputation.

There isn't a lot we can do to reduce the impact of this, but one option is to lower the 'sa.service.max_ci_service_population' property below its out-of-box (OOB) value of 1000.
This property limits the number of CIs in a dynamic service: once the value is reached, the population logic stops, which could affect the functionality of larger maps.

Setting it to a lower value stops the population phase at an earlier stage, so it will not reach levels that contain thousands of CIs.
Note that this property is global and therefore affects all Dynamic Services.
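
To picture what such a cap does, here is a toy model: a breadth-first walk over CI relations that stops once it has collected the maximum number of CIs. This is purely illustrative and is not ServiceNow's population logic. The function, the relation map and the CI names are all made up for the example.

```python
from collections import deque


def populate_service(entry_points, children_of, max_cis=1000):
    """Walk CI relations breadth-first, stopping once max_cis CIs are collected.

    Returns (collected CIs, True if the cap cut the walk short).
    """
    collected = []
    visited = set()
    queue = deque(entry_points)
    while queue:
        ci = queue.popleft()
        if ci in visited:
            continue
        if len(collected) >= max_cis:
            return collected, True  # cap reached with CIs still pending
        visited.add(ci)
        collected.append(ci)
        queue.extend(children_of.get(ci, []))
    return collected, False


# Hypothetical relations: parent CI -> child CIs.
relations = {
    "app": ["web1", "web2"],
    "web1": ["db1"],
    "web2": ["db1", "db2"],
}
print(populate_service(["app"], relations))             # full map
print(populate_service(["app"], relations, max_cis=3))  # truncated map
```

A lower cap means less data held in memory per recomputation, at the cost of maps that stop short of their deeper CIs.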

We have also engaged development for their assistance but they require SNC access to investigate further.

 

 
