Please review the Service Mapping Recomputation jobs and, as described in
KB0824377, lower the job count to 1 for immediate relief.
(https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0824377
https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0869419)
See also below regarding the sa.service.max_ci_service_population sys property setting.
Memory degradation caused by high number of "Service Mapping Recomputation" jobs
https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0824377
Note: my observation is that pulling this property in via an update set DOES NOT WORK; it needs to be created manually per instance (on the San Diego version, at least).
1) Go to the "System Properties" list (https://<instance>.service-now.com/sys_properties_list.do)
2) Click New
   a) Fill out:
      Name: glide.service_mapping.recomputation.job_count
      Type: Integer
      Value: 1
   b) Save
Note: these jobs can effectively be stopped altogether by setting this property to 0.
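For reference, the property can also be set from a background script instead of the form. This is a minimal sketch, not an official procedure; it still has to be run per instance, and the resulting sys_properties record should be verified afterwards:
// Background script sketch: set the recomputation job count per instance.
// '1' gives immediate relief per KB0824377; '0' stops these jobs entirely.
// gs.setProperty creates the sys_properties record if it does not already exist.
gs.setProperty('glide.service_mapping.recomputation.job_count', '1');
gs.info('glide.service_mapping.recomputation.job_count = ' +
    gs.getProperty('glide.service_mapping.recomputation.job_count'));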
==============================
Issue:
Dev running slow
Investigation Summary:
I have reviewed the instance and can see memory contention on different nodes
across the instance.
servlet      started              heapspace        metaspace          rss      virt   cpu    resp    tpm  sess  errors
------------ -------------------- ---------------- ------------------ -------- ------ ------ ------- ---- ----- ------
xxxpdev015   2022-04-20 10:34:22  1.8 G / 1.9 G    147.0 M / 304.0 M  2.6 G    2.8 G  214.6  12.81   1.4  4     201
xxxdev021    2022-04-13 10:54:21  1.8 G / 1.9 G    201.0 M / 304.0 M  2.9 G    3.0 G  215.3  158.01  1.4  13    224975
xxxev022     2022-04-20 12:11:58  311.0 M / 1.9 G  106.0 M / 304.0 M  681.0 M  2.7 G  875.0  0.00    0.0  0     0
xxxdev023    2022-04-13 11:01:16  1.6 G / 1.9 G    186.0 M / 304.0 M  2.8 G    3.0 G  184.6  5.72    0.2  11    223210
job          thread                   item                                     started             age
------------ ------------------------ ---------------------------------------- ------------------- -------
xxxdev021 glide.scheduler.worker.6 Service Mapping Recomputation 2 2022-04-20 11:54:33 0:18:39
xxxdev021 glide.scheduler.worker.5 Service Mapping Recomputation 1 2022-04-20 11:56:36 0:16:35
xxxdev015 glide.scheduler.worker.2 Autoclose Incidents 2022-04-20 12:00:12 0:13:00
xxxdev015 glide.scheduler.worker.3 Service Mapping Recomputation 1 2022-04-20 12:04:02 0:09:10
xxxdev015 glide.scheduler.worker.0 Service Mapping Recomputation 2 2022-04-20 12:04:02 0:09:10
xxxdev023 glide.scheduler.worker.2 Service Mapping Recomputation 1 2022-04-20 12:09:38 0:03:36
xxxdev021 glide.scheduler.worker.1 ASYNC: Affected ci notifications 2022-04-20 12:09:45 0:03:26
xxxdev023 glide.scheduler.worker.4 Service Mapping Recomputation 2 2022-04-20 12:11:27 0:01:47
xxxdev021 glide.scheduler.worker.3 Run Instance-side Probes 2022-04-20 12:11:24 0:01:47
xxxdev023 glide.scheduler.worker.6 ASYNC: Affected ci notifications 2022-04-20 12:11:39 0:01:35
xxxdev023 glide.scheduler.worker.0 ASYNC: Affected ci notifications 2022-04-20 12:11:39 0:01:35
xxxdev015 glide.scheduler.worker.5 Event Management - Impact Calculator fo 2022-04-20 12:11:39 0:01:33
xxxdev023 glide.scheduler.worker.1 Event Management - Impact Calculator fo 2022-04-20 12:12:00 0:01:14
xxxdev021 glide.scheduler.worker.7 Update Business Service Status 2022-04-20 12:12:06 0:01:05
xxxdev015 glide.scheduler.worker.4 UsageAnalytics App Persistor 2022-04-20 12:12:42 0:00:29
xxxdev021 glide.scheduler.worker.4 ASYNC: Discovery - Sensors get (https:// 2022-04-20 12:12:43 0:00:28
xxxdev023 glide.scheduler.worker.7 GCF Download Definition Collections 2022-04-20 12:12:55 0:00:19
xxxdev023 glide.scheduler.worker.5 GCF Download Blacklist and Whitelist 2022-04-20 12:12:55 0:00:19
xxxdev021 glide.scheduler.worker.2 ASYNC: Discovery - Sensors get (https:// 2022-04-20 12:12:55 0:00:17
xxxdev022 glide.scheduler.worker.5 Init UI Metadata 2022-04-20 12:13:07 0:00:04
xxxdev022 glide.scheduler.worker.3 Init Service Designer Form 2022-04-20 12:13:07 0:00:04
xxxdev022    glide.scheduler.worker.0 Register Instance                        2022-04-20 12:13:07 0:00:04
xxxdev022 glide.scheduler.worker.2 Init Service Portal SCSS 2022-04-20 12:13:07 0:00:04
I have analysed the heap dumps for nodes xxxdev021 and xxxdev022. I observed
that multiple threads running 'Service Mapping Recomputation' jobs on these
nodes occupy the majority of the heap space and are the main cause of the
slowness. On node xxxdev021, around 67% of the heap space was used by the
'Service Mapping Recomputation 1' job.
The thread com.glide.schedule_v2.SchedulerWorkerThread @ 0x9b531838
glide.scheduler.worker.0 keeps local variables with total size 1.196.819.952
(67,21%) bytes. The memory is accumulated in one instance of
com.glide.schedule_v2.SchedulerWorkerThread, loaded by
com.snc.orbit.container.tomcat8.Tomcat8$OrbitTomcat8ClassLoader @ 0x91c4fc78,
which occupies 1.196.819.952 (67,21%) bytes.
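For reference, a quick way to see how many 'Service Mapping Recomputation' scheduled jobs exist, and which node has claimed them, is a background script against sys_trigger. A minimal sketch, assuming the job names match those shown in the scheduler worker list above:
// Background script sketch: count 'Service Mapping Recomputation N' jobs in sys_trigger.
var jobs = new GlideRecord('sys_trigger');
jobs.addQuery('name', 'STARTSWITH', 'Service Mapping Recomputation');
jobs.query();
var count = 0;
while (jobs.next()) {
    count++;
    gs.info(jobs.getValue('name') + ' claimed by node: ' + jobs.getValue('system_id'));
}
gs.info('Total Service Mapping Recomputation jobs: ' + count);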
2022-04-20 02:56:36 (969) worker.5 worker.5 txid=e5d1a3901bcf Name: Service Mapping Recomputation 1
Job Context: #Wed Apr 20 02:56:33 PDT 2022
fcScriptName=in the schedule record
Script: SNC.ServiceMappingFactory.recompute();
2022-04-20 02:56:36 (979) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.coordinator : Recomputing environment '7216c62d1b3249106248a822b24bcb92'
2022-04-20 02:56:37 (054) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.coordinator : Pre processing environment '7216c62d1b3249106248a822b24bcb92'
2022-04-20 02:56:37 (054) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.service_populator : Populating service 3e16c62d1b3249106248a822b24bcb4d via Script Populator
2022-04-20 02:56:37 (055) worker.5 worker.5 txid=e5d1a3901bcf service_mapping.service_populator : About to acquire lock SMServicePopulatorLock (service=3e16c62d1b3249106248a822b24bcb4d, mode=RECOMPUTATION)
2022-04-20 02:56:37 (661) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.165] id: xxxdev_1[glide.12 (connpid=756727)] for: DBQuery#loadResultSet[cmdb_rel_ci:
parentIN717907df1bde85906248a822b24bcb5a,a280ede31bde499049c38732f54bcb71,2e209ed61b464990d4802f0a2d4bcb49,1d7bfb0f1bdac1906248a822b24bcb9f,94c944521bb2411049c38732f54bcbae,8a8d9f3f1b2ac9946248a822b24bcbb9,b87903df1bde85906248a822b24bcb31,651b06261b3d09946248a822b24bcbfd,050e2be61beec55449c38732f54bcb6f,1b20ded61b464990d4802f0a2d4bcb5f,e3101ad61b464990d4802f0a2d4bcb86,098b7f4f1bdac1906248a822b24bcb29,7c7903df1bde85906248a822b24bcb33,1cf2341a1bb2811049c38732f54bcbe9,e97b3f0f1bdac1906248a822b24bcb00,3fc99f2e1b0e45506248a822b24bcbbb,31fa06e21b3d09946248a822b24bcbc1,864be7371b220d94987d1fc3b24bcb95,e47767a61b4e01d0987d1fc3b24bcbd3,8ace7f271b754d14987d1fc3b24bcb61,070825831b0a49d0d4802f0a2d4bcb2e,1050d25a1b464990d4802f0a2d4bcb0c,e57bfb0f1bdac1906248a822b24bcba7,f47903df1bde85906248a822b24bcb2f^child.sys_class_pathNOT
LIKE/!!/#L%^child.sys_class_pathNOT
LIKE/!!/!D%^child.sys_class_pathNOT LIKE/!!/!(%^child.sys...
2022-04-20 02:56:37 (661) worker.5 worker.5 txid=e5d1a3901bcf Time: 0:00:00.198 id: xxxdev_1[glide.12]
(connpid=756727) for: SELECT cmdb_rel_ci0.`parent`, cmdb_rel_ci0.`child`,
cmdb_rel_ci0.`type`, cmdb_rel_ci0.`sys_id`, cmdb_rel_ci0.`sys_updated_on`,
cmdb1.`sys_class_name` AS parent_sys_class_name, cmdb1.`sys_domain` AS
parent_sys_domain_sys_id, cmdb1.`name` AS parent_name, cmdb2.`sys_class_name`
AS child_sys_class_name, cmdb2.`sys_domain` AS child_sys_domain_sys_id,
cmdb2.`name` AS child_name, cmdb1.`sys_id` AS parent_sys_id, cmdb2.`sys_id` AS
child_sys_id FROM ((cmdb_rel_ci cmdb_rel_ci0
LEFT JOIN cmdb cmdb1 ON cmdb_rel_ci0.`parent` = cmdb1.`sys_id` ) LEFT JOIN cmdb cmdb2 ON cmdb_rel_ci0.`child`
= cmdb2.`sys_id` ) WHERE
cmdb_rel_ci0.`parent` IN ('717907df1bde85906248a822b24bcb5a' ,
'a280ede31bde499049c38732f54bcb71' , '2e209ed61b464990d4802f0a2d4bcb49' ,
'1d7bfb0f1bdac1906248a822b24bcb9f' , '94c944521bb2411049c38732f54bcbae' ,
'8a8d9f3f1b2ac9946248a822b24bcbb9' , 'b87903df1bde85906248a822b24bcb31' ,
'651b06261b3d09946248a822b24bcbfd' , '050e2be61beec55449c38732f54bcb6f' ,
'1b20ded61b464990d4802f0a2d4bcb5f' , 'e3101ad61b464990d4802f0a2d4bcb86' ,
'098b7f4f1bdac1906248a822b24bcb29' , '7c7903df1bde85906248a822b24bcb33' ,
'1cf2341a1bb2811049c38732f54bcbe9' , 'e97b3f0f1bdac1906248a822b24bcb00' ,
'3fc99f2e1b0e45506248a822b24bcbbb' , '31fa06e21b3d09946248a822b24bcbc1' ,
'864be7371b220d94987d1fc3b24bcb95' , 'e47767a61b4e01d0987d1fc3b24bcbd3' ,
'8ace7f271b754d14987d1fc3b24bcb61' , '070825831b0a49d0d4802f0a2d4bcb2e' ,
'1050d25a1b464990d4802f0a2d4bcb0c' , 'e57bfb0f1bdac1906248a822b24bcba7' ,
'f47903df1bde85906248a822b24bcb2f') AND (cmdb2.`sys_class_path` NOT LIKE
'/!!/#L%' AND cmdb2.`sys_class_path` NOT LIKE '/!!/!D%' AND
cmdb2.`sys_class_path` NOT LIKE '/!!/!(%' AND cmdb2.`sys_class_path` NOT LIKE
'/!!/!M%' AND cmdb2.`sys_class_path` NOT LIKE '/!!/#3%' AND cmdb_rel_ci0.`type`
!= '11ee47317f723100ed1c3b19befa91f9') /* xxxdev021, gs:glide.scheduler.worker.5,
tx:e5d1a3901bcfc5906248a822b24bcb13 */
2022-04-20 02:56:37 (678) worker.5 worker.5 txid=e5d1a3901bcf WARNING *** WARNING *** service_mapping.batch_manual_service_populator : CI count in populate action exceeded the maximum allowed (1000). You can change the default by adding sys_property sa.service.max_ci_service_population
2022-04-20 02:56:37 (678) worker.5 worker.5 txid=e5d1a3901bcf WARNING *** WARNING *** service_mapping.cmdb_walker : Relation count 1,000 exceeded its defined limit (1,000). There will be no more relations added to the result
2022-04-20 02:56:39 (618) worker.5 worker.5 txid=e5d1a3901bcf identification_engine : logId:[4ad127901bcf] Encountered an insert during delay locking, restarting processing under lock
2022-04-20 02:56:40 (656) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (673) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (694) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.012] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (711) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (727) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.009] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (751) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.013] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (769) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.011] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (790) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.012] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (811) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.013] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
2022-04-20 02:56:40 (830) worker.5 worker.5 txid=e5d1a3901bcf [0:00:00.012] Compacting large row block (file.write: cmdb_rel_ci 10000 rows 160000 saveSize)
Next Steps:
Please review the Service Mapping Recomputation jobs and, as described in
KB0824377, lower the job count to 1 for immediate relief.
(https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0824377)
Please update the case if further assistance is required.
***
As part of the troubleshooting process, I and other ServiceNow personnel may
need to access your instance(s) in order to review the service impact to your
instance and determine the root cause. It may also be necessary to make some
changes on a sub-production instance in order to troubleshoot the issue or to
test a probable solution. These changes, if any, will be reverted to the
original state; if any change is not reverted, a reason will be provided.
If you need immediate assistance, please use one of the contact numbers from
our support contact page:
http://www.servicenow.com/support/contact-support.html
You will then be able to enter your Case or Change number over the phone to have your call routed to the Support Team.
===
Thank you for your continued patience with this case.
We have identified that the memory issues are caused by excessively large
service maps that pull in large amounts of data during recomputation.
There is not much we can do to reduce the impact of this, but one option is to
reduce the 'sa.service.max_ci_service_population' property to a value lower
than the out-of-box default of 1000.
This property limits the number of CIs in a dynamic service: once the limit is
reached, the population logic stops, which could impact the functionality of
larger maps.
Setting it to a lower value will stop the population phase at an earlier stage,
so maps will not reach levels that contain thousands of CIs.
Note that this property is global and therefore affects all Dynamic Services.
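If you decide to try this, the property can be lowered from a background script. This is a minimal sketch only; the value 500 is an illustrative example, not a recommendation, and should be chosen to suit your largest intended service maps:
// Background script sketch: lower the CI population limit for dynamic services.
// 500 is an example value only; the out-of-box default is 1000.
// gs.setProperty creates the sys_properties record if it does not already exist.
gs.setProperty('sa.service.max_ci_service_population', '500');
gs.info('sa.service.max_ci_service_population is now ' +
    gs.getProperty('sa.service.max_ci_service_population'));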
We have also engaged development for their assistance but they require SNC
access to investigate further.