Hardware, Software, and Wetware: A Story of Performance Tuning

Introduction

Have you ever heard of Rudy Rucker? He is the great-great-great-grandson of Hegel, the famous German philosopher. On its own that is little more than trivia. It gets more exciting once you learn that Rucker (an American) is a mathematician, a computer scientist, and, as a science fiction author, one of the founders of “cyberpunk”. His non-fiction works include “Infinity and the Mind”, “The Fourth Dimension”, and “Mind Tools”, all located at the intersection of mathematics, computer science, philosophy, and formal logic. All are highly inspiring and thus highly recommended, but be prepared for an intellectual roller coaster ride! His fictional “Ware Tetralogy” includes the two novels “Software” and “Wetware”, hence the title of this article.

Enter Gauss

Now you all know the story of little Gauss at school, where the teacher tried to keep the class occupied by having the children sum the numbers from 1 to 100. Before the teacher had a chance to leave the classroom, Carl Friedrich was done and presented the correct solution, 5050, on his slate. He had “seen” that

1 + 2 + … + 50 + 51 + … + 99 + 100

could be regrouped to

(1 + 100) + (2 + 99) + … + (50 + 51)

and so he only had to multiply the sum of each pair (101) by the number of pairs (50). By the way, it was not Gauss who discovered this. The fact was already known in pre-Greek mathematics.

Perhaps you would feel tempted nowadays to use a computer for this. Try this in the Python prompt of Abaqus/CAE:

del sum  # to use the pure Python version of "sum", not the Abaqus one
sum(range(1+10**2))

This gives, of course, 5050. Now try

sum(range(1+10**8))

It takes quite some time before the prompt returns with

5000000050000000L

Some performance tuning? First, Hardware. Getting a faster machine might save a few percent. Second, Software. Using a faster programming language than Python (which is terribly slow), such as C, might give you a speedup factor of 100. Much better. But once we increase the exponent in the example above from 8 to 100 (to sum up to a googol), we will again be out of luck. Third, Wetware. Using the “little Gauss”, as German mathematicians call it in mock reference to the “big Gauss” (his fundamental theorem of algebra), the answer is trivial and obtained very quickly:

50000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

(That is a 5, 99 zeros, another 5, and another 99 zeros.)
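For readers who want to try the Wetware approach at the prompt, here is a minimal sketch of the pairing argument as a closed formula. The function name gauss_sum is my own choice for illustration, and the code should run unchanged in both Python 2 and Python 3:

```python
def gauss_sum(n):
    """Sum of the integers 1..n via the pairing argument: n * (n + 1) / 2."""
    # One of n and n + 1 is always even, so the integer division is exact.
    # Python integers have arbitrary precision, so a googol is no problem.
    return n * (n + 1) // 2


print(gauss_sum(100))      # 5050
print(gauss_sum(10**8))    # 5000000050000000

googol_sum = gauss_sum(10**100)
print(len(str(googol_sum)))  # 200 digits
```

The cost is one multiplication and one division regardless of the exponent, whereas the loop hidden inside sum(range(...)) grows with the upper limit itself.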

So Wetware beats Software beats Hardware.

Enter Abaqus

Last week I had on my desk a model that, according to the customer, prevented him from using newer Abaqus versions, because “they are slower”. That sounded pretty unlikely. After all, every update seminar in every year shows dozens of slides about performance enhancements in various areas of the code.

The job was run on 2, 4, and 8 cores (more did not make sense given the small model size) on identical hardware, once with the customer’s version, 6.14, which is 8 years old (!), and once with the current 2022. The run times in hours:

Cores	Abaqus 6.14	Abaqus 2022
2	8.15	7.03
4	4.62	3.98
8	2.68	2.30

So the current version is about 14 percent faster. Case closed? Not yet: I wanted to see what Wetware would be able to achieve.
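The 14 percent can be checked with a few lines of Python. This is only a sketch of the arithmetic: the run times are copied from the table above, while the names runtimes and time_saved_percent are made up for this illustration.

```python
# Run times in hours, copied from the table: cores -> (Abaqus 6.14, Abaqus 2022)
runtimes = {
    2: (8.15, 7.03),
    4: (4.62, 3.98),
    8: (2.68, 2.30),
}


def time_saved_percent(old, new):
    """Reduction in run time relative to the old run time, in percent."""
    return 100.0 * (old - new) / old


savings = {}
for cores in sorted(runtimes):
    old, new = runtimes[cores]
    savings[cores] = time_saved_percent(old, new)
    print("%d cores: %.1f percent less run time" % (cores, savings[cores]))
```

All three core counts land close to 14 percent, so the gain does not appear to depend on the degree of parallelization in this range.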

.inp file

The customer’s model was nothing special, a simple and rather small middle-of-the-road Abaqus analysis:

~10 *PART
2 *ELEMENT, TYPE=C3D8R
*ELEMENT, TYPE=C3D10M for the rest
*ELASTIC, *PLASTIC
10 *TIE
5 *CONTACT PAIR
*FRICTION 0.15
*STATIC, NLGEOM=YES
Non-zero *BOUNDARY on a *PRE-TENSION SECTION node

.dat file

For 2 *TIEs the secondary surface was coarser than the main surface
150000 elements
1000000 DOFs

.msg file

50 increments
440 iterations

.sta file

Too large initial time increment resulting in cut-back
Several increments with 10 equilibrium iterations or more

Call in the “CUND” Squad

Changed C3D10M to C3D10 where element quality criteria permitted
Set ADJUST=NO on all *TIEs (used to be a problem many years back, but no more)
Swapped main and secondary surfaces on 2 *TIEs
Converted *CONTACT PAIR, TIED to *TIE
*CONTACT CONTROLS, STABILIZE
Used my “Qonvergence Quartet” (CUND):
- *CONTACT
- *STEP, UNSYMM=YES
- *STEP, NLGEOM=YES (was already active)
- *DYNAMIC, APPLICATION=QUASI-STATIC

Results

38 increments (formerly 50)
208 iterations (formerly 440)
1.66 h (formerly 2.30 h)

So the better model ran 28 percent faster. Again, Wetware beats Software. Case closed.
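As a rough cross-check of these numbers, the sketch below computes the relative reductions and the average wall-clock time per iteration. The figures are the ones listed under Results; the dictionary and function names are invented for this illustration, and the per-iteration value is simply total run time divided by iteration count, so it includes all overhead.

```python
# Figures from the Results list: original model vs. reworked model
before = {"increments": 50, "iterations": 440, "hours": 2.30}
after = {"increments": 38, "iterations": 208, "hours": 1.66}


def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


def seconds_per_iteration(run):
    """Average wall-clock seconds per iteration (total time / iterations)."""
    return run["hours"] * 3600.0 / run["iterations"]


reductions = {}
for key in ("increments", "iterations", "hours"):
    reductions[key] = reduction_percent(before[key], after[key])
    print("%s: %.1f percent lower" % (key, reductions[key]))

print("before: %.1f s per iteration" % seconds_per_iteration(before))
print("after:  %.1f s per iteration" % seconds_per_iteration(after))
```

The iteration count drops by more than half while the run time drops by 28 percent, so the average iteration of the reworked model is more expensive. The figures alone do not say which of the changes is responsible for that.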

Axel Reichert
Axel is a SIMULIA Industry Process Consultant Senior Specialist.
