Intro
This report presents a method for measuring the TCG emulation efficiency in QEMU. This is achieved for seventeen different targets by comparing the number of guest instructions (running the program natively on the target) and the number of QEMU instructions (running the program through QEMU). For each target, the ratio between these two numbers gives a rough estimate of the emulation efficiency for that target.
In addition to the five benchmarks introduced in the previous report, the Coulomb benchmark is reused in this report to provide a variety of workloads. This gives a total of six benchmark programs that can be categorized into two groups:
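As a minimal sketch of the ratio described above, the snippet below divides a QEMU instruction count by a guest instruction count. The function name `emulation_ratio` is an assumption made for illustration; the sample counts are the aarch64 row of the coulomb_double table later in this report.

```python
def emulation_ratio(guest_instructions, qemu_instructions):
    """Return the number of QEMU instructions executed per guest instruction."""
    if guest_instructions <= 0:
        raise ValueError("guest instruction count must be positive")
    return qemu_instructions / guest_instructions


# Sample values: the aarch64 row of the coulomb_double table
ratio = emulation_ratio(182_965_444, 4_424_319_223)
# Format the ratio the same way the tables in this report do ("1:<ratio>")
print("1:{:.3f}".format(ratio))
```

A larger value means QEMU executes more host instructions for each guest instruction, so a smaller ratio indicates more efficient emulation.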
- Floating point operations (group 1):
  - coulomb_double
  - matmult_double
  - qsort_double
- Basic int and char operations (group 2):
  - matmult_int32
  - qsort_int32
  - qsort_string
All benchmarks are available on the project GitHub page.
Setup
All the measurements in this report are based on the newly released QEMU version 5.1.0-rc2.
To measure the number of guest instructions, the libinsn plugin is used, which is available when QEMU is built with the --enable-plugins option. The general syntax for using the plugin is:
<qemu-executable> -plugin <qemu-plugins-build>/tests/plugin/libinsn.so -d plugin <test-program>
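The plugin writes its result to the log, and the measurement script in this report reads the instruction count as the last whitespace-separated token of that output. A minimal sketch of this parsing step follows; the helper name `parse_insn_count` and the sample log text are assumptions for illustration, not captured plugin output.

```python
def parse_insn_count(plugin_stderr):
    """Return the last whitespace-separated token of the plugin log as an int."""
    tokens = plugin_stderr.split()
    if not tokens:
        raise ValueError("plugin produced no output")
    return int(tokens[-1])


# Illustrative log text; the exact wording of the plugin output is an assumption
sample_log = "insns: 182965444\n"
print(parse_insn_count(sample_log))
```

Taking the last token only works if the emulated program itself prints nothing to stderr after the plugin reports its count.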
To measure the number of QEMU instructions, Callgrind is used. Please refer to the “Measuring Basic Performance Metrics of QEMU” report for more details on setting up and using Callgrind.
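The measurement script in this report reads the instruction total from a fixed line of Callgrind's output. As a hedged alternative, the sketch below searches for the summary line by name instead. It assumes the summary contains a line of the form `Collected : <count>`; the helper name `parse_callgrind_total` and the sample text are illustrative only.

```python
import re


def parse_callgrind_total(callgrind_stderr):
    """Extract the instruction total from a Callgrind summary."""
    # Assumption: the summary has a line such as "==1234== Collected : 4424319223"
    match = re.search(r"Collected\s*:\s*(\d+)", callgrind_stderr)
    if match is None:
        raise ValueError("no 'Collected' line found in Callgrind output")
    return int(match.group(1))


# Illustrative summary text, not captured from a real run
sample_summary = (
    "==1234== Events    : Ir\n"
    "==1234== Collected : 4424319223\n"
)
print(parse_callgrind_total(sample_summary))
```

Matching on the label rather than a line index keeps the parsing independent of how many banner lines precede the summary.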
To create a plugins build based on the latest QEMU version, this bash snippet is used:
wget https://download.qemu.org/qemu-5.1.0-rc2.tar.xz
tar xfv qemu-5.1.0-rc2.tar.xz
cd qemu-5.1.0-rc2
mkdir build-gcc-plugins
cd build-gcc-plugins
../configure --disable-system --disable-tools --enable-plugins
make
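With the build in place, the plugin command line can be assembled per target. The sketch below assumes the directory layout used by the measurement script in this report (`<build>/<target>-linux-user/qemu-<target>` and `<build>/tests/plugin/libinsn.so`); the helper name `plugin_command` and the program path are hypothetical.

```python
import os


def plugin_command(qemu_build, target, program):
    """Build the argv list for running `program` under the libinsn plugin."""
    qemu_exe = os.path.join(qemu_build, target + "-linux-user", "qemu-" + target)
    libinsn = os.path.join(qemu_build, "tests", "plugin", "libinsn.so")
    return [qemu_exe, "-plugin", libinsn, "-d", "plugin", program]


# "build-gcc-plugins" is the directory created above; the program path is a placeholder
print(" ".join(plugin_command("build-gcc-plugins", "aarch64", "./coulomb_double")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.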
Measurements
The Python script below creates a CSV table for each of the six benchmarks. Each table contains seventeen rows, one for each target. A row contains the target name, number of guest instructions, number of QEMU instructions and the ratio between the two numbers.
import csv
import os
import subprocess
import tempfile


############### Script Options ###############
qemu_build = "<qemu-plugins-build>"
targets = {
    "aarch64":  "aarch64-linux-gnu-gcc",
    "alpha":    "alpha-linux-gnu-gcc",
    "arm":      "arm-linux-gnueabi-gcc",
    "hppa":     "hppa-linux-gnu-gcc",
    "m68k":     "m68k-linux-gnu-gcc",
    "mips":     "mips-linux-gnu-gcc",
    "mipsel":   "mipsel-linux-gnu-gcc",
    "mips64":   "mips64-linux-gnuabi64-gcc",
    "mips64el": "mips64el-linux-gnuabi64-gcc",
    "ppc":      "powerpc-linux-gnu-gcc",
    "ppc64":    "powerpc64-linux-gnu-gcc",
    "ppc64le":  "powerpc64le-linux-gnu-gcc",
    "riscv64":  "riscv64-linux-gnu-gcc",
    "s390x":    "s390x-linux-gnu-gcc",
    "sh4":      "sh4-linux-gnu-gcc",
    "sparc64":  "sparc64-linux-gnu-gcc",
    "x86_64":   "gcc"
}
##############################################


def measure_qemu_instructions(qemu_exe_path, program_exe_path):
    # Measure the number of QEMU instructions using Callgrind
    with tempfile.NamedTemporaryFile() as tmp_out:
        run_callgrind = subprocess.run(["valgrind",
                                        "--tool=callgrind",
                                        "--callgrind-out-file=" + tmp_out.name,
                                        qemu_exe_path,
                                        program_exe_path],
                                       stdout=subprocess.DEVNULL,
                                       stderr=subprocess.PIPE)
    callgrind_output = run_callgrind.stderr.decode("utf-8").split("\n")
    # The instruction total is the last field of the ninth line of the output
    return int(callgrind_output[8].split(" ")[-1])


csv_header = ["Target", "Guest Instructions", "QEMU Instructions", "Ratio"]
benchmarks = os.listdir('benchmarks')
libinsn_path = os.path.join(qemu_build, "tests", "plugin", "libinsn.so")
# Create the output directory (no error if it already exists)
os.makedirs("tables", exist_ok=True)

for benchmark in benchmarks:
    data = []
    benchmark_name = os.path.splitext(benchmark)[0]
    benchmark_path = os.path.join("benchmarks", benchmark)
    for target_name, target_compiler in targets.items():
        # Path of the user-mode QEMU executable for this target
        qemu_exe = "{}/{}-linux-user/qemu-{}".format(qemu_build,
                                                     target_name,
                                                     target_name)
        with tempfile.NamedTemporaryFile() as tmp_exe:
            # Compile target
            subprocess.run([target_compiler, "-O2", "-static",
                            benchmark_path, "-o", tmp_exe.name, "-lm"])
            # Run the libinsn plugin
            run_qemu_plugin = subprocess.run([qemu_exe,
                                              "-plugin",
                                              libinsn_path,
                                              "-d",
                                              "plugin",
                                              tmp_exe.name],
                                             stdout=subprocess.DEVNULL,
                                             stderr=subprocess.PIPE)
            # Measure the instructions
            guest_instructions = int(run_qemu_plugin.stderr.decode("utf-8").
                                     split()[-1])
            qemu_instruction = measure_qemu_instructions(qemu_exe,
                                                         tmp_exe.name)
            data.append([target_name,
                         format(guest_instructions, ","),
                         format(qemu_instruction, ","),
                         "1:" + str(round((qemu_instruction / guest_instructions), 3))])
    with open(os.path.join("tables", benchmark_name) + ".csv", "w") as file:
        writer = csv.writer(file)
        writer.writerow(csv_header)
        writer.writerows(data)
Results (Benchmarks Group 1)
coulomb_double
Target | Guest Instructions | QEMU Instructions | Ratio |
---|---|---|---|
aarch64 | 182 965 444 | 4 424 319 223 | 1:24.181 |
alpha | 287 894 875 | 10 720 832 859 | 1:37.239 |
arm | 4 353 433 161 | 39 328 640 162 | 1:9.034 |
hppa | 290 299 145 | 12 007 537 148 | 1:41.363 |
m68k | 55 464 791 | 7 107 559 194 | 1:128.145 |
mips | 286 969 260 | 9 957 633 056 | 1:34.699 |
mipsel | 300 313 870 | 11 123 315 018 | 1:37.039 |
mips64 | 255 992 742 | 9 855 532 178 | 1:38.499 |
mips64el | 266 739 104 | 11 004 724 703 | 1:41.257 |
ppc | 239 658 319 | 13 031 944 195 | 1:54.377 |
ppc64 | 228 263 889 | 13 034 833 440 | 1:57.104 |
ppc64le | 220 968 816 | 13 012 936 191 | 1:58.890 |
riscv64 | 209 944 207 | 4 069 430 554 | 1:19.383 |
s390x | 215 191 419 | 11 013 187 596 | 1:51.179 |
sh4 | 473 219 807 | 12 728 861 129 | 1:26.898 |
sparc64 | 263 295 373 | 11 969 980 973 | 1:45.462 |
x86_64 | 225 499 576 | 4 643 073 756 | 1:20.590 |
matmult_double
Target | Guest Instructions | QEMU Instructions | Ratio |
---|---|---|---|
aarch64 | 62 565 037 | 1 412 678 042 | 1:22.579 |
alpha | 120 146 835 | 3 021 375 794 | 1:25.147 |
arm | 917 721 514 | 8 723 369 272 | 1:9.505 |
hppa | 63 330 121 | 3 346 341 016 | 1:52.840 |
m68k | 62 270 262 | 3 327 921 564 | 1:53.443 |
mips | 87 981 027 | 2 263 506 435 | 1:25.727 |
mipsel | 95 981 109 | 3 176 876 928 | 1:33.099 |
mips64 | 80 557 580 | 2 277 631 169 | 1:28.273 |
mips64el | 88 557 574 | 3 190 361 616 | 1:36.026 |
ppc | 48 136 797 | 3 125 669 697 | 1:64.933 |
ppc64 | 64 408 551 | 3 203 728 174 | 1:49.741 |
ppc64le | 64 289 333 | 3 203 064 933 | 1:49.823 |
riscv64 | 78 623 128 | 1 222 950 784 | 1:15.555 |
s390x | 46 190 841 | 2 726 829 922 | 1:59.034 |
sh4 | 88 962 981 | 3 342 515 085 | 1:37.572 |
sparc64 | 79 003 237 | 3 207 541 031 | 1:40.600 |
x86_64 | 61 517 622 | 1 250 647 935 | 1:20.330 |
qsort_double
Target | Guest Instructions | QEMU Instructions | Ratio |
---|---|---|---|
aarch64 | 159 746 207 | 2 658 877 440 | 1:16.644 |
alpha | 228 521 249 | 1 949 737 992 | 1:8.532 |
arm | 662 068 324 | 9 121 836 857 | 1:13.778 |
hppa | 247 113 645 | 3 141 276 704 | 1:12.712 |
m68k | 203 935 507 | 4 934 908 874 | 1:24.198 |
mips | 207 350 635 | 2 099 043 136 | 1:10.123 |
mipsel | 207 350 618 | 2 099 343 286 | 1:10.125 |
mips64 | 188 086 328 | 1 971 371 119 | 1:10.481 |
mips64el | 188 086 318 | 1 968 839 700 | 1:10.468 |
ppc | 224 876 043 | 2 736 474 437 | 1:12.169 |
ppc64 | 203 809 886 | 2 685 763 461 | 1:13.178 |
ppc64le | 193 040 770 | 2 642 651 058 | 1:13.690 |
riscv64 | 167 397 846 | 1 590 611 459 | 1:9.502 |
s390x | 130 867 251 | 2 475 571 654 | 1:18.917 |
sh4 | 244 843 868 | 2 563 068 375 | 1:10.468 |
sparc64 | 190 084 290 | 3 919 439 599 | 1:20.619 |
x86_64 | 156 689 097 | 1 987 553 774 | 1:12.685 |
Results (Benchmarks Group 2)
matmult_int32
Target | Guest Instructions | QEMU Instructions | Ratio |
---|---|---|---|
aarch64 | 62 555 845 | 596 194 508 | 1:9.531 |
alpha | 96 215 385 | 370 654 042 | 1:3.852 |
arm | 63 690 750 | 736 994 597 | 1:11.571 |
hppa | 103 978 473 | 667 790 898 | 1:6.422 |
m68k | 62 534 491 | 407 647 521 | 1:6.519 |
mips | 88 083 941 | 497 767 190 | 1:5.651 |
mipsel | 88 083 929 | 497 780 326 | 1:5.651 |
mips64 | 89 460 954 | 479 725 676 | 1:5.362 |
mips64el | 89 460 943 | 463 106 726 | 1:5.177 |
ppc | 55 843 156 | 338 959 876 | 1:6.070 |
ppc64 | 64 204 690 | 390 884 485 | 1:6.088 |
ppc64le | 64 205 395 | 390 743 122 | 1:6.086 |
riscv64 | 86 448 202 | 349 669 158 | 1:4.045 |
s390x | 62 614 807 | 492 407 746 | 1:7.864 |
sh4 | 72 780 143 | 399 937 800 | 1:5.495 |
sparc64 | 86 423 179 | 489 936 356 | 1:5.669 |
x86_64 | 61 590 922 | 400 190 791 | 1:6.498 |
qsort_int32
Target | Guest Instructions | QEMU Instructions | Ratio |
---|---|---|---|
aarch64 | 151 968 514 | 2 132 112 102 | 1:14.030 |
alpha | 221 192 248 | 1 460 982 497 | 1:6.605 |
arm | 160 875 621 | 3 375 777 484 | 1:20.984 |
hppa | 201 401 936 | 2 199 407 458 | 1:10.920 |
m68k | 169 894 134 | 1 780 208 909 | 1:10.478 |
mips | 176 712 823 | 1 501 040 830 | 1:8.494 |
mipsel | 176 712 809 | 1 503 808 218 | 1:8.510 |
mips64 | 176 020 831 | 1 504 536 270 | 1:8.547 |
mips64el | 176 020 824 | 1 483 550 240 | 1:8.428 |
ppc | 202 473 828 | 1 668 592 063 | 1:8.241 |
ppc64 | 198 918 772 | 1 780 051 140 | 1:8.949 |
ppc64le | 188 749 603 | 1 728 567 792 | 1:9.158 |
riscv64 | 159 048 448 | 1 289 755 584 | 1:8.109 |
s390x | 132 119 768 | 2 114 840 292 | 1:16.007 |
sh4 | 205 090 416 | 1 879 285 254 | 1:9.163 |
sparc64 | 185 195 979 | 3 352 756 658 | 1:18.104 |
x86_64 | 145 621 672 | 1 751 799 973 | 1:12.030 |
qsort_string
Target | Guest Instructions | QEMU Instructions | Ratio |
---|---|---|---|
aarch64 | 237 478 279 | 2 530 968 853 | 1:10.658 |
alpha | 310 349 344 | 1 794 207 498 | 1:5.781 |
arm | 277 491 839 | 7 167 746 267 | 1:25.830 |
hppa | 286 010 885 | 4 608 364 139 | 1:16.113 |
m68k | 242 574 561 | 2 295 663 078 | 1:9.464 |
mips | 331 063 420 | 2 114 226 632 | 1:6.386 |
mipsel | 331 063 408 | 2 111 085 204 | 1:6.377 |
mips64 | 304 640 414 | 1 969 109 275 | 1:6.464 |
mips64el | 304 640 409 | 1 951 425 342 | 1:6.406 |
ppc | 320 946 236 | 2 429 421 810 | 1:7.570 |
ppc64 | 272 956 914 | 2 404 978 156 | 1:8.811 |
ppc64le | 273 392 915 | 2 386 256 069 | 1:8.728 |
riscv64 | 216 826 004 | 1 564 149 511 | 1:7.214 |
s390x | 165 265 303 | 4 189 211 923 | 1:25.348 |
sh4 | 287 459 667 | 2 098 659 130 | 1:7.301 |
sparc64 | 304 142 262 | 4 130 702 783 | 1:13.581 |
x86_64 | 234 574 652 | 2 865 446 064 | 1:12.215 |
Analysis
The tables above are color coded to show the three best and three worst emulation ratios for each benchmark. Within the same benchmark group, the ratios for all seventeen targets are roughly consistent.
The ratio also clearly depends on the type of program being emulated. For every target except arm, the floating point benchmarks in group 1 have a considerably larger emulation ratio than the integer and character benchmarks in group 2.
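The difference between the two groups can be checked by hand for a single target. The sketch below averages the aarch64 ratios copied from the six tables above; the variable names are chosen for illustration only.

```python
from statistics import mean

# aarch64 ratios copied from the result tables above (QEMU instructions per guest instruction)
group_1 = {"coulomb_double": 24.181, "matmult_double": 22.579, "qsort_double": 16.644}
group_2 = {"matmult_int32": 9.531, "qsort_int32": 14.030, "qsort_string": 10.658}

group_1_avg = mean(group_1.values())
group_2_avg = mean(group_2.values())

print("group 1 average: 1:{:.3f}".format(group_1_avg))
print("group 2 average: 1:{:.3f}".format(group_2_avg))
```

For aarch64, the floating point group costs roughly twice as many QEMU instructions per guest instruction as the integer and character group.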
The Python script below averages the ratios across different tables for each target. The results give a very good overview of QEMU’s emulation efficiency for each of the seventeen targets.
import os
import csv

# Tables directory
tables = os.listdir("tables")

csv_headers = ["Target", "QEMU Efficiency"]

# Initialize target arrays
target_names, target_ratio_sums = [], []
with open(os.path.join("tables", tables[0]), "r") as file:
    # Skip headers line
    file.readline()
    lines = file.readlines()
    for line in lines:
        # Add target name
        target_names.append(line.split(",")[0])
        # Initialize sum to zero
        target_ratio_sums.append(0)

# Number of benchmarks and targets
no_benchmarks = len(tables)
no_targets = len(target_names)

for table in tables:
    with open(os.path.join("tables", table), "r") as file:
        file.readline()
        lines = file.readlines()
        for i in range(len(lines)):
            target_ratio_sums[i] += float(lines[i].split(",")
                                          [-1].split(":")[-1])

target_ratio_avgs = ["1:"+str(round((x / no_benchmarks), 3))
                     for x in target_ratio_sums]

with open("efficiency.csv", "w") as file:
    writer = csv.writer(file)
    writer.writerow(csv_headers)
    for i in range(no_targets):
        writer.writerow([target_names[i], target_ratio_avgs[i]])
The script can be run three times, each time with a different set of CSV files in the tables directory, to obtain three tables.
The first table averages the three benchmarks in group 1. The second table gives the average ratio for the benchmarks in group 2. Lastly, the third table is the average of all six benchmarks.
Target | QEMU Efficiency (group 1) |
---|---|
aarch64 | 1:21.135 |
alpha | 1:23.639 |
arm | 1:10.772 |
hppa | 1:35.638 |
m68k | 1:68.595 |
mips | 1:23.516 |
mipsel | 1:26.754 |
mips64 | 1:25.751 |
mips64el | 1:29.250 |
ppc | 1:43.826 |
ppc64 | 1:40.008 |
ppc64le | 1:40.801 |
riscv64 | 1:14.813 |
s390x | 1:43.043 |
sh4 | 1:24.979 |
sparc64 | 1:35.560 |
x86_64 | 1:17.868 |
Target | QEMU Efficiency (group 2) |
---|---|
aarch64 | 1:11.406 |
alpha | 1:5.413 |
arm | 1:19.462 |
hppa | 1:11.152 |
m68k | 1:8.820 |
mips | 1:6.844 |
mipsel | 1:6.846 |
mips64 | 1:6.791 |
mips64el | 1:6.670 |
ppc | 1:7.294 |
ppc64 | 1:7.949 |
ppc64le | 1:7.991 |
riscv64 | 1:6.456 |
s390x | 1:16.406 |
sh4 | 1:7.320 |
sparc64 | 1:12.451 |
x86_64 | 1:10.248 |
Target | QEMU Efficiency (overall) |
---|---|
aarch64 | 1:16.270 |
alpha | 1:14.526 |
arm | 1:15.117 |
hppa | 1:23.395 |
m68k | 1:38.708 |
mips | 1:15.180 |
mipsel | 1:16.800 |
mips64 | 1:16.271 |
mips64el | 1:17.960 |
ppc | 1:25.560 |
ppc64 | 1:23.979 |
ppc64le | 1:24.396 |
riscv64 | 1:10.635 |
s390x | 1:29.725 |
sh4 | 1:16.149 |
sparc64 | 1:24.006 |
x86_64 | 1:14.058 |
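The three averages can also be produced in a single pass. The sketch below is one possible approach, not the method used for this report. It assumes each table is stored as `<benchmark>.csv` with the "Target" and "Ratio" columns written by the measurement script, and that the file names match the benchmark names listed in the introduction; the function names and the `GROUPS` mapping are hypothetical.

```python
import csv
import os

# Group membership as listed in the introduction of this report
GROUPS = {
    "group 1": ["coulomb_double", "matmult_double", "qsort_double"],
    "group 2": ["matmult_int32", "qsort_int32", "qsort_string"],
}


def read_ratios(csv_path):
    """Map each target to its numeric ratio, e.g. "1:24.181" -> 24.181."""
    with open(csv_path, newline="") as file:
        return {row["Target"]: float(row["Ratio"].split(":")[-1])
                for row in csv.DictReader(file)}


def average_ratios(tables_dir, benchmark_names):
    """Average the per-target ratios over the given benchmark tables."""
    sums = {}
    for name in benchmark_names:
        ratios = read_ratios(os.path.join(tables_dir, name + ".csv"))
        for target, ratio in ratios.items():
            sums[target] = sums.get(target, 0.0) + ratio
    return {target: round(total / len(benchmark_names), 3)
            for target, total in sums.items()}


def average_all_groups(tables_dir):
    """Return the group 1, group 2 and overall averages for every target."""
    results = {group: average_ratios(tables_dir, names)
               for group, names in GROUPS.items()}
    all_names = [name for names in GROUPS.values() for name in names]
    results["overall"] = average_ratios(tables_dir, all_names)
    return results
```

Calling `average_all_groups("tables")` would return a dictionary keyed by "group 1", "group 2" and "overall". Using `csv.DictReader` instead of splitting lines on commas keeps the parsing correct even though the instruction counts in the tables contain thousands separators.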