Reading time: 20 minutes | Coding time: 5 minutes
Intel Software Development Emulator (SDE) runs a given program as if it were executing on a specific Intel CPU, emulating instructions that the host machine may not support. It can also capture details of the run, such as the instructions that were executed (using the MIX tool).
In this article, we will learn to use Intel SDE with a basic C++ program and capture the executed instructions using the MIX tool, which is built into Intel SDE.
The first step is to install Intel SDE on your platform. On Linux, the steps are as follows:
- Go to this page (software.intel.com) and follow the steps to download the binary.
As it is a protected download, we have to agree to the license and then download the file manually. This step cannot be automated.
You will get a file such as:
sde-external-8.35.0-2019-03-11-lin.tar.bz2
Extract the archive as follows:
tar -xjf sde-external-8.35.0-2019-03-11-lin.tar.bz2
A new folder named sde-external-8.35.0-2019-03-11-lin will be created. The exact name may differ depending on your version.
Next, go inside the folder and explore its contents:
cd sde-external-8.35.0-2019-03-11-lin
ls -a
Output:
ia32 Licenses misc sde src xed64
intel64 LICENSE.txt README.txt sde64 xed
It has internal libraries for emulating ia32 and intel64. Add the SDE folder to your PATH environment variable (replace /path with the location where you extracted the archive):
export PATH=/path/sde-external-8.35.0-2019-03-11-lin/:$PATH
Check help for SDE as follows:
sde -help
Part of the output looks as follows:
Intel(R) Software Development Emulator. Version: 8.35.0 external
Copyright (C) 2008-2016, Intel Corporation. All rights reserved.
Usage: sde [args] -- application [application-args]
For the longer tool help, use "-help-long".
If one of "-mix", "-debugtrace", or "-t toolname" are not,
supplied, then just the underlying emulator will run.
-mix Run mix histogram tool
-omix Set the output file name for mix, Implies -mix
Default is "mix.out"
-footprint Run footprint tool
-ofootprint Set the output file name for footprint,
Implies -footprint. Default is "footprint.out"
-debugtrace Run mix debugtrace tool
-odebugtrace Set the output file name for debugtrace,
Implies -debugtrace
Default is "debugtrace.out"
-ast Run the Intel(R) AVX/SSE transition checker
-oast Set the output file name for the Intel AVX/SSE
transition checker. Implies -ast
Default is "avx-sse-transition.out"
-quark Set chip-check and CPUID for Intel(R) Quark CPU
-p4 Set chip-check and CPUID for Intel(R) Pentium4 CPU
In this guide, we will use the mix tool.
We need a program to run and analyze. For this, we will use a simple C++ program with a for loop of 5 iterations. The code (in a file named "code.cpp") is as follows:
#include <iostream>

int main()
{
    int i = 0;
    for (i = 0; i < 5; i++)
        std::cout << i << " ";
}
To compile and run it, use the following commands:
g++ -std=c++11 code.cpp
./a.out
Output:
0 1 2 3 4
Run on different architecture
Now, suppose you have an old computer with an Intel Broadwell CPU, but you want to see how your program runs on an Intel Skylake CPU. We can do this using SDE. The syntax for this is:
sde -<platform_code> -- [command]
You will find the platform codes in the help output. In our case, we want to run the program on Skylake (code: skx). The command for this will be:
sde -skx -- ./a.out
The output will be the same:
0 1 2 3 4
If we want to run it on Cascade Lake, use the following command:
sde -clx -- ./a.out
The output will be the same:
0 1 2 3 4
SDE supports several platforms, some of which are:
- Skylake: skx
- Cascade Lake: clx
- Quark CPU: quark
- Future Intel chip: future
- Knights Mill CPU: knm
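The two steps above (looking up the platform codes in the help and running the program on more than one platform) can be sketched as a pair of small shell functions. This is only an illustration: the function names list_platform_options and run_on_platforms are hypothetical, and the filter assumes that every platform line in the help text contains the words "chip-check", as the -quark and -p4 lines in the help output above do.

```shell
# List the platform options advertised by the installed SDE.
# Assumption: each platform line of the help text contains "chip-check".
list_platform_options() {
    sde -help 2>&1 | grep 'chip-check'
}

# Run one program under several platform codes, one after another.
# Usage: run_on_platforms ./a.out skx knm
run_on_platforms() {
    rp_prog=$1
    shift
    for rp_code in "$@"; do
        printf '== %s ==\n' "$rp_code"
        sde "-$rp_code" -- "$rp_prog"
        printf '\n'
    done
}
```

For example, run_on_platforms ./a.out skx knm would run the sample program first on Skylake and then on Knights Mill, printing a small header before each run.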
Run using a tool
While running a program on a platform, we can add a tool as well. One such tool is MIX, which captures the executed instructions. This is useful for studying how execution differs across platforms.
The syntax for adding a tool is as follows:
sde -<platform_code> -<tool> -- [command]
There are different tools like:
- MIX tool
- Footprint tool
- debugtrace tool
- AVX/SSE transition checker tool (ast)
In our case, we want to run the program on Skylake (-skx) with the MIX tool (-mix). The command will be as follows:
sde -skx -mix -- ./a.out
This will generate a new file named sde-mix-out.txt containing the output of the MIX tool. A different file name can be set with the -omix option.
It is a large file, and its first few lines are as follows:
# Mix output version 10
# Intel(R) SDE version: 8.35.0 external
# Starting tid 0, OS-TID 40701
# FINI: end of program
# EMIT_IMAGE_ADDRESSES
#
# IMAGE NAME LOW ADDRESS HIGH ADDRESS
#
/home/amd/Desktop/aditya/sde/sde-external-8.35.0-2019-03-11-lin/a.out 000000400000 0000004009db
/lib64/ld-linux-x86-64.so.2 7f06c2f38000 7f06c2f5993f
[vdso] 7fffa195d000 7fffa195de6c
/lib64/libstdc++.so.6 7f06adbc1000 7f06adec735f
/lib64/libm.so.6 7f06ad845000 7f06adb46137
/lib64/libgcc_s.so.1 7f06ad625000 7f06ad83a3ff
/lib64/libc.so.6 7f06ad257000 7f06ad6241df
# END_IMAGE_ADDRESSES
# ==============================================
# STATS FOR TID 0 OS-TID 40701 EMIT# 1
# ==============================================
# EMIT_TOP_BLOCK_STATS FOR TID 0 OS-TID 40701 EMIT # 1 EVENT=ICOUNT
BLOCK: 0 PC: 00007f06c2f41f98 ICOUNT: 401832 EXECUTIONS: 50229 #BYTES: 24 %: 31.1 cumltv%: 31.1 FN: _dl_lookup_symbol_x IMG: /lib64/ld-linux-x86-64.so.2 OFFSET: 9f98
XDIS 00007f06c2f41f98: BASE 4C89F1 mov rcx, r14
XDIS 00007f06c2f41f9b: BASE 4883C201 add rdx, 0x1
XDIS 00007f06c2f41f9f: BASE 48C1E105 shl rcx, 0x5
XDIS 00007f06c2f41fa3: BASE 4901CE add r14, rcx
XDIS 00007f06c2f41fa6: BASE 4901C6 add r14, rax
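To see at a glance which functions the hottest blocks belong to, the BLOCK lines can be filtered out of the report. The following is a minimal sketch, assuming the BLOCK lines have the "ICOUNT: <n>" and "FN: <name>" fields shown above and that function names contain no spaces. The function name top_blocks is hypothetical.

```shell
# Print "<icount> <function>" for the first N BLOCK lines of a MIX report.
# Usage: top_blocks sde-mix-out.txt 5
top_blocks() {
    awk '
        $1 == "BLOCK:" {
            icount = ""; fn = ""
            # Each value follows its label as the next whitespace-separated field.
            for (i = 1; i < NF; i++) {
                if ($i == "ICOUNT:") icount = $(i + 1)
                if ($i == "FN:")     fn = $(i + 1)
            }
            print icount, fn
        }
    ' "$1" | head -n "${2:-5}"
}
```

The sketch keeps the order in which the blocks appear in the file and does not sort them again.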
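The summary is at the end of the file. It lists how many times each instruction was executed, preceded by aggregate counters (the lines starting with *) for memory accesses, ISA extensions, instruction categories, instruction lengths and so on. It is as follows: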
# END_STATIC_STATS
# EMIT_GLOBAL_DYNAMIC_STATS EMIT# 1
#
# $global-dynamic-counts
#
# opcode count
#
*stack-read 88206
*stack-write 94189
*iprel-read 10362
*iprel-write 3192
*mem-read-1 108318
*mem-read-2 5363
*mem-read-4 40478
*mem-read-8 160505
*mem-read-16 135
*mem-write-1 6006
*mem-write-2 92
*mem-write-4 17210
*mem-write-8 96837
*mem-write-16 17
*mem-write-56 77
*mem-read 314876
*mem-write 120239
*mem 431262
*isa-ext-BASE 1289406
*isa-ext-LONGMODE 2921
*isa-ext-SSE 17
*isa-ext-SSE2 386
*isa-ext-SSE3 8
*isa-ext-SSE4 16
*isa-ext-SSE4A 77
*isa-ext-SSSE3 8
*isa-ext-XSAVE 78
*isa-ext-XSAVEC 77
*isa-set-CMOV 2163
*isa-set-FAT_NOP 6615
*isa-set-I186 59846
*isa-set-I386 74493
*isa-set-I486REAL 53
*isa-set-I86 1146227
*isa-set-LONGMODE 2921
*isa-set-PENTIUMREAL 9
*isa-set-SSE 17
*isa-set-SSE2 386
*isa-set-SSE3 8
*isa-set-SSE4 16
*isa-set-SSSE3 8
*isa-set-XSAVE 78
*isa-set-XSAVEC 77
*category-BINARY 308809
*category-BITBYTE 1404
*category-CALL 8211
*category-CMOV 2163
*category-COND_BR 199661
*category-CONVERT 35
*category-DATAXFER 421085
*category-LOGICAL 162419
*category-MISC 31371
*category-NOP 27
*category-POP 25409
*category-PUSH 25655
*category-RET 8207
*category-ROTATE 51
*category-SEMAPHORE 25
*category-SETCC 5918
*category-SHIFT 72515
*category-SSE 199
*category-STRINGOP 5610
*category-SYSCALL 80
*category-SYSTEM 9
*category-UNCOND_BR 7284
*category-WIDENOP 6615
*category-XSAVE 155
*ilen-1 28831
*ilen-2 321305
*ilen-3 419435
*ilen-4 243809
*ilen-5 78386
*ilen-6 89976
*ilen-7 79250
*ilen-8 21368
*ilen-9 6027
*ilen-10 288
*ilen-11 4230
*ilen-12 12
*nop-ilen-1 18
*nop-ilen-2 9
*nop-ilen-3 208
*nop-ilen-4 62
*nop-ilen-5 30
*nop-ilen-6 1353
*nop-ilen-7 1337
*nop-ilen-8 3550
*nop-ilen-9 8
*nop-ilen-10 67
*scalar-simd 2
*sse-scalar 2
*sse-packed 433
*gpr64 735707
*legacy-prefixes-0 549469
*legacy-prefixes-1 742674
*legacy-prefixes-2 774
*legacy-prefixes-and-escapes-0 408895
*legacy-prefixes-and-escapes-1 871048
*legacy-prefixes-and-escapes-2 12942
*legacy-prefixes-and-escapes-3 30
*legacy-prefixes-and-escapes-4 2
*segment_prefix 505
*rep_prefix 5610
*66_prefix 3387
*rex_prefix 734720
*rexw_prefix 660561
*rexr_prefix 142599
*rexx_prefix 2796
*rexb_prefix 222778
*loadop 146486
*one-memops 437559
*two-memops 318
*cond-branch-forward 108842
*cond-branch-backward 90819
*disp_only 363
*index_disp 20
*base_only 193858
*base_disp 220932
*base_index 16954
*base_index_disp 5750
*scale_1 438994
*scale_2 10184
*scale_4 5397
*scale_8 14640
*memdisp8 135427
*memdisp32 108032
*elements_i1_128 125
*elements_i8_16 86
*elements_i32_1 78
*elements_i32_4 19
*elements_i64_2 8
*elements_i128_1 8
*convert_i32_1 34
*convert_i64_1 1
*dataxfer_i32_1 1
*dataxfer_i32_4 91
*dataxfer_fp_single_4 17
*dataxfer_fp_double_1 2
*PL-prefixed-UNCOND-BR 77
ADD 187340
AND 19786
BSF 73
BSR 2
BT 1329
CALL_NEAR 8211
CDQE 34
CMOVB 625
CMOVBE 21
CMOVNBE 15
CMOVNS 128
CMOVNZ 45
CMOVZ 1329
CMP 76504
CMPXCHG 25
CPUID 28
CQO 1
DEC 173
DIV 1660
IDIV 1
IMUL 112
INC 34837
JB 191
JBE 4114
JLE 108
JMP 7284
JNB 1807
JNBE 9377
JNL 41
JNLE 1331
JNS 94
JNZ 115680
JS 72
JZ 66846
LDDQU 8
LEA 31338
LEAVE 5
MOV 351381
MOVAPS 17
MOVD 1
MOVDQA 18
MOVDQU 73
MOVSD_XMM 2
MOVSX 547
MOVSXD 2408
MOVZX 66604
MUL 12
NEG 181
NOP 6642
NOT 20
OR 2599
PALIGNR 8
PCMPEQB 86
PMOVMSKB 78
POP 25409
PSHUFD 1
PSLLDQ 8
PSUBB 8
PTEST 16
PUNPCKLBW 2
PUSH 25655
PXOR 109
RDTSC 9
REP_STOSB 5192
REP_STOSD 20
REP_STOSQ 398
RET_NEAR 8207
ROL 46
ROR 5
SAR 1381
SBB 61
SETNZ 96
SETZ 5822
SHL 51132
SHR 20002
SUB 7928
SYSCALL 80
TEST 122320
XCHG 34
XGETBV 1
XOR 17569
XRSTOR 77
XSAVEC 77
*total 1292917
# END_GLOBAL_DYNAMIC_STATS
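Since the summary is long, it can help to pull out only the most frequently executed instructions. The following is a minimal sketch, assuming the summary is delimited by the EMIT_GLOBAL_DYNAMIC_STATS and END_GLOBAL_DYNAMIC_STATS marker lines shown above and that each instruction line has exactly two fields (opcode and count). The function name top_opcodes is hypothetical.

```shell
# Print the N most frequently executed opcodes as "<count> <opcode>".
# Usage: top_opcodes sde-mix-out.txt 10
top_opcodes() {
    awk '
        /^# EMIT_GLOBAL_DYNAMIC_STATS/ { in_stats = 1; next }
        /^# END_GLOBAL_DYNAMIC_STATS/  { in_stats = 0 }
        # Skip comment lines (#) and aggregate counters (*); keep opcode lines.
        in_stats && $1 !~ /^[#*]/ && NF == 2 { print $2, $1 }
    ' "$1" | sort -k1,1nr | head -n "${2:-10}"
}
```

For the report above, calling top_opcodes sde-mix-out.txt 3 should list MOV, ADD and TEST, which have the three largest counts in the summary.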
This gives you a basic idea of how to use Intel SDE. We suggest that you try the other platform options and tools and explore their output.
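As a starting point for such exploration, one option is to write a separate MIX report for each platform and then compare parts of the reports. The following is a minimal sketch that uses the -omix option from the help output to name each report. The function names mix_per_platform and isa_ext_counts and the mix-<code>.txt file names are hypothetical choices made for this example.

```shell
# Write one MIX report per platform code, named mix-<code>.txt.
# Usage: mix_per_platform ./a.out skx knm
mix_per_platform() {
    mp_prog=$1
    shift
    for mp_code in "$@"; do
        sde "-$mp_code" -omix "mix-$mp_code.txt" -- "$mp_prog"
    done
}

# Print the ISA-extension counters (*isa-ext-... lines) from the
# global dynamic stats section of one MIX report.
isa_ext_counts() {
    awk '
        /^# EMIT_GLOBAL_DYNAMIC_STATS/ { in_stats = 1; next }
        /^# END_GLOBAL_DYNAMIC_STATS/  { in_stats = 0 }
        in_stats && $1 ~ /^\*isa-ext-/ { print $1, $2 }
    ' "$1"
}
```

After mix_per_platform ./a.out skx knm, the output of isa_ext_counts mix-skx.txt and isa_ext_counts mix-knm.txt can be saved to two files and compared with diff to see whether the instruction set extensions in use differ between the two platforms.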
Happy learning.