C++: Represent temporary object initialization in AST and IR #4432

dbartol · 2020-10-07T15:52:37Z

Fixes github/codeql-c-analysis-team#157, which is the first part of github/codeql-c-analysis-team#154.

This is the github/codeql side of the work. The dbscheme is updated to add a new expression kind, @temp_init, which is treated as a conversion and thus not in the parent-child tree in the AST. These expressions are represented by the new TemporaryObjectExpr (naming bikeshed still open), which has the initializer as its getExpr(). Very few queries should notice the change, since most don't look at conversions anyway, and those that do generally care specifically about numeric conversions anyway.

The IR changes to support these expressions went pretty smoothly. The code is very similar to how we handle ThrowExpr, where we also have an initializer that initializes a temporary object.

This fixes all remaining consistency failures in the IR test directory, as well as a bunch of similar consistency failures in the syntax-zoo.

Most of the test diffs are just in tests that dump lists of expected expressions, and now include the occasional TemporaryObjectExpr. However, the diffs in library-tests/dataflow/taint-tests bear close examination (@geoffw0?):

This change fixes dozens of false negatives.
Fixes several ODR violations that were making it difficult to dump ASTs or IR.
Subtle tweaks to our mock STL implementation to make the behavior consistent with the standard. The biggest of these is to make std::make_pair("foo", "blah") return a std::pair<const char*, const char*>, rather than std::pair<const char[4], const char[5]> (the argument types are supposed to undergo "decay").
Several models have been updated to use isParamDeref() instead of isParam() as appropriate.

I also made some changes to how local taint flow is computed in DefaultTaintTracking.qll. The main change is that model flow now goes from Operand->Instruction, rather than Instruction->Instruction. This fixes a couple dozen cases of missed taint flow in library-tests/dataflow/DefaultTaintTracking.

To do:

No-op upgrade script
Diffs: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1532//artifact/changes.html

This change adds a new module, `PrintIRLocalFlow.qll`, which can be imported into any query that uses both `PrintIR.qll` and the IR dataflow library. The IR dump printed by `PrintIR.qll` will be annotated with information about how each operand and instruction participates in dataflow. For each operand and instruction, the following propeties are displayed: - `flow`: Which local operands/instructions have flow to this node, and which local operands/instruction this node has flow to. - `source`: `true` if this node is a source - `sink`: `true` if this node is a sink - `barrier`: Lists which kinds of barrier this node is. Can be zero or more of `full`, `in`, `out`, and `guard`. If the node is a guard barrier, the IR of the guarding instruction is also printed. We already had a way to print additional properties for instructions and blocks, but not for operands. I added support for operand properties to `IRPropertyProvider`. These are now printed in a curly-brace-enclosed list immediately after the corresponding operand. When printing flow, instructions are identified by their result ID (e.g., `m128`). Operands are identified by both the result ID of their instruction and their kind (e.g., `r145.left`). For flow from an operand to its use instruction, it just prints `result` at the operand, and prints only the operand kind on the instruction. Example output: ``` # 344| m344_34(vector<int, allocator<int>>) = Chi : total:m344_20{flow:def->@, @->result}, partial:m344_33{flow:def->@, @->result} # 344| flow = total->@, partial->@, +m344_33->@, @->+r347_3, @->v347_7.side_effect, @->m347_9.total, @->m344_20.1 ``` The `+` annotations indicate when the flow came from `isAdditionalFlowStep()`, rather than built-in local flow.

PrintAST now works on `library-tests/dataflow/taint-tests`.

MathiasVP · 2020-10-26T17:16:35Z

cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll

 cached
-private predicate instructionTaintStep(Instruction i1, Instruction i2) {
+private predicate commonTaintStep(DataFlow::Node fromNode, DataFlow::Node toNode) {
+  instructionToInstructionTaintStep(fromNode.asInstruction(), toNode.asInstruction())


To be consistent with how we interleave instruction -> operand steps in dataflow we should probably convert this to pure instruction -> operand and operand -> instruction flow at some point once this PR has been merged. But I'm fine with these changes as they are now!

Yeah, I just didn't want to change more than I already had to.

dbartol · 2020-11-02T19:34:58Z

Very few diffs. The implicit-bitfield-downcast is a side effect of fixing a "one Expr with multiple immediate Conversions" issue. The correct fix is suggested in https://github.com/github/codeql-c-analysis-team/issues/180.

dbartol · 2020-11-02T21:51:17Z

JDK diff looks like a true positive. A call to a pure constructor wasn't treated as being a PureExprInVoidContext because it had a void return type. With the addition of the temporary object into which the object is constructed, the return type is now the type of the class, so we'll treat it the same as a call to a pure function that returns that class.

dbartol · 2020-11-02T22:27:51Z

php-src diff did not repo locally.

dbartol · 2020-11-02T23:05:26Z

Diffs look good. Perf is nearly unchanged. This is ready for final sign-off.

geoffw0

I've just reviewed the code changes as best I can and they look good to me. I remember checking the taint test result changes from this PR quite thoroughly before and was very happy with them as well.

A comment by @jbj appears to have gone missing from the front page of the PR. Apart from that, it appears all concerns have been addressed.

I'm keen to merge this today. We've certainly all had plenty of opportunity to review at this point. I've started SAMATE runs (https://jenkins.internal.semmle.com/job/Security/job/SAMATE/job/SAMATE-cpp/548/ and https://jenkins.internal.semmle.com/job/Security/job/SAMATE/job/SAMATE-cpp2/31/) on this branch, mostly for my own information - they will take ~18 hours to complete so lets not wait for them.

cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll

jbj · 2020-11-03T12:28:55Z

Please don't merge this PR until the rc/1.26 branch has been created. At the time of writing this, it's waiting for a submodule bump to finish testing.

geoffw0 · 2020-11-03T14:08:10Z

Please don't merge this PR until the rc/1.26 branch has been created.

I foolishly assumed that had happened yesterday as planned. Is there somewhere I can check for sure, or can you post here when it's done?

jbj · 2020-11-03T15:26:54Z

I foolishly assumed that had happened yesterday as planned. Is there somewhere I can check for sure, or can you post here when it's done?

It's done, and we can move forward. But this branch can only be merged together with its internal PR, and the internal PR has both a test failure and a submodule conflict.

dbartol · 2020-11-03T18:46:16Z

@jbj

You've added a lot of code to the IR generation to bridge between how the IR likes to see the AST and how the AST looks in the database, especially with the last commit about member accesses on prvalues. Would it be an advantage to move any of that code to the layer that's in between the IR and the database: the QL AST classes?

The fixups that I'm doing involve introducing new temporary objects where there were none before, so I'm not sure how I'd do that in the AST layer without drastically changing how we implement the AST. I've considered fixing this all up in the extractor, but if it's primarily for the IR's benefit, I think I'd prefer to add more QL to the IR construction than more C++ to the extractor.

A performance regression in `definitionByReferenceNodeFromArgument#ff` was ultimately caused by a join on parameter indexes in `DefinitionByReferenceNode.getArgument`. Joining on numbers in QL is always fragile, and somehow the changes in github#4432 had caused the join order here to break. Instead of tweaking the join order in the slow predicate itself, I added `pragma[noinline]` to one of the predicates involved in the join on parameter indexes. This should prevent us from getting similar performance problems in the future when we write code that joins on parameter numbers. Joining on indexes is always risky, but it's even more risky when one of the predicates in the join is inlined by the compiler and expands to further joins. I tested performance by running `CgiXss.ql` on a ChakraCore snapshot. Tuple counts before (I interrupted execution after five minutes or so): (626s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff: 58162 ~0% {3} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0> 26934 ~0% {2} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<2> 26934 ~1% {2} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<1> 26850 ~1% {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r3.<1> 26850 ~0% {2} r5 = JOIN r4 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1> 26850 ~1% {2} r6 = JOIN r5 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r5.<1> 58162 ~0% {2} r7 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0> 58162 ~4% {3} r8 = JOIN r7 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>, r7.<0> 4026581120 ~0% {4} r9 = JOIN r8 WITH Instruction::CallInstruction::getPositionalArgumentOperand_dispred#fff_102#join_rhs AS R ON FIRST 1 OUTPUT r8.<2>, R.<1>, r8.<1>, R.<2> 31154 ~4% {2} r10 = JOIN r9 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 2 OUTPUT r9.<3>, r9.<2> 31154 ~8% {2} r11 = JOIN r10 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1> 31154 ~0% {2} r12 = JOIN r11 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r11.<1> 58004 ~0% {2} r13 = r6 \/ r12 return r13 Tuple counts after: (0s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff: 385785 ~6% {2} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0> 385785 ~0% {3} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1> 385785 ~1% {3} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<2>, r2.<1> 198736 ~4% {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getPositionalArgument#fff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2> 198736 ~0% {2} r5 = JOIN r4 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1> 385785 ~1% {3} r6 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0> 186891 ~1% {2} r7 = JOIN r6 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<2> 186891 ~2% {2} r8 = JOIN r7 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1> 183201 ~3% {2} r9 = JOIN r8 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r8.<1> 183201 ~0% {2} r10 = JOIN r9 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r9.<1> 175449 ~8% {2} r11 = JOIN r10 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1> 374185 ~3% {2} r12 = r5 \/ r11 return r12

Dave Bartolomeo added 4 commits October 5, 2020 17:53

AST and IR support for TemporaryObjectExpr

badb117

Fix test expectations

80b832e

Merge remote-tracking branch 'upstream/main' into work

22638fd

Fix test expectations in syntax-zoo

bf8340f

dbartol added C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR labels Oct 7, 2020

Dave Bartolomeo added 21 commits October 7, 2020 13:14

Fix test expectations

1e455f0

C++: Fix pointer flow through temporary objects

e4bfb75

Update taint test to propagate through string constructor

dfe69d8

Clean up test formatting and accept new lines in results

93f5ae4

Merge remote-tracking branch 'upstream/main' into work

fba4313

Fix test code that returns reference to local

ceea5b3

Fix ODR violations in test code

3767a52

Remove more ODR violations from test code.

14ac985

PrintAST now works on `library-tests/dataflow/taint-tests`.

Dump static call target for Call instructions

6a9ecf7

Insert a load of the temporary object for arguments passed by value

9afddf0

C++: Improve dataflow model for copy/move constructors

cf19fcf

Accept good test diffs

af799a7

Accept test diffs from set.cpp (50 new good results!)

675256a

Fix subtle typing issue with std::makr_pair

7da4eef

Handle parameter indirections in make_shared and make_unique

686f5aa

Fix test expectations due to pair/make_pair fixes

1b53c46

Model copy-ish constructors for std::pair

1dae8f6

Print partial flow info in PrintIRLocalFlow.qll

4814dcf

Accept improved test results

e4fdf69

Print targets of Load and Store instructions in IR dump

4e0afb0

github-actions bot added the C# label Oct 17, 2020

Dave Bartolomeo added 2 commits October 17, 2020 15:14

Merge from main

40cd96e

Fix modeling of std::set::emplace

eb9cea4

Fix taint propagation to qualifier objects and update test expectations

3fce971

MathiasVP reviewed Oct 26, 2020

View reviewed changes

Dave Bartolomeo added 3 commits October 28, 2020 14:47

Fix test expectations for new nodes and edges in path queries

5a6cd4a

Merge from main

7a2c59c

Fixup after merge

c49e33f

MathiasVP mentioned this pull request Oct 29, 2020

C++/Python: Better syntax for false positives and negatives in inline expectations #4571

Merged

Dave Bartolomeo added 4 commits October 30, 2020 12:02

Merge from main

4237341

Merge remote-tracking branch 'upstream/main' into work

ec398b2

Fixup after merge

be180aa

Fix PR feedback

69dee15

Dave Bartolomeo added 2 commits November 2, 2020 17:31

Merge remote-tracking branch 'upstream/main' into work

e9d1f0d

Fix annotations

0d1fbd1

criemen mentioned this pull request Nov 3, 2020

C++: Model std::vector emplace and emplace_back() #4589

Merged

geoffw0 reviewed Nov 3, 2020

View reviewed changes

cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll Show resolved Hide resolved

Dave Bartolomeo added 2 commits November 3, 2020 11:33

Merge remote-tracking branch 'upstream/main' into work

f0b9794

Fix test expectation

4cc9110

jbj approved these changes Nov 4, 2020

View reviewed changes

igfoo merged commit 6ff939d into github:main Nov 4, 2020

jbj mentioned this pull request Nov 9, 2020

C++: pragma[noinline] parameter index predicates #4637

Merged

MathiasVP mentioned this pull request Nov 9, 2020

C++: instruction -> operand field flow #4573

Merged

geoffw0 mentioned this pull request Nov 10, 2020

C++: Improve MemberFunction.qll data flows #4642

Merged

Nov	DEC	Jan
	26
2024	2025	2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

C++: Represent temporary object initialization in AST and IR #4432

C++: Represent temporary object initialization in AST and IR #4432

Uh oh!

dbartol commented Oct 7, 2020 •

edited

Loading

Uh oh!

MathiasVP Oct 26, 2020

Uh oh!

dbartol Oct 28, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

geoffw0 left a comment

Uh oh!

Uh oh!

jbj commented Nov 3, 2020

Uh oh!

geoffw0 commented Nov 3, 2020

Uh oh!

jbj commented Nov 3, 2020

Uh oh!

dbartol commented Nov 3, 2020 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

C++: Represent temporary object initialization in AST and IR #4432

C++: Represent temporary object initialization in AST and IR #4432

Uh oh!

Conversation

dbartol commented Oct 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MathiasVP Oct 26, 2020

Choose a reason for hiding this comment

Uh oh!

dbartol Oct 28, 2020

Choose a reason for hiding this comment

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

dbartol commented Nov 2, 2020

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jbj commented Nov 3, 2020

Uh oh!

geoffw0 commented Nov 3, 2020

Uh oh!

jbj commented Nov 3, 2020

Uh oh!

dbartol commented Nov 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

dbartol commented Oct 7, 2020 •

edited

Loading

dbartol commented Nov 3, 2020 •

edited

Loading