Ren'Py Atom improved search function

Playstorepers · Dec 22, 2020

Does anyone know whether it is possible to search only for things that a specific character said?

e.g.

a "hello"
b "hello"
c "no"
b "no yeah no"
c "hello"

Now I want to look up the word "no", for example, but only when c said it.
I have about 50000 lines of text to revise, and being able to search this way would save me a lot of time.

TDoddery · Dec 22, 2020

The way I use it (Ctrl F), it looks for an exact literal match (but case-insensitive), so it would do that, yeah.

Just type: c "no"

mickydoo · Dec 22, 2020

Use CTRL shift F (or find/find in project) and search like TDoddery said, or try "c no" if that way doesn't work

Edit - if you know what script the line is in you can just use ctrl F

79flavors · Dec 23, 2020

I would search in Atom by pressing <SHIFT+CTRL+F> (find in all files) or <CTRL+F> (find in this file).
If I was searching in all files, I would set the File/Directory pattern to *.rpy to only search source renpy files (rather than trying to look in all your pictures too).
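
For anyone who prefers to run the same kind of search outside Atom, here is a minimal sketch in Python. It only illustrates the "look in *.rpy files only" idea; the function name find_in_rpy and the folder you pass to it are assumptions made for this example, not part of Atom or Ren'Py.

```python
from pathlib import Path


def find_in_rpy(root, needle):
    """Yield (path, line_number, line) for every .rpy line containing needle.

    The match is literal and case-insensitive, similar to a plain Ctrl+F.
    """
    needle = needle.lower()
    # rglob("*.rpy") walks the folder recursively and skips images, audio, etc.
    for path in sorted(Path(root).rglob("*.rpy")):
        with open(path, encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path, number, line.rstrip("\n")
```

Looping over find_in_rpy("game", 'c "no"') would then list every matching line with its file and line number, assuming your scripts live in a folder called game.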

You're talking about all text for a specific character... I would search for "  c " (<space> <space> c <space>).
Most games will have ALL character speech indented by at least 4 spaces, and you're unlikely to have any other text anywhere in the game that is a single character with 2 spaces before it and 1 space after it (imagine if you had a character called i or a; that's why I use 2 spaces before rather than just 1).
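
To make the "2 spaces rather than 1" point concrete, here is a small Python sketch. The sample lines and the helper name lines_containing are made up for illustration, and it assumes a character called a whose speech is indented by 4 spaces.

```python
script_lines = [
    '    a "hello"',
    '    b "what a day"',
    '    c "no"',
]


def lines_containing(needle, lines):
    """Return the lines that contain needle as a literal substring."""
    return [line for line in lines if needle in line]


# One leading space also hits the word "a" inside b's dialogue.
one_space = lines_containing(" a ", script_lines)

# Two leading spaces only hit the indented sayer name.
two_spaces = lines_containing("  a ", script_lines)
```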

If you're looking for a specific phrase, by a specific character... Well, yeah... c "no".
That's pretty straightforward if the text is always going to be in the same place and the author didn't do something like end some lines with a full stop or exclamation mark and then forget the punctuation on other lines.

If it's more complicated than that... you'll need a more complicated solution...

So if you want to search for, say, character "c" saying "mom", but anywhere within the whole dialogue line... you'll need to learn a bit of regex (regular expression). Regex has been around for a lot of years and is very well established, but it can be very intimidating when you first see it. It's a pattern matching tool where, rather than searching for things like "c", you can search for all characters from "a" to "z" (and WAY more complicated searches).

And to the RegEx experts... Yeah, I know I'm massively oversimplifying things here and using incorrect terminology. I'm trying to keep things as simple as possible for this example.
Edit: Plus I've no soddin' clue about RegEx beyond what I've bungled my way through. This is definitely the blind leading the blind.

Regex has special strings for matching. For example, . is "any character", \s is "a space" and \S is "not a space" (edit: note the use of lowercase and uppercase "s" / "S"). There are also quantifiers like *, which is "any number of the previous pattern"... so .* is "any number of characters". The list is extensive, but it's the combination of stuff that gives it its power.
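
A quick way to get a feel for these special strings is to try them on one dialogue line. The sketch below uses Python's re module rather than Atom; for these basic patterns the two are assumed to behave the same way. The sample line is invented for the example.

```python
import re

line = '    c "no yeah no"'

# "y" followed by any one character
any_char = re.search(r"y.", line).group()

# the first whitespace character on the line
first_space = re.search(r"\s", line).group()

# the first character that is NOT whitespace
first_non_space = re.search(r"\S", line).group()

# any number of any characters: swallows the whole line
everything = re.search(r".*", line).group()
```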

In Atom, you'd use the same "find" or "find all" search buttons, but then switch on "Use regex" by selecting the .* button in the lower right of the screen.

Before giving the search pattern I would probably use, I'd like to introduce 2 other regex patterns. \b specifies a "word boundary" and pretty much ensures that you don't accidentally find a smaller word as part of a larger word (like searching "ass", but matching "assume"). It's great for stuff like searching for "c". The second thing is something called capture groups, where you can put parentheses around parts of the text to group them together. They aren't specifically needed for this example, but I would tend to use them anyway out of habit (I'll explain later).

So I would use a regex enabled search to search for (\bc\b)(.*)(\bmom\b).

Breaking this down...

(\bc\b) (.*) (\bmom\b) -- spreading things out to make sections more obvious.

\bc\b .* \bmom\b -- removing the brackets for groupings.

c .* mom -- removing the "\b" for word boundaries.

... so we're searching for "c" "some random characters" and "mom".
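
The breakdown above can be tried out with a short Python sketch. The sample lines are invented, and Python's re module stands in for Atom's regex search here, on the assumption that both treat this pattern the same way.

```python
import re

pattern = re.compile(r"(\bc\b)(.*)(\bmom\b)")

script_lines = [
    '    a "hello mom"',
    '    c "I told mom already"',
    '    c "give me a moment"',
    '    b "mom said no"',
]

# Keep only the lines where a standalone "c" is followed, somewhere later,
# by the standalone word "mom".
matches = [line for line in script_lines if pattern.search(line)]

# The three capture groups of the one matching line.
groups = pattern.search(script_lines[1]).groups()
```

Only c's line about mom is kept: a's and b's lines have no standalone "c", and "moment" is rejected by the \b after "mom".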

The reason I used the brackets to group specific parts of the search together is because of how often I also use search and replace.

Once you use those brackets, Atom marks the groups with numbers. (\bc\b) is group #1, (.*) is group #2 and (\bmom\b) is group #3. You can use these in the replace field as $1, $2 and $3.

So if you wanted to replace "mom" with "land lady", you could:

RegEx search : (\bc\b)(.*)(\bmom\b)

RegEx replace : $1$2land lady

Anyway, that's why I use brackets.
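
Here is the same search and replace as a Python sketch, in case you want to test a replacement before running it on a whole project. Note one difference that is specific to Python and not to Atom: re.sub writes the groups as \g<1> and \g<2> instead of $1 and $2. The sample lines are made up.

```python
import re

pattern = re.compile(r"(\bc\b)(.*)(\bmom\b)")

# Keep group 1 (the sayer) and group 2 (everything in between),
# then write "land lady" where group 3 ("mom") used to be.
replaced = pattern.sub(r"\g<1>\g<2>land lady", '    c "I told mom already"')

# Because .* is greedy, only the LAST "mom" on a line is in group 3.
twice = pattern.sub(r"\g<1>\g<2>land lady", '    c "mom, where is mom"')
```

The second result is worth a look: the first "mom" on that line is left untouched, so a line that uses the word twice needs a second pass.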

If you do use RegEx though... just be careful. It's WAY too easy to think you're being clever and then accidentally replace a lot of text which is part of a filename rather than dialogue, for example (I speak from painful experience). Or part of a screen: code. Yeah... be careful.

anne O'nymous · Dec 23, 2020

79flavors said:
And to the RegEx experts... Yeah, I know I'm massively oversimplifying things here and using incorrect terminology. I'm trying to keep things as simple as possible for this example.

As far as I'm concerned, you're doing fine. You just simplified a few things a little too much, which can have unwanted side effects. That's the problem with RegEx: it's really sensitive, and it's not that easy to simplify the explanations without taking risks.

As an example, a quantifier followed by "?" (so, for example, "*?") means "and please don't be greedy". Therefore, [ab]*a{2} (any run of the characters "a" or "b", however long, ended by two consecutive "a") and [ab]*?a{2} (which means the same thing, but in a non-greedy way) can give different results, even though the difference looks anecdotal. Both find a match in "abaaabbbbaa", but not the same one.
For the first one, it will be:

"abaaabbbbaa" matches "[ab]*"
I don't find the two trailing "aa" that should follow.
What happens if I give one character back?
"abaaabbbba" matches "[ab]*", but only one "a" follows.
And one more?
"abaaabbbb" matches "[ab]*"
Now there's a trailing "aa", so the whole of "abaaabbbbaa" is caught.

While for the second one it will be:

"" (nothing) matches "[ab]*?"
I don't find "aa" right after it ("ab" follows).
What happens if I take one more character?
"a" matches "[ab]*?", but "ba" follows.
And one more?
"ab" matches "[ab]*?"
Now "aa" follows, so only "abaa" is caught.

At the same time, it implies that [ab]* isn't the same thing as [ab]*?a{2}, since the first one would catch the whole of "abaaabbbbaa", but also the whole of things like "abaaabbbbab" or "abaaabbbbba", which don't end with two consecutive "a".
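
If you want to check what each variant actually consumes, a few lines of Python are enough. This is only a sketch using Python's re module (assumed to treat greedy and non-greedy quantifiers the same way Atom does); re.search returns the first match found in the text, and re.fullmatch only succeeds when the pattern covers the entire string.

```python
import re

text = "abaaabbbbaa"

# Greedy: [ab]* takes as much as it can, then gives characters back
# until the two trailing "a" fit.
greedy = re.search(r"[ab]*a{2}", text).group()

# Non-greedy: [ab]*? takes as little as it can, growing one character
# at a time until "aa" follows.
lazy = re.search(r"[ab]*?a{2}", text).group()

samples = ["abaaabbbbaa", "abaaabbbbab", "abaaabbbbba"]

# [ab]* alone covers all three strings entirely...
bare = [s for s in samples if re.fullmatch(r"[ab]*", s)]

# ...while requiring the trailing "aa" only leaves the first one.
with_aa = [s for s in samples if re.fullmatch(r"[ab]*?a{2}", s)]
```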

It's what makes RegEx difficult to use correctly, even for experts, because you don't just need to turn the string you're searching for into a pattern, you also have to build this pattern in such a way that it will only match what you are searching for.
But for a basic search, it's not that difficult, because you apply it to words, and "mom" will always be "mom". Just be sure that you don't catch words like "moment" (well, I can't find an example for "xxxmom") by using \b before and after the pattern (\bmom\b) to tell that what you search for is a full word, and not a part of another word.

79flavors said:
\s is "a space" and \S is "not a space".

Here, you simplified a little too much, which can have unwanted side effects. It's not "space", but "blank character" (whitespace), a "blank character" being any character that isn't shown when printed. Therefore it should (it depends on the language) match "space", "tabulation", and "carriage return" (the last character of the line, telling the editor to go to the next line).
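
As a small check of what counts as a "blank character", here is a sketch with Python's re module. The exact list can vary between regex flavors, so take this as what Python does, not as a guarantee for every tool.

```python
import re

samples = {
    "space": " ",
    "tab": "\t",
    "newline": "\n",
    "carriage return": "\r",
    "letter": "c",
}

# True when the single character is matched by \s.
is_whitespace = {name: bool(re.fullmatch(r"\s", char)) for name, char in samples.items()}
```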

Also, and absolutely not your fault, the inline code makes it difficult to distinguish the case. The first one is lower case ("s"), and the second is upper case ("S").

79flavors said:
There are also quantifiers like *, which is "any number of the previous character"...

Precisely, it's "any number of the previous pattern, with '0' being a valid number".
Note the difference: it's not specifically a character that can come more than once, but the "pattern" right before it.

Keeping my [ab]*?a{2} pattern, it will still catch something like "abaa", but also something like "aa" (zero times a letter that is "a" or "b", followed by two consecutive occurrences of the letter "a").

If one wants "at least one occurrence of the pattern, that can be repeated as often as one wants", it's + that has to be used.
Therefore, while [ab]*?a{2} will catch "aa", [ab]+?a{2} will not, because there is no "a" or "b" before the two consecutive "a".
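
The difference between * and + is easy to verify. The sketch below uses Python's re.fullmatch, which returns None when the pattern doesn't cover the entire string.

```python
import re

# * accepts zero occurrences of [ab], so "aa" alone is enough.
star_on_aa = re.fullmatch(r"[ab]*?a{2}", "aa") is not None

# + needs at least one [ab] before the two "a", so "aa" alone fails.
plus_on_aa = re.fullmatch(r"[ab]+?a{2}", "aa") is not None

# With something in front of the "aa", both succeed.
star_on_abaa = re.fullmatch(r"[ab]*?a{2}", "abaa") is not None
plus_on_abaa = re.fullmatch(r"[ab]+?a{2}", "abaa") is not None
```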

If you have difficulty working with RegEx, one thing that generally helps is to describe the pattern in words, in a very precise way. So the pattern [ab]*?a{2} means:

Either the character "a" or the character "b", excluding all other possible characters
That can be omitted or be present more than once
In such a way that it takes as little as possible and leaves the rest to the following pattern
Followed by the character "a"
That has to be present exactly two times.

When expressed this way, you'll be more likely to find possible errors in the logic behind the pattern you're using.
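
Some regex flavors let you keep that description right next to the pattern. As a sketch, Python's re.VERBOSE flag ignores whitespace and treats everything after # as a comment. This is a Python feature; it is not assumed to exist in Atom's search box.

```python
import re

pattern = re.compile(
    r"""
    [ab]*?   # "a" or "b", zero or more times, taking as few as possible
    a{2}     # followed by the character "a", exactly two times
    """,
    re.VERBOSE,
)

covers_abaa = pattern.fullmatch("abaa") is not None
covers_aa = pattern.fullmatch("aa") is not None
covers_abab = pattern.fullmatch("abab") is not None
```

Tokens such as *? and {2} have to stay in one piece, which is why each quantifier sits on the same line as the thing it repeats.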

79flavors said:
So I would use a regex enabled search to search for (\bc\b)(.*)(\bmom\b).

An independent pattern that is
- A word boundary (anything except a letter, a digit or an underscore on that side)
- Followed by the sayer variable name
- Followed by another word boundary
Followed by another independent pattern that is
- Any character
- That can be omitted or present more than once

Beeeeeeep ! Logical error detected.

It will catch "object.c", "object.c.whatever", "object.c( parameters)" and a few things like that.
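
These false positives are easy to reproduce. In the Python sketch below, the candidate lines are invented examples of Ren'Py-style code, and the pattern is the first part of the search quoted above.

```python
import re

loose = re.compile(r"(\bc\b)(.*)")

candidates = [
    '    c "hello"',
    "    $ object.c = 1",
    "    $ object.c.whatever()",
    "    $ object.c( parameters)",
]

# A "." or a "(" is not a word character, so \b is satisfied on both
# sides of the "c" in every one of these lines.
loose_hits = [line for line in candidates if loose.search(line)]
```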

The pattern should start in a more precise way:

Starts with
Only blank characters, repeated any number of times or omitted [This is to catch those who don't always indent]

Therefore, it should be ^\s*c\b(.*)

Yet it will still catch things like " c.whatever" and " c( parameters)".
[Side note: The second is a valid catch for a dialog line, but doesn't correspond to the present case, where the dialog lines are expected to be: sayer "line"]

Therefore, you need to be more explicit about what is expected by (.*):

Separated by an optional blank character that can be repeated without limit
Then something that starts with a single or a double quote
And finally any kind of characters that you want

So: ^\s*c\b\s*['"].*
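
Here is a sketch of that stricter pattern in Python, tested line by line. The helper said_by is hypothetical: it is not from this thread, it simply appends the \bword\b idea from earlier to the stricter start so that the result answers the original question (a given word, said by a given character). The sample lines are invented.

```python
import re

QUOTE = r"""['"]"""  # a single or a double quote

strict = re.compile(r"^\s*c\b\s*" + QUOTE + r".*")


def said_by(sayer, word):
    """Hypothetical helper: lines where sayer says word as a whole word."""
    return re.compile(
        r"^\s*" + re.escape(sayer) + r"\b\s*" + QUOTE + r".*\b" + re.escape(word) + r"\b"
    )


script_lines = [
    '    a "hello"',
    '    c "no"',
    '    b "no yeah no"',
    'c "hello"',
    "    c.whatever",
    "    c( parameters)",
    '    c "I know nothing"',
]

# Every dialogue line of c, indented or not, and nothing else.
strict_hits = [line for line in script_lines if strict.search(line)]

# Only the lines where c says the whole word "no".
c_says_no = [line for line in script_lines if said_by("c", "no").search(line)]
```

The last sample line shows why the word boundaries matter: "know" and "nothing" both contain "no", but neither is the word "no".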

It's possible to be even more precise (having the ending quote match the starting one, and taking into account the possible parameters added to the "say" statement), but that's overkill and goes far beyond a beginner-friendly course.

Edit: Correcting typos that messed with the presentation.

hiya02 · Dec 23, 2020

Search by using regular expressions?

Playstorepers · Dec 23, 2020

First of all:
I'm amazed by all the detailed answers and the help that I got.

Second of all:
My bad for explaining my problem with such a shitty example.
It is indeed the most complicated case: I'm looking for words spoken by one character, which can be found anywhere in the sentence.

Third of all:
I'll try learning all suggested methods, like regex, etc.
There's nothing better than learning new tricks.

Thanks everyone for your suggestions!

EDIT: Regular expression works like a charm.
I don't know the details and I might trip up, but for my use (which is search only), it works very fine.
Your explanations were more than enough for me to handle this.
Thanks, everyone.

anne O'nymous · Dec 23, 2020

Playstorepers said:
EDIT: Regular expression works like a charm.
I don't know the details and I might trip up, but for my use (which is search only), it works very fine.

For everything that involves human control (like searching for something), really anyone can use them. Perhaps it will catch more things than expected, but it will still limit the number of results found, which is already a benefit.
