Commit de2dc81

tab navigation improvements, search updates, and additional stemming language support
1 parent 39bf9f3 commit de2dc81

138 files changed

Lines changed: 16953 additions & 2854 deletions

README.md

Lines changed: 23 additions & 11 deletions
@@ -21,7 +21,6 @@ The InterroBot plugin ecosystem is designed for power users. Whether you're buil
 InterroBot hosts an iframe of your webpage and exposes an API from which you can pull data down for analysis.
 
 If you're familiar with vanilla TypeScript or JavaScript, creating a custom plugin script for InterroBot is remarkably straight forward. First you start with a [bare-bones HTML file](https://raw.githubusercontent.com/interrobot/interrobot-plugin/refs/heads/master/examples/vanillajs/basic.html) and a script extending the Plugin base class.
-
 ```javascript
 // TypeScript vs. JavaScript, both are fine. See examples.
 import { Plugin } from "./src/ts/core/plugin";
@@ -48,13 +47,15 @@ Plugin.initialize(BasicExamplePlugin);
 
 BasicExamplePlugin will not do much at this point, but it will load and run the default `index()` behavior.
 You can, of course, override the default `index()` behavior, rendering your page however you wish.
-
 ```javascript
 protected async index() {
+
     // add your form and supporting HTML
     this.render(`<div>HTML</div>`);
+
     // initialize the plugin within InterroBot, from within iframe
     await this.initData({}, []);
+
     // add handlers to the form
     const button = document.querySelector("button");
     button.addEventListener("click", async (ev) => {
@@ -65,10 +66,9 @@ protected async index() {
 
 The `process()` method called above would be where you process data. Here a query is executed on
 the crawl index, and each result run through the exampleResultsHandler.
-
-
 ```javascript
 protected async process() {
+
     // gather title words and running counts with a result handler
     const titleWords: Map<string, number> = new Map<string, number>();
     let resultsMap: Map<number, SearchResult>;
@@ -81,24 +81,36 @@ protected async process() {
 
     // projectId comes for free as a member of Plugin
     const projectId = this.getProjectId();
+
     // build a query, these are exactly as you'd type them into InterroBot search
     const freeQueryString = "headers: text/html";
-    // pipe delimited fields you want retrieved
+
+
     // id and url come with the base model, everything else costs time
-    const fields = "name";
+    // here, I just grab the "name" field
     let internalHtmlPagesQuery = new InterroBot.Core.SearchQuery({
         project: projectId,
         query: freeQueryString,
-        fields: fields,
+        fields: ["name"],
         type: InterroBot.Core.SearchQueryType.Any,
         includeExternal: false,
         includeNoRobots: false,
     });
 
     // run each SearchResult through its handler, and we're done processing
-    await InterroBot.Core.Search.execute(internalHtmlPagesQuery, this.resultsMap, async (result) => {
-        await exampleResultHandler(result, titleWords);
-    }, true, false, "Processing…");
+    await InterroBot.Core.Search.execute(
+        internalHtmlPagesQuery,
+        this.resultsMap,
+        async (result) => {
+            await exampleResultHandler(result, titleWords);
+        },
+        {
+            paginate: true,
+            showProgress: false,
+            progressMessage: "Processing…"
+        }
+    );
+
     // call for HTML presentation of titleWords with processing complete
     await this.report(titleWords);
 }
@@ -164,4 +176,4 @@ Retrieves a list of crawls using the Plugin API.
 MPL 2.0, with exceptions. This repo contains JavaScript to TypeScript ports and a Markdown library based on existing code, all contained within `./src/lib`. As they arrived under existing licenses, they will remain under those.
 
 * *Typo.js*: TypeScript port continues under the original [Modified BSD License](https://raw.githubusercontent.com/cfinke/Typo.js/master/license.txt).
-* *Snowball.js*: TypeScript port continues under the original [MPL 1.1](https://raw.githubusercontent.com/fortnightlabs/snowball-js/master/LICENSE) license.
+* *Snowball.js*: TypeScript port continues under the original [MPL 1.1](https://raw.githubusercontent.com/fortnightlabs/snowball-js/master/LICENSE) license.

dist/js/commonjs/core/api.d.ts

Lines changed: 21 additions & 16 deletions
@@ -2,14 +2,14 @@
  * Enumeration for different types of search queries.
  */
 declare enum SearchQueryType {
-    Page = 0,
-    Asset = 1,
-    Any = 2
+    Page = "page",
+    Asset = "asset",
+    Any = "any"
 }
 interface SearchQueryParams {
     project: number;
     query: string;
-    fields: string;
+    fields: string[];
     type: SearchQueryType;
     includeExternal?: boolean;
     includeNoRobots?: boolean;
@@ -34,6 +34,11 @@ interface SearchResultJson {
     assets?: string[];
     origin?: string;
 }
+interface SearchExecuteOptions {
+    paginate?: boolean;
+    showProgress?: boolean;
+    progressMessage?: string;
+}
 interface CrawlParams {
     id: number;
     project: number;
@@ -71,7 +76,7 @@ declare class PluginData {
     private project;
     /**
      * Creates an instance of PluginData.
-     * @param params - The plugin data parameters.
+     * @param params - Configuration object containing projectId, meta, defaultData, and autoformInputs
      */
     constructor(params: PluginDataParams);
     /**
@@ -116,15 +121,15 @@ declare class SearchQuery {
     private static readonly validSorts;
     readonly project: number;
     readonly query: string;
-    readonly fields: string;
+    readonly fields: string[];
     readonly type: SearchQueryType;
     readonly includeExternal: boolean;
     readonly includeNoRobots: boolean;
     readonly sort: string;
     readonly perPage: number;
     /**
      * Creates an instance of SearchQuery.
-     * @param params - The search query parameters.
+     * @param params - Configuration object containing project, query, fields, type, includeExternal, and includeNoRobots
      */
     constructor(params: SearchQueryParams);
     /**
@@ -138,13 +143,13 @@ declare class Search {
     private static resultsHaystackCacheKey;
     /**
      * Executes a search query.
-     * @param query - The search query to execute.
-     * @param existingResults - Map of existing results.
-     * @param processingMessage - Message to display during processing.
-     * @param resultHandler - Function to handle each search result.
-     * @returns A promise that resolves to a boolean indicating if results were from cache.
+     * @param query - The search query to execute
+     * @param resultsMap - Map of existing results
+     * @param resultHandler - Function to handle each search result
+     * @param options - Optional configuration for pagination, progress display, and custom messages
+     * @returns A promise that resolves to a boolean indicating if results were from cache
      */
-    static execute(query: SearchQuery, existingResults: Map<number, SearchResult>, resultHandler: any, deep?: boolean, quiet?: boolean, processingMessage?: string): Promise<boolean>;
+    static execute(query: SearchQuery, resultsMap: Map<number, SearchResult>, resultHandler: (result: SearchResult) => Promise<void>, options?: SearchExecuteOptions): Promise<boolean>;
     /**
      * Sleeps for the specified number of milliseconds.
      * @param millis - The number of milliseconds to sleep.
@@ -241,7 +246,7 @@ declare class Crawl {
     report?: any;
     /**
      * Creates an instance of Crawl.
-     * @param params - The crawl parameters.
+     * @param params - Configuration object containing id, project, created, modified, complete, time, and report
      */
     constructor(params: CrawlParams);
     /**
@@ -276,7 +281,7 @@ declare class Project {
     static readonly urlDeprectionWarning: string;
     /**
      * Creates an instance of Project.
-     * @param params - The project parameters.
+     * @param params - Configuration object containing id, created, modified, name, type, url, urls, and imageDataUri
      */
     constructor(params: ProjectParams);
     /**
@@ -303,4 +308,4 @@ declare class Project {
      */
     static getApiCrawls(project: number): Promise<Crawl[]>;
 }
-export { Project, Crawl, SearchQueryType, SearchQuery, Search, SearchResult, SearchResultJson, PluginData };
+export { Project, ProjectParams, Crawl, CrawlParams, SearchQueryType, SearchQuery, SearchQueryParams, Search, SearchExecuteOptions, SearchResult, SearchResultJson, PluginData, PluginDataParams };
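
The `Search.execute` signature change in this file replaces three positional flags with a single options object. Two details are easy to miss when migrating: `quiet` and `showProgress` have opposite senses, and the default flips from hidden (`quiet = true`) to shown (`showProgress = true`). The mapping can be sketched as follows. `toExecuteOptions` is a hypothetical helper for illustration only, not part of the library, and the interface is re-declared locally so the snippet stands alone.

```typescript
// Local copy of the SearchExecuteOptions shape declared in api.d.ts in this commit.
interface SearchExecuteOptions {
    paginate?: boolean;
    showProgress?: boolean;
    progressMessage?: string;
}

// Hypothetical migration helper (not part of interrobot-plugin).
// Defaults mirror the pre-commit positional defaults: deep = false, quiet = true.
function toExecuteOptions(
    deep: boolean = false,
    quiet: boolean = true,
    processingMessage: string = "Processing...",
): SearchExecuteOptions {
    return {
        paginate: deep,        // "deep" was renamed to "paginate"
        showProgress: !quiet,  // inverted sense: quiet === false means show progress
        progressMessage: processingMessage,
    };
}

// The pre-commit README call passed (true, false, "Processing…").
const migratedOptions = toExecuteOptions(true, false, "Processing…");
console.log(migratedOptions);
```

Under this mapping, the pre-commit README call corresponds to `showProgress: true`, whereas the updated README passes `showProgress: false`. That may be intentional, but it appears to change whether the progress message is displayed.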

dist/js/commonjs/core/api.js

Lines changed: 35 additions & 36 deletions
@@ -10,30 +10,18 @@ const plugin_js_1 = require("./plugin.js");
  */
 var SearchQueryType;
 (function (SearchQueryType) {
-    SearchQueryType[SearchQueryType["Page"] = 0] = "Page";
-    SearchQueryType[SearchQueryType["Asset"] = 1] = "Asset";
-    SearchQueryType[SearchQueryType["Any"] = 2] = "Any";
+    SearchQueryType["Page"] = "page";
+    SearchQueryType["Asset"] = "asset";
+    SearchQueryType["Any"] = "any";
 })(SearchQueryType || (SearchQueryType = {}));
 exports.SearchQueryType = SearchQueryType;
-var SearchQuerySortField;
-(function (SearchQuerySortField) {
-    SearchQuerySortField[SearchQuerySortField["Id"] = 0] = "Id";
-    SearchQuerySortField[SearchQuerySortField["Time"] = 1] = "Time";
-    SearchQuerySortField[SearchQuerySortField["Status"] = 2] = "Status";
-    SearchQuerySortField[SearchQuerySortField["Url"] = 3] = "Url";
-})(SearchQuerySortField || (SearchQuerySortField = {}));
-var SearchQuerySortDirection;
-(function (SearchQuerySortDirection) {
-    SearchQuerySortDirection[SearchQuerySortDirection["Ascending"] = 0] = "Ascending";
-    SearchQuerySortDirection[SearchQuerySortDirection["Descending"] = 1] = "Descending";
-})(SearchQuerySortDirection || (SearchQuerySortDirection = {}));
 /**
  * Container for plugin settings
  */
 class PluginData {
     /**
      * Creates an instance of PluginData.
-     * @param params - The plugin data parameters.
+     * @param params - Configuration object containing projectId, meta, defaultData, and autoformInputs
      */
     constructor(params) {
         var _a;
@@ -378,15 +366,21 @@ exports.PluginData = PluginData;
 class SearchQuery {
     /**
      * Creates an instance of SearchQuery.
-     * @param params - The search query parameters.
+     * @param params - Configuration object containing project, query, fields, type, includeExternal, and includeNoRobots
      */
     constructor(params) {
         var _a, _b, _c;
         this.includeExternal = true;
         this.includeNoRobots = false;
         this.project = params.project;
         this.query = params.query;
-        this.fields = params.fields;
+        // backcompat <=0.17 (piped string handling)
+        if (typeof params.fields === "string") {
+            this.fields = params.fields.split("|");
+        }
+        else {
+            this.fields = params.fields;
+        }
         this.type = params.type;
         this.includeExternal = (_a = params.includeExternal) !== null && _a !== void 0 ? _a : true;
         this.includeNoRobots = (_b = params.includeNoRobots) !== null && _b !== void 0 ? _b : false;
@@ -403,7 +397,7 @@ class SearchQuery {
      * @returns A string representing the cache key.
      */
     getHaystackCacheKey() {
-        return `${this.project}~${this.fields}~${this.type}~${this.includeExternal}~${this.includeNoRobots}`;
+        return `${this.project}~${this.fields.join("|")}~${this.type}~${this.includeExternal}~${this.includeNoRobots}`;
     }
 }
 exports.SearchQuery = SearchQuery;
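
The two `SearchQuery` hunks above work together: the constructor still accepts the pre-0.17 pipe-delimited string, and `getHaystackCacheKey()` joins the array with `|`, so both input styles should produce the same cache key. A minimal sketch of that behavior follows, assuming only what the diff shows. `normalizeFields` and `haystackCacheKey` are hypothetical stand-alone functions that mirror the class logic, not library exports.

```typescript
// Hypothetical helper mirroring the constructor's backcompat branch.
function normalizeFields(fields: string | string[]): string[] {
    return typeof fields === "string" ? fields.split("|") : fields;
}

// Hypothetical helper mirroring getHaystackCacheKey() after this commit.
function haystackCacheKey(
    project: number,
    fields: string[],
    type: string,
    includeExternal: boolean,
    includeNoRobots: boolean,
): string {
    return `${project}~${fields.join("|")}~${type}~${includeExternal}~${includeNoRobots}`;
}

const legacyFields = normalizeFields("name|content");      // pre-0.17 style
const arrayFields = normalizeFields(["name", "content"]);  // current style

const legacyKey = haystackCacheKey(1, legacyFields, "any", false, false);
const arrayKey = haystackCacheKey(1, arrayFields, "any", false, false);
console.log(legacyKey === arrayKey);
```

Because `fields` is now declared as `string[]` in `api.d.ts`, the string branch mainly serves plain JavaScript callers; TypeScript callers passing a string would need a cast.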
@@ -412,35 +406,36 @@ SearchQuery.validSorts = ["?", "id", "-id", "time", "-time", "status", "-status"
 class Search {
     /**
      * Executes a search query.
-     * @param query - The search query to execute.
-     * @param existingResults - Map of existing results.
-     * @param processingMessage - Message to display during processing.
-     * @param resultHandler - Function to handle each search result.
-     * @returns A promise that resolves to a boolean indicating if results were from cache.
+     * @param query - The search query to execute
+     * @param resultsMap - Map of existing results
+     * @param resultHandler - Function to handle each search result
+     * @param options - Optional configuration for pagination, progress display, and custom messages
+     * @returns A promise that resolves to a boolean indicating if results were from cache
      */
-    static async execute(query, existingResults, resultHandler, deep = false, quiet = true, processingMessage = "Processing...") {
+    static async execute(query, resultsMap, resultHandler, options) {
         const timeStart = new Date().getTime();
+        const { paginate = false, showProgress = true, progressMessage = "Processing..." } = options !== null && options !== void 0 ? options : {};
         // Promise<boolean> returned is a from-cache flag, true if cached
-        if (query.getHaystackCacheKey() === Search.resultsHaystackCacheKey && existingResults) {
-            const resultTotal = existingResults.size;
+        if (query.getHaystackCacheKey() === Search.resultsHaystackCacheKey && resultsMap) {
+            const resultTotal = resultsMap.size;
             // reuse api reuslts
             // print something to screen to inform user of operation
             // this is a blitz, doesn't get the http request breathing room of api http requests
             // anyways, paint first, then saturate cpu
-            if (quiet === false) {
-                const eventStart = new CustomEvent("ProcessingMessage", { detail: { action: "set", message: processingMessage } });
+            if (showProgress === true) {
+                const eventStart = new CustomEvent("ProcessingMessage", { detail: { action: "set", message: progressMessage } });
                 document.dispatchEvent(eventStart);
             }
             // give main thread a short break to render progress
             await Search.sleep(16);
             // note for of loop with sleep mod 100 works, looks smooth, but slows the operation by > 20%
             // this is faster, but it can't paint progress well as it can saturate the main thread
             let i = 0;
-            await existingResults.forEach(async (result, resultId) => {
+            await resultsMap.forEach(async (result, resultId) => {
                 await resultHandler(result);
             });
             plugin_js_1.Plugin.logTiming(`Processed ${resultTotal.toLocaleString()} search result(s)`, new Date().getTime() - timeStart);
-            if (quiet === false) {
+            if (showProgress === true) {
                 const msg = { detail: { action: "clear" } };
                 const eventFinished = new CustomEvent("ProcessingMessage", msg);
                 document.dispatchEvent(eventFinished);
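
One behavior in the cached branch above is worth illustrating: `Map.prototype.forEach` returns `undefined`, so `await resultsMap.forEach(async ...)` does not wait for the async handlers to finish. Whether this matters here depends on whether a given `resultHandler` actually yields, and the in-code comments suggest `forEach` was chosen for speed. The sketch below only demonstrates the language behavior. `viaForEach`, `viaLoop`, and `pause` are hypothetical names, not library code.

```typescript
// Hypothetical stand-in for an async result handler that yields to the event loop.
const pause = (ms: number): Promise<void> =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));

// Mirrors the shape used in the diff: awaiting forEach resolves immediately.
async function viaForEach(items: Map<number, string>): Promise<number> {
    let handled = 0;
    await items.forEach(async () => {
        await pause(5);
        handled++;
    });
    return handled; // read before any handler has resumed
}

// Sequential alternative: each handler completes before the next one starts.
async function viaLoop(items: Map<number, string>): Promise<number> {
    let handled = 0;
    // Array.from keeps the loop compatible with older compile targets.
    for (const _value of Array.from(items.values())) {
        await pause(5);
        handled++;
    }
    return handled;
}

async function demo(): Promise<void> {
    const items = new Map<number, string>([[1, "a"], [2, "b"], [3, "c"]]);
    console.log("forEach handled:", await viaForEach(items));
    console.log("loop handled:", await viaLoop(items));
}

demo();
```

If handlers must complete before the timing log and the "clear" event fire, a loop like `viaLoop` (or collecting the promises and awaiting `Promise.all`) would provide that guarantee, at the throughput cost the source comments mention.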
@@ -455,9 +450,9 @@ class Search {
                 "project": query.project,
                 "query": query.query,
                 "external": query.includeExternal,
-                "type": SearchQueryType[query.type].toLowerCase(),
+                "type": query.type,
                 "offset": 0,
-                "fields": query.fields.split("|"),
+                "fields": query.fields,
                 "norobots": query.includeNoRobots,
                 "sort": query.sort,
                 "perpage": query.perPage,
@@ -470,9 +465,13 @@ class Search {
                 const result = results[i];
                 await Search.handleResult(result, resultTotal, resultHandler);
             }
-            while (responseJson["__meta__"]["results"]["pagination"]["nextOffset"] !== null && deep === true) {
+            while (responseJson["__meta__"]["results"]["pagination"]["nextOffset"] !== null && paginate === true) {
                 const next = responseJson["__meta__"]["results"]["pagination"]["nextOffset"];
                 kwargs["offset"] = next;
+                if (query.sort === "?" && next > 0) {
+                    console.warn("Random sort (?) with pagination generates fresh randomness on each page. " +
+                        "Consider maxing perpage (100) and using 1 page of results when sampling.");
+                }
                 responseJson = await plugin_js_1.Plugin.postApiRequest("GetResources", kwargs);
                 results = responseJson.results;
                 for (let i = 0; i < results.length; i++) {
@@ -633,7 +632,7 @@ SearchResult.wordWhitespaceRe = /\s+/g;
 class Crawl {
     /**
      * Creates an instance of Crawl.
-     * @param params - The crawl parameters.
+     * @param params - Configuration object containing id, project, created, modified, complete, time, and report
      */
     constructor(params) {
         this.id = -1;
@@ -690,7 +689,7 @@ exports.Crawl = Crawl;
 class Project {
     /**
      * Creates an instance of Project.
-     * @param params - The project parameters.
+     * @param params - Configuration object containing id, created, modified, name, type, url, urls, and imageDataUri
      */
     constructor(params) {
         this.id = -1;
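
The `SearchQueryType` change from numeric to string values explains why the request payload in `Search.execute` now sends `query.type` directly instead of `SearchQueryType[query.type].toLowerCase()`. The sketch below contrasts the two enum shapes. `NumericQueryType` and `StringQueryType` are illustrative local names that stand in for the pre- and post-commit enum.

```typescript
// Pre-commit shape: a numeric enum, which TypeScript compiles with a reverse
// mapping (value -> member name), so the wire value had to be derived from the name.
enum NumericQueryType { Page = 0, Asset = 1, Any = 2 }

// Post-commit shape: a string enum whose values are already the wire values.
enum StringQueryType { Page = "page", Asset = "asset", Any = "any" }

// Old serialization: look up the member name, then lowercase it.
const legacyWire: string = NumericQueryType[NumericQueryType.Any].toLowerCase();

// New serialization: the enum value can be sent as-is.
const currentWire: string = StringQueryType.Any;

console.log(legacyWire === currentWire);
```

One consequence, stated as an assumption rather than something the diff confirms: callers that passed raw numbers (for example `type: 2`) instead of the enum member would no longer produce a valid `type` after this change.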
